Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Last update: Dec 23, 2022

Overview

Attention Transfer

PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer" https://arxiv.org/abs/1612.03928
Conference paper at ICLR2017: https://openreview.net/forum?id=Sks9_ajex

What's in this repo so far:

Activation-based AT code for CIFAR-10 experiments
Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
Jupyter notebook to visualize attention maps of ResNet-34: visualize-attention.ipynb

Coming:

grad-based AT
Scenes and CUB activation-based AT code

The code uses PyTorch https://pytorch.org. Note that the original experiments were done using torch-autograd. So far we have validated that the CIFAR-10 experiments are exactly reproducible in PyTorch, and we are in the process of doing so for ImageNet (results are very slightly worse in PyTorch, due to hyperparameters).

bibtex:

@inproceedings{Zagoruyko2017AT,
    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Paying More Attention to Attention: Improving the Performance of
             Convolutional Neural Networks via Attention Transfer},
    booktitle = {ICLR},
    url = {https://arxiv.org/abs/1612.03928},
    year = {2017}}

Requirements

First install PyTorch, then install torchnet:

pip install git+https://github.com/pytorch/[email protected]

then install other Python packages:

pip install -r requirements.txt

Experiments

CIFAR-10

This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

python cifar.py --save logs/resnet_40_1_teacher --depth 40 --width 1
python cifar.py --save logs/resnet_16_2_teacher --depth 16 --width 2
python cifar.py --save logs/resnet_40_2_teacher --depth 40 --width 2

To train with activation-based AT, run:

python cifar.py --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
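
For orientation, the activation-based AT objective that --beta weights can be sketched in plain Python. This is a minimal illustration, not the code in cifar.py: the names attention_map, at_loss and total_loss are hypothetical, the nested-list tensor layout is chosen only for readability, and the plain weighted sum used to combine the terms is an assumption.

```python
import math


def attention_map(activation):
    """Spatial attention map of one sample.

    activation: list of C channels, each a list of H rows of W floats.
    Returns a flat list of length H*W with unit L2 norm.
    """
    channels = len(activation)
    height, width = len(activation[0]), len(activation[0][0])
    # Mean over channels of the squared activation at each spatial location.
    amap = [
        sum(activation[c][i][j] ** 2 for c in range(channels)) / channels
        for i in range(height)
        for j in range(width)
    ]
    # L2-normalize; fall back to 1.0 so an all-zero map does not divide by zero.
    norm = math.sqrt(sum(v * v for v in amap)) or 1.0
    return [v / norm for v in amap]


def at_loss(student_act, teacher_act):
    """Mean squared difference between the two normalized attention maps."""
    a_s = attention_map(student_act)
    a_t = attention_map(teacher_act)
    return sum((s - t) ** 2 for s, t in zip(a_s, a_t)) / len(a_s)


def total_loss(cross_entropy, at_losses, beta=1e3):
    """Assumed combination: task loss plus beta times the summed AT terms."""
    return cross_entropy + beta * sum(at_losses)
```

Because the channel dimension is averaged out, the student and teacher layers may have different widths (as with the 16-1 student and 16-2 teacher above); only the spatial sizes need to match.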

To train with KD:

python cifar.py --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
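
As a rough guide to what --alpha controls, here is a minimal plain-Python sketch of a knowledge distillation loss in the usual Hinton-style form. It is not taken from this repository: the names softmax and kd_loss are hypothetical, and the temperature of 4.0 is an assumed value used only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.9, temperature=4.0):
    """alpha * soft-target term + (1 - alpha) * hard-label cross-entropy."""
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(
        t * math.log(t / s) for s, t in zip(p_student, p_teacher) if t > 0.0
    )
    # Scale by T^2 so the soft term keeps a comparable gradient magnitude.
    soft = kl * temperature ** 2
    # Ordinary cross-entropy against the ground-truth label (temperature 1).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard
```

With alpha = 0.9, most of the loss comes from matching the teacher's softened predictions, and the remaining 0.1 comes from the ground-truth labels.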

We plan to add AT+KD with decaying beta soon, to get the best knowledge transfer results.

ImageNet

Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

Model	val error (top-1, top-5, %)
ResNet-18	30.4, 10.8
ResNet-18-ResNet-34-AT	29.3, 10.0

Download link: https://s3.amazonaws.com/modelzoo-networks/resnet-18-at-export.pth

Model definition: https://github.com/szagoruyko/functional-zoo/blob/master/resnet-18-at-export.ipynb

Convergence plot:

Train from scratch

Download pretrained weights for ResNet-34 (see also functional-zoo for more information):

wget https://s3.amazonaws.com/modelzoo-networks/resnet-34-export.pth

Prepare the data following fb.resnet.torch and run training (e.g. using 2 GPUs):

python imagenet.py --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
                   --teacher_params resnet-34-export.pth --gpu_id 0,1 --ngpu 2 \
                   --beta 1e+3

Owner

Sergey Zagoruyko