Implementations for the ICLR-2021 paper: SEED: Self-supervised Distillation For Visual Representation.

Related tags

Deep LearningSEED
Overview

SEED

Implementations for the ICLR-2021 paper: SEED: Self-supervised Distillation For Visual Representation.

@Article{fang2020seed,
  author  = {Fang, Zhiyuan and Wang, Jianfeng and Wang, Lijuan and Zhang, Lei and Yang, Yezhou and Liu, Zicheng},
  title   = {SEED: Self-supervised Distillation For Visual Representation},
  journal = {International Conference on Learning Representations},
  year    = {2021},
}

Introduction

This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNetV3-Large on the ImageNet-1k dataset. SEED improves the ResNet-50 from 67.4% to 74.3% from the previous MoCo-V2 baseline. image

Preperation

Note: This repository does not contain the ImageNet dataset building, please refer to MoCo-V2 for the enviromental setting & dataset preparation. Be careful if you use FaceBook's ImageNet dataset implementation as the provided dataloader here is to handle TSV ImageNet source.

Self-Supervised Distillation Training

SWAV's 400_ep ResNet-50 model as Teacher architecture for a Student EfficientNet-b1 model with multi-view strategies. Place the pre-trained checkpoint in ./output directory. Remember to change the parameter name in the checkpoint as some module provided by SimCLR, MoCo-V2 and SWAV are inconsistent with regular PyTorch implementations. Here we provide the pre-trained SWAV/MoCo-V2/SimCLR Pre-trained checkpoints, but all credits belong to them.

Teacher Arch. SSL Method Teacher SSL-epochs Link
ResNet-50 MoCo-V1 200 URL
ResNet-50 SimCLR 200 URL
ResNet-50 MoCo-V2 200 URL
ResNet-50 MoCo-V2 800 URL
ResNet-50 SWAV 800 URL
ResNet-101 MoCo-V2 200 URL
ResNet-152 MoCo-V2 200 URL
ResNet-152 MoCo-V2 800 URL
ResNet-50X2 SWAV 400 URL
ResNet-50X4 SWAV 400 URL
ResNet-50X5 SWAV 400 URL

To conduct the training one GPU on single Node using Distributed Training:

python -m torch.distributed.launch --nproc_per_node=1 main_small-patch.py \
       -a efficientnet_b1 \
       -k resnet50 \
       --teacher_ssl swav \
       --distill ./output/swav_400ep_pretrain.pth.tar \
       --lr 0.03 \
       --batch-size 16 \
       --temp 0.2 \
       --workers 4 
       --output ./output \
       --data [your TSV imagenet-folder with train folders]

Conduct linear evaluations on ImageNet-val split:

python -m torch.distributed.launch --nproc_per_node=1  main_lincls.py \
       -a efficientnet_b0 \
       --lr 30 \
       --batch-size 32 \
       --output ./output \ 
       [your TSV imagenet-folder with val folders]

Checkpoints by SEED

Here we provide some pre-trained checkpoints after distillation by SEED. Note: the 800 epcohs one are trained with small-view strategies and have better performances.

Student-Arch. Teacher-Arch. Teacher SSL Student SEED-epochs Link
ResNet-18 ResNet-50 MoCo-V2 200 URL
ResNet-18 ResNet-50W2 SWAV 400 URL
MobileV3-Large ResNet-50 MoCo-V2 200 URL
EfficientNet-B0 ResNet-50W4 SWAV 400 URL
EfficientNet-B0 ResNet-50W2 SWAV 800 URL
EfficientNet-B1 ResNet-50 SWAV 200 URL
EfficientNet-B1 ResNet-152 SWAV 200 URL
ResNet-50 ResNet-50W4 SWAV 400 URL

Glance of the Performances

ImageNet-1k test accuracy (%) using KNN and linear classification for multiple students and MoCov2 pre-trained deeper teacher architectures. ✗ denotes MoCo-V2 self-supervised learning baselines before distillation. * indicates using a deeper teacher encoder pre-trained by SWAV, where additional small-patches are also utilized during distillation and trained for 800 epochs. K denotes Top-1 accuracy using KNN. T-1 and T-5 denote Top-1 and Top-5 accuracy using linear evaluation. First column shows Top-1 Acc. of Teacher network. First row shows the supervised performances of student networks.

Acknowledge

This implementation is largely originated from: MoCo-V2. Thanks SWAV and SimCLR for the pre-trained SSL checkpoints.

This work is done jointly with ASU-APG lab and Microsoft Azure-Florence Group. Thanks my collaborators.

License

SEED is released under the MIT license.

Owner
Jacob
Jacob
ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.

Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch. 🔥

AI4Finance 2.5k Jan 08, 2023
Bounding Wasserstein distance with couplings

BoundWasserstein These scripts reproduce the results of the article Bounding Wasserstein distance with couplings by Niloy Biswas and Lester Mackey. ar

Niloy Biswas 1 Jan 11, 2022
Neural Logic Inductive Learning

Neural Logic Inductive Learning This is the implementation of the Neural Logic Inductive Learning model (NLIL) proposed in the ICLR 2020 paper: Learn

36 Nov 28, 2022
Code for "Causal autoregressive flows" - AISTATS, 2021

Code for "Causal Autoregressive Flow" This repository contains code to run and reproduce experiments presented in Causal Autoregressive Flows, present

Ricardo Pio Monti 35 Dec 16, 2022
A Jinja extension (compatible with Flask and other frameworks) to compile and/or compress your assets.

A Jinja extension (compatible with Flask and other frameworks) to compile and/or compress your assets.

Jayson Reis 94 Nov 21, 2022
なりすまし検出(anti-spoof-mn3)のWebカメラ向けデモ

FaceDetection-Anti-Spoof-Demo なりすまし検出(anti-spoof-mn3)のWebカメラ向けデモです。 モデルはPINTO_model_zoo/191_anti-spoof-mn3からONNX形式のモデルを使用しています。 Requirement mediapipe

KazuhitoTakahashi 8 Nov 18, 2022
GLANet - The code for Global and Local Alignment Networks for Unpaired Image-to-Image Translation arxiv

GLANet The code for Global and Local Alignment Networks for Unpaired Image-to-Image Translation arxiv Framework: visualization results: Getting Starte

stanley 29 Dec 14, 2022
Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic Scenes", ICCV 2021.

Deep 3D Mask Volume for View Synthesis of Dynamic Scenes Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic S

Ken Lin 17 Oct 12, 2022
TianyuQi 10 Dec 11, 2022
My 1st place solution at Kaggle Hotel-ID 2021

1st place solution at Kaggle Hotel-ID My 1st place solution at Kaggle Hotel-ID to Combat Human Trafficking 2021. https://www.kaggle.com/c/hotel-id-202

Kohei Ozaki 18 Aug 19, 2022
Learning Logic Rules for Document-Level Relation Extraction

LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip

41 Dec 26, 2022
CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax ⚠️ Latest: Current repo is a complete version. But we delet

FishYuLi 341 Dec 23, 2022
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Directory Structur

Peter Hase 19 Aug 21, 2022
Imaging, analysis, and simulation software for radio interferometry

ehtim (eht-imaging) Python modules for simulating and manipulating VLBI data and producing images with regularized maximum likelihood methods. This ve

Andrew Chael 5.2k Dec 28, 2022
Multi-Person Extreme Motion Prediction

Multi-Person Extreme Motion Prediction Implementation for paper Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer, Multi-Person Extre

GUO-W 38 Nov 15, 2022
Wikidated : An Evolving Knowledge Graph Dataset of Wikidata’s Revision History

Wikidated Wikidated 1.0 is a dataset of Wikidata’s full revision history, which encodes changes between Wikidata revisions as sets of deletions and ad

Lukas Schmelzeisen 11 Aug 16, 2022
[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.

SapBERT: Self-alignment pretraining for BERT This repo holds code for the SapBERT model presented in our NAACL 2021 paper: Self-Alignment Pretraining

Cambridge Language Technology Lab 104 Dec 07, 2022
MARE - Multi-Attribute Relation Extraction

MARE - Multi-Attribute Relation Extraction Repository for the paper submission: #TODO: insert link, when available Environment Tested with Ubuntu 18.0

0 May 11, 2021
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)

Ilya Kostrikov 3k Dec 31, 2022
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation Prerequisites This repo is built upon a local copy of transfo

Jixuan Wang 10 Sep 28, 2022