Implementation of momentum^2 teacher

Last update: Sep 26, 2022

Overview

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Requirements

All experiments were run with Python 3.6, torch==1.5.0, and torchvision==0.6.0.

Usage

Data Preparation

Place the ImageNet data in ${root_of_your_clone}/data/imagenet_train and ${root_of_your_clone}/data/imagenet_val. Because we read ImageNet from an internal storage platform, reading from local disk (local mode) has not been tested. You may need to modify momentum_teacher/data/dataset.py to support local mode.
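For local mode, one possible starting point is a plain directory scan. The sketch below assumes the common one-subfolder-per-class ImageNet layout (e.g. imagenet_train/n01440764/*.JPEG), which this README does not confirm. `list_samples` and `IMAGE_EXTENSIONS` are hypothetical names for illustration, not part of momentum_teacher/data/dataset.py.

```python
import os
from typing import List, Tuple

# Assumed image extensions; adjust to match your copy of ImageNet.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def list_samples(root: str) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Scan `root` and return ([(image_path, class_index), ...], class_names).

    Assumes one sub-directory per class (hypothetical layout).
    """
    # Class names are the sorted sub-directory names; the index is their rank.
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for index, name in enumerate(classes):
        class_dir = os.path.join(root, name)
        for filename in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image file.
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(class_dir, filename), index))
    return samples, classes
```

If your data follows this layout, torchvision.datasets.ImageFolder reads it in the same way and may be a simpler drop-in.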

Training

Before training, make sure the repository root (${root_of_clone}) is on your PYTHONPATH, e.g.

export PYTHONPATH=$PYTHONPATH:${root_of_clone}

To run unsupervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, pass the following options to momentum_teacher/tools/train.py:

-d specifies the GPU ids used for training, e.g., -d 0-7
-b specifies the batch size, e.g., -b 256
--experiment-name specifies the output folder; the training log and models are written to './outputs/${experiment-name}'
-f specifies the description file of your experiment

For example:

python3 momentum_teacher/tools/train.py -b 256 -d 0-7 --experiment-name your_exp -f momentum_teacher/exps/arxiv/exp_8_v100/momentum2_teacher_100e_exp.py
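To make the -d and -b options concrete, here is a minimal sketch of how a range such as 0-7 could be expanded and how a batch could be divided across the selected GPUs. It assumes that -b is the total batch size, as the multi-machine example in this README suggests. `parse_devices` and `per_gpu_batch` are hypothetical helpers and may not match how train.py handles these flags.

```python
from typing import List


def parse_devices(spec: str) -> List[int]:
    """Expand a device string such as "0-7" or "0,2,4-5" into GPU ids.

    Hypothetical helper: the comma-separated form is an assumption;
    the README only shows the "0-7" range form.
    """
    ids = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            ids.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            ids.append(int(part))
    return ids


def per_gpu_batch(total_batch: int, spec: str) -> int:
    """Split a total batch size evenly across the selected GPUs."""
    gpus = parse_devices(spec)
    if total_batch % len(gpus) != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    return total_batch // len(gpus)
```

Under these assumptions, -b 256 -d 0-7 corresponds to 32 images per GPU.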

Linear Evaluation

With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, pass the following options to momentum_teacher/tools/eval.py:

-d specifies the GPU ids used for training, e.g., -d 0-7
-b specifies the batch size, e.g., -b 256
--experiment-name specifies the folder that holds the pre-trained models

For example:

python3 momentum_teacher/tools/eval.py -b 256 --experiment-name your_exp -f momentum_teacher/exps/arxiv/linear_eval_exp_byol.py

Results

Results of Pretraining on a Single Machine

After pre-training on 8 NVIDIA V100 GPUs with a batch size of 1024, the linear-evaluation results are:

pre-train code	pre-train epochs	pre-train time	accuracy (%)	weights
path	100	~1.8 days	70.7	-
path	200	~3.6 days	72.7	-
path	300	~5.5 days	73.8	-

After pre-training on 8 NVIDIA 2080 GPUs with a batch size of 256, the linear-evaluation results are:

pre-train code	pre-train epochs	pre-train time	accuracy (%)	weights
path	100	~2.5 days	70.4	-
path	200	~5 days	72.3	-
path	300	~7.5 days	72.9	-

Results of Pretraining on Multiple Machines

For example, to run unsupervised pre-training with a batch size of 4096 on 32 V100 GPUs, assuming 4 machines with 8 V100 GPUs each, run:

# machine 1:
export MACHINE=0; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 2:
export MACHINE=1; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 3:
export MACHINE=2; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 4:
export MACHINE=3; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
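The MACHINE and MACHINE_TOTAL variables above identify each node. The sketch below shows how they could map to a world size, global ranks, and a per-GPU batch. The rank formula is the usual distributed-training convention and is assumed here rather than taken from train.py. `distributed_layout` and its `gpus_per_machine` parameter are hypothetical.

```python
import os


def distributed_layout(gpus_per_machine: int, total_batch: int) -> dict:
    """Derive world size, this machine's global ranks, and per-GPU batch.

    Reads MACHINE and MACHINE_TOTAL as exported in the commands above.
    The rank formula is an assumption, not confirmed by the README.
    """
    machine = int(os.environ.get("MACHINE", "0"))
    machine_total = int(os.environ.get("MACHINE_TOTAL", "1"))
    world_size = machine_total * gpus_per_machine
    # The first global rank on this machine; local GPUs follow consecutively.
    first_rank = machine * gpus_per_machine
    return {
        "world_size": world_size,
        "global_ranks": list(range(first_rank, first_rank + gpus_per_machine)),
        "per_gpu_batch": total_batch // world_size,
    }
```

With MACHINE=1, MACHINE_TOTAL=4, 8 GPUs per machine, and a total batch of 4096, this gives a world size of 32, global ranks 8 to 15, and 128 images per GPU.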

Linear-evaluation results:

pre-train code	pre-train epochs	pre-train time	accuracy (%)	weights
path	100	~11 hours	70.3	-
path	200	~22 hours	72.5	-
path	300	~33 hours	73.7	-
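As a rough sanity check on these timings, the 100-epoch rows of the 8-GPU and 32-GPU V100 tables can be compared. This is back-of-the-envelope arithmetic on the approximate figures reported above. The two runs also use different batch sizes (1024 vs. 4096), so the number is only indicative. `scaling_efficiency` is a hypothetical helper.

```python
def scaling_efficiency(base_hours: float, base_gpus: int,
                       scaled_hours: float, scaled_gpus: int) -> float:
    """Return speedup divided by the GPU-count ratio (1.0 = perfect scaling)."""
    speedup = base_hours / scaled_hours
    return speedup / (scaled_gpus / base_gpus)


# 100-epoch rows: ~1.8 days on 8 V100s vs. ~11 hours on 32 V100s.
efficiency = scaling_efficiency(1.8 * 24, 8, 11, 32)
print(f"{efficiency:.2f}")  # prints 0.98
```

On these approximate numbers, using 4 times as many GPUs shortens training by a factor of about 3.9.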

To run unsupervised pre-training with a batch size of 4096 on 128 2080 GPUs, follow the multi-machine steps above. Linear-evaluation results:

pre-train code	pre-train epochs	pre-train time	accuracy (%)	weights
path	100	~5 hours	69.0	-
path	200	~10 hours	71.5	-
path	300	~15 hours	72.3	-

Disclaimer

This is an implementation of Momentum^2 Teacher. Note that:

The original implementation is based on our internal platform.
This released version performs slightly better than the results in the tech report.

Owner

jemmy li
