SOTA models on CIFAR-10

Last update: Dec 21, 2022

Related tags: Deep Learning, pytorch-cifar-tricks

Overview

A PyTorch Implementation of CIFAR Tricks

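This project surveys the tricks, data augmentation methods, and regularization methods used on the CIFAR-10 dataset, and implements them. The project is currently on hold. If you have better ideas or would like to help maintain it, please open an issue or find my contact information on my homepage.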

0. Requirements

Python 3.6+
torch=1.8.0+cu111
torchvision=0.9.0+cu111
tqdm=4.26.0
PyYAML=6.0

1. Implementations

1.1 Tricks

Warmup
Cosine LR Decay
SAM (Sharpness-Aware Minimization)
Label Smoothing
KD (knowledge distillation)
AdaBound
Xavier / Kaiming init
LR finder
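Warmup and cosine LR decay are usually combined into a single schedule: the learning rate rises linearly for the first few epochs, then follows a half cosine down to a floor. The sketch below is a minimal per-epoch version in plain Python. The function name `warmup_cosine_lr`, the linear warmup shape, and the `min_lr` floor are assumptions for illustration, not this repository's `warmcosine` scheduler; the defaults mirror the recipe in Section 3.8 (lr=0.1, 20 warmup epochs, 300 epochs).

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=20, total_epochs=300, min_lr=0.0):
    """LR for a 0-indexed epoch: linear warmup, then cosine decay (hypothetical helper)."""
    if epoch < warmup_epochs:
        # Linear warmup: base_lr / warmup_epochs at epoch 0, base_lr at the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase already completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    # Half cosine from base_lr down towards min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 10, 19, 20, 160, 299):
        print(e, round(warmup_cosine_lr(e), 5))
```

In PyTorch, the same curve could be plugged into `torch.optim.lr_scheduler.LambdaLR` by returning `warmup_cosine_lr(epoch) / base_lr` as the multiplicative factor.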

1.2 Augmentation

AutoAugment
Cutout
Mixup
RICAP
Random Erase
ShakeDrop

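Mixup blends two training samples and their labels with a weight drawn from a Beta(alpha, alpha) distribution. The plain-Python sketch below works on flat lists instead of tensors; `mixup_pair`, `one_hot`, and the default `alpha=1.0` are hypothetical choices for illustration, not this repository's implementation.

```python
import random


def one_hot(label, num_classes=10):
    """One-hot target vector for a CIFAR-10 class index."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v


def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Mix two samples and their one-hot labels; returns (x, y, lam)."""
    # lam ~ Beta(alpha, alpha); alpha <= 0 disables mixing.
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    x, y, lam = mixup_pair([1.0, 1.0], one_hot(3), [0.0, 0.0], one_hot(7), rng=rng)
    print(round(lam, 3), x, y[3], y[7])
```

Because cross-entropy is linear in the target, training on the mixed target `y` gives the same loss as `lam * CE(y1) + (1 - lam) * CE(y2)`, which is how mixup is often written in batched code.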
2. Training

2.1 CIFAR-10 training examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True

3. Results

3.1 Results reported by the original pytorch-ricap

Model	Error rate % (test acc %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.82 (96.18)	0.158	3.89
WideResNet28-10 +RICAP	2.82 (97.18)	0.141	2.85
WideResNet28-10 +Random Erasing	3.18 (96.82)	0.114	4.65
WideResNet28-10 +Mixup	3.02 (96.98)	0.158	3.02

3.2 Reimplementation results

Model	Error rate % (test acc %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.78 (96.22)		3.89
WideResNet28-10 +RICAP	2.81 (97.19)		2.85
WideResNet28-10 +Random Erasing	3.03 (96.97)	0.113	4.65
WideResNet28-10 +Mixup	2.93 (97.07)	0.158	3.02

3.3 Quick validation of network architectures with half of the training data

Reimplemented models (no augmentation, half of the training data, 200 epochs, batch size 128)

Model	Error rate % (test acc %)	Loss
lenet (CPU overloaded)	(70.76)
wideresnet	3.78 (96.22)
resnet20	(89.72)
senet	(92.34)
resnet18	(92.08)
resnet34	(92.48)
resnet50	(91.72)
regnet	(92.58)
nasnet	out of memory
shake_resnet26_2x32d	(93.06)
shake_resnet26_2x64d	(94.14)
densenet	(92.06)
dla	(92.58)
googlenet	(91.90)	0.2675
efficientnetb0 (low utilization, slow)	(86.82)	0.5024
mobilenet (low utilization)	(89.18)
mobilenetv2	(91.06)
pnasnet	(90.44)
preact_resnet	(90.76)
resnext	(92.30)
vgg (high CPU and GPU utilization)	(88.38)
inceptionv3	(91.84)
inceptionv4	(91.10)
inception_resnet_v2	(83.46)
rir	(92.34)	0.3932
squeezenet (high CPU utilization)	(89.16)	0.4311
stochastic_depth_resnet18	(90.22)
xception
dpn	(92.06)	0.3002
ge_resnext29_8x64d (very slow)	(93.86)

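3.4 Effect on CPU/GPU utilization

TEST: scale/kernel ToyNet

Varying the depth of the network's convolutional layers and training each variant leads to the following conclusion:

Conclusion: a network like LeNet, with few convolutions (only two layers), has high CPU utilization and low GPU utilization. When depth is added on top of it as a plain, VGG-style stack, the deeper the network, the lower the CPU utilization and the higher the GPU utilization.

Varying the training batch size leads to the following conclusion:

Conclusion: batch size affects convergence.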

3.5 Testing cutout and mixup with a StepLR schedule

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	200			96.33
shake_resnet26_2x64d	200	√		96.99
shake_resnet26_2x64d	200		√	96.60
shake_resnet26_2x64d	200	√	√	96.46
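Cutout masks out one random square patch of each training image. The sketch below applies it to a single-channel image stored as a list of rows; the helper name `cutout` and the default `size=16` are assumptions for illustration, not values taken from this repository's configuration. The patch centre is sampled anywhere in the image, so the square can be clipped at the borders.

```python
import random


def cutout(image, size=16, rng=random):
    """Return a copy of a 2-D image (list of rows) with one size x size square zeroed."""
    h, w = len(image), len(image[0])
    # Centre sampled uniformly over the whole image; the patch may extend past the edges.
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out


if __name__ == "__main__":
    img = [[1.0] * 32 for _ in range(32)]
    masked = cutout(img, rng=random.Random(1))
    print(sum(v == 0.0 for row in masked for v in row), "pixels zeroed")
```

For an RGB tensor the same mask would be applied to all three channels.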

3.6 Testing SAM, ASAM, cosine LR decay, and label smoothing

architecture	epoch	SAM	ASAM	Cosine LR Decay	LabelSmooth	C10 test acc (%)
shake_resnet26_2x64d	200	√				96.51
shake_resnet26_2x64d	200		√			96.80
shake_resnet26_2x64d	200			√		96.61
shake_resnet26_2x64d	200				√	96.57
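Label smoothing replaces the one-hot target with (1 - eps) on the true class plus eps / K spread over all K classes, then computes ordinary cross-entropy. The plain-Python sketch below shows that computation; `label_smoothing_ce` and the default `eps=0.1` are assumptions for illustration, not this repository's setting. Recent PyTorch versions (1.10 and later) expose this directly through the `label_smoothing` argument of `torch.nn.CrossEntropyLoss`, which the torch 1.8 listed in the requirements does not have.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy against the smoothed target (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    logp = log_softmax(logits)
    smoothed = [eps / k + (1.0 - eps if i == target else 0.0) for i in range(k)]
    return -sum(q * lp for q, lp in zip(smoothed, logp))


if __name__ == "__main__":
    logits = [4.0, 1.0, 0.5]
    print(label_smoothing_ce(logits, 0, eps=0.0), label_smoothing_ce(logits, 0, eps=0.1))
```

With `eps=0` this reduces to standard cross-entropy; with `eps > 0` a very confident correct prediction is penalised slightly, which discourages over-confident logits.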

PS: Other libraries, using a much longer training schedule (1800 epochs), report that shake_resnet26_2x64d reaches 97.71% test accuracy with cutout and mixup.

3.7 Testing cosine LR decay with shake_resnet

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	300			96.66
shake_resnet26_2x64d	300	√		97.21
shake_resnet26_2x64d	300		√	96.90
shake_resnet26_2x64d	300	√	√	96.73

Results from CIFAR-ZOO with 1800 epochs. These were not reproduced because training takes too long.

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	1800			96.94 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√		97.20 (CIFAR-ZOO)
shake_resnet26_2x64d	1800		√	97.42 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√	√	97.71 (CIFAR-ZOO)

3.8 Study of the Divide and Co-training recipe

lr:
- warmup (20 epochs)
- cosine lr decay
- lr=0.1
- total epochs: 300
bs=128
aug:
- Random crop and resize
- Random left-right flipping
- AutoAugment
- Normalization
- Random Erasing
- Mixup
weight decay=5e-4 (no weight decay on bias and BN parameters)
Kaiming weight init
optimizer: SGD with Nesterov momentum (nesterov)
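"Bias and BN undecayed" means the parameters are split into two optimizer groups, one with weight decay and one without. The sketch below shows the splitting rule on (name, shape) pairs so it runs without PyTorch; the helper name `split_weight_decay` and the heuristic "any `.bias` or any 1-D tensor is undecayed" are assumptions for illustration, not necessarily the exact rule used in this repository.

```python
def split_weight_decay(named_shapes, weight_decay=5e-4):
    """Split parameters into a decayed group and an undecayed group.

    Assumed heuristic: names ending in ".bias" and every 1-D tensor
    (e.g. BatchNorm weight/bias) get no weight decay; conv and linear
    weights get `weight_decay`.
    """
    decay, no_decay = [], []
    for name, shape in named_shapes:
        if name.endswith(".bias") or len(shape) <= 1:
            no_decay.append(name)
        else:
            decay.append(name)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


if __name__ == "__main__":
    shapes = [
        ("conv1.weight", (16, 3, 3, 3)),
        ("bn1.weight", (16,)),
        ("bn1.bias", (16,)),
        ("fc.weight", (10, 64)),
        ("fc.bias", (10,)),
    ]
    for group in split_weight_decay(shapes):
        print(group)
```

With real tensors, the same rule would iterate over `model.named_parameters()` using `p.ndim`, put the tensors themselves in `"params"`, and pass the groups to `torch.optim.SGD(groups, lr=0.1, momentum=0.9, nesterov=True)`; the momentum value 0.9 is an assumption, as the recipe above does not state it.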

Reproduction (one V100 GPU, about 4 min/epoch: 4 min × 300 / 60 = 20 h): top-1 97.59%, the best result in this project so far.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 Testing combinations of data augmentations

architecture	epoch	cutout	mixup	autoaugment	random-erase	C10 test acc (%)
shake_resnet26_2x64d	200					96.42
shake_resnet26_2x64d	200	√				96.49
shake_resnet26_2x64d	200		√			96.17
shake_resnet26_2x64d	200			√		96.25
shake_resnet26_2x64d	200				√	96.20
shake_resnet26_2x64d	200	√	√			95.82
shake_resnet26_2x64d	200	√		√		96.02
shake_resnet26_2x64d	200	√			√	96.00
shake_resnet26_2x64d	200		√	√		95.83
shake_resnet26_2x64d	200		√		√	95.89
shake_resnet26_2x64d	200			√	√	96.25

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64
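The eleven commands above are every combination of at most two of the four augmentations. A small generator like the one below produces exactly the same command lines and run names, which makes it easier to extend the ablation; the helper `ablation_commands` and the `AUGS` table are hypothetical and only reuse the flags already shown above.

```python
from itertools import combinations

# (command-line flag, run-name suffix) for each augmentation in the Section 3.9 table.
AUGS = [
    ("cutout", "c"),
    ("mixup", "m"),
    ("autoaugmentation", "a"),
    ("random-erase", "r"),
]


def ablation_commands(model="shake_resnet26_2x64d", bs=64, max_combo=2):
    """Build train.py command lines for every combination of up to max_combo augmentations."""
    cmds = []
    for k in range(max_combo + 1):
        for combo in combinations(AUGS, k):
            # "orgin" matches the run name used for the baseline command above.
            suffix = "".join(s for _, s in combo) or "orgin"
            parts = ["python train.py", f"--model '{model}'", f"--name 'ss64_{suffix}'"]
            parts += [f"--{flag} True" for flag, _ in combo]
            parts.append(f"--bs {bs}")
            cmds.append(" ".join(parts))
    return cmds


if __name__ == "__main__":
    print("\n".join(ablation_commands()))
```

`itertools.combinations` yields combinations in the order of `AUGS`, so the output order (baseline, c, m, a, r, cm, ca, cr, ma, mr, ar) matches the list above.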

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner

PJDong
