Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Last update: Jan 04, 2023

Related tags

Overview

Automated Learning Rate Scheduler for Large-Batch Training

The official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML).

Overview

AutoWU is an automated LR scheduler which consists of two phases: warmup and decay. Learning rate (LR) is increased in an exponential rate until the loss starts to increase, and in the decay phase LR is decreased following the pre-specified type of the decay (either cosine or constant-then-cosine, in our experiments).

Transition from the warmup to the decay phase is done automatically by testing whether the minimum of the predicted loss curve is attained in the past or not with high probability, and the prediction is made via Gaussian Process regression.

How to use

Setup

pip install -r requirements.txt

Quick use

You can use AutoWU as other PyTorch schedulers, except that it takes loss as an argument (like ReduceLROnPlateau in PyTorch). The following code snippet demonstrates a typical usage of AutoWU.

from autowu import AutoWU

...

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=True,
                   cooldown_type='cosine',
                   device=device)

...

for _ in range(total_epochs):
    for inputs, targets in train_loader:
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step(loss)

The default decay phase schedule is ''cosine''. To use constant-then-cosine schedule rather than cosine, set immediate_cooldown=False and set cooldown_fraction to a desired value:

scheduler = AutoWU(optimizer,
                   len(train_loader),  # the number of steps in one epoch 
                   total_epochs,  # total number of epochs
                   immediate_cooldown=False,
                   cooldown_type='cosine',
                   cooldown_fraction=0.2,  # fraction of cosine decay at the end
                   device=device)

Reproduction of results

We provide an exemplar training script train.py which is based on Pytorch Image Models. The script supports training ResNet-50 and EfficientNet-B0 on ImageNet classification under the setting almost identical to the paper. We report the top-1 accuracy of ResNet-50 and EfficientNet-B0 on the validation set trained with batch sizes 4K (4096) and 16K (16384), along with the scores reported in our paper.

ResNet-50	This repo.	Reported (paper)
4K	75.54%	75.70%
16K	74.87%	75.22%

EfficientNet-B0	This repo.	Reported (paper)
4K	75.74%	75.81%
16K	75.66%	75.44%

You can use distributed.launch util to run the script. For instance, in case of ResNet-50 training with batch size 4096, execute the following line with variables set according to your environment:

python -m torch.distributed.launch \
--nproc_per_node=4 \
--nnodes=4 \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--master_port=$MASTER_PORT \
train.py \
--data-root $DATA_ROOT \
--amp \
--batch-size 256

In addition, add --model efficientnet_b0 argument in case of EfficientNet-B0 training.

Citation

@inproceedings{
    kim2021automated,
    title={Automated Learning Rate Scheduler for Large-batch Training},
    author={Chiheon Kim and Saehoon Kim and Jongmin Kim and Donghoon Lee and Sungwoong Kim},
    booktitle={8th ICML Workshop on Automated Machine Learning (AutoML)},
    year={2021},
    url={https://openreview.net/forum?id=ljIl7KCNYZH}
}

Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Related tags

Overview

Automated Learning Rate Scheduler for Large-Batch Training

Overview

How to use

Setup

Quick use

Reproduction of results

Citation

License

Owner

Kakao Brain

MoCoPnet - Deformable 3D Convolution for Video Super-Resolution

improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

Official implementation for (Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation, CVPR-2021)

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Reliable probability face embeddings

[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

PyTorch implementations of deep reinforcement learning algorithms and environments

Official repository of my book: "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide"

A naive ROS interface for visualDet3D.

Multivariate Time Series Transformer, public version

Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

MQBench Quantization Aware Training with PyTorch

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

A framework for multi-step probabilistic time-series/demand forecasting models

Bounding Wasserstein distance with couplings

Pipeline code for Sequential-GAM(Genome Architecture Mapping).

FastFace: Lightweight Face Detection Framework

3D Human Pose Machines with Self-supervised Learning

Conditional Gradients For The Approximately Vanishing Ideal