SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

Last update: Nov 07, 2022

Overview

This repo contains the code for the paper "No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models" (ICLR 2022).

Getting Started

Pull and run the Docker image:
pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
Install the requirements:
pip install -r requirements.txt

Data and Model

Download the data and pre-trained models:
./download.sh
See the GLUE benchmark documentation for details on the tasks.
Preprocess the data:
./experiments/glue/prepro.sh
For the most up-to-date data processing details, please refer to the mt-dnn repo.

Fine-tuning Pre-trained Models using SAGE

We provide an example script for fine-tuning a pre-trained BERT-base model on MNLI using Adamax-SAGE:

./scripts/train_mnli_usadamax.sh GPUID

A few notes:

learning_rate and beta3 are two of the most important hyper-parameters. A learning_rate that works well for Adamax-SAGE/AdamW-SAGE is usually 2 to 5 times larger than one that works well for plain Adamax/AdamW, depending on the task. A beta3 that works well for Adamax-SAGE/AdamW-SAGE is usually between 0.6 and 0.9, depending on the task.
To use AdamW-SAGE, set the argument --optim=usadamw. The current codebase only contains implementations of Adamax-SAGE and AdamW-SAGE; see module/bert_optim.py for details. Please refer to our paper for how to integrate SAGE into other optimizers.
To fine-tune a pre-trained RoBERTa-base model, set --init_checkpoint to the model path and --encoder_type to 2. Other supported models are listed in pretrained_models.py.
To fine-tune on other tasks, set --train_datasets and --test_datasets to the corresponding task names.
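
The tuning guidance above can be turned into a small search grid. The following is only a sketch and not part of this codebase: the helper name sage_search_grid, the base_lr argument, and the specific multipliers and beta3 values are illustrative assumptions drawn from the ranges quoted in the notes (2 to 5 times the baseline learning rate, beta3 between 0.6 and 0.9). How each configuration is passed to the training script is not shown here.

```python
from itertools import product


def sage_search_grid(base_lr,
                     lr_multipliers=(2, 3, 4, 5),
                     beta3_values=(0.6, 0.7, 0.8, 0.9)):
    """Enumerate candidate (learning_rate, beta3) settings for a SAGE run.

    base_lr is the learning rate that already works for plain Adamax/AdamW
    on the task. The multipliers and beta3 values are hypothetical defaults
    chosen to cover the ranges suggested in the notes.
    """
    return [
        {"learning_rate": base_lr * m, "beta3": b}
        for m, b in product(lr_multipliers, beta3_values)
    ]


if __name__ == "__main__":
    # Example: a baseline learning rate of 5e-5 (an assumed value).
    for config in sage_search_grid(5e-5):
        print(config)
```

Starting from the baseline learning rate keeps the search tied to a setting already known to work without SAGE, so only the two hyper-parameters highlighted above are varied.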

Citation

@inproceedings{liang2022no,
  title={No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models},
  author={Chen Liang and Haoming Jiang and Simiao Zuo and Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen and Tuo Zhao},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=cuvga_CiVND}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang.
