Implements pytorch code for the Accelerated SGD algorithm.

Overview

AccSGD

This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic Optimization, selected to appear at ICLR 2018.

Usage:

The code can be downloaded and placed in a given local directory. In a manner similar to using any usual optimizer from the pytorch toolkit, it is also possible to use the AccSGD optimizer with little effort. First, we require importing the optimizer through the following command:

from AccSGD import *

Next, an ASGD optimizer working with a given pytorch model can be invoked using the following command:

optimizer = AccSGD(model.parameters(), lr=0.1, kappa = 1000.0, xi = 10.0)

where, lr is the learning rate, kappa the long step parameter and xi is the statistical advantage parameter.

Guidelines on setting parameters/debugging:

The learning rate lr: lr is set in a manner similar to schemes such as vanilla Stochastic Gradient Descent (SGD)/Standard Momentum (Heavy Ball)/Nesterov's Acceleration. Note that lr is a function of batch size - a rigorous quantification of this phenomenon can be found in the following paper. Such a characterization has been observed in several empirical works.

Long Step kappa: As the networks grow deeper (e.g. with resnets) and when dealing with typically harder datasets such as CIFAR/ImageNet, employing kappa to be 10^4 or more helps. For shallow nets and easier datasets such as MNIST, a typical value of kappa can be set as 10^3 or even 10^2.

Statistical Advantage Parameter xi: xi lies between 1.0 and sqrt(kappa). When large batch sizes (nearly matching batch gradient descent) are used, it is advisable to use xi that is closer to sqrt(kappa). In general, as the batch size increases by a factor of k, increase xi by sqrt(k).

Effective ways to debug:

For Nets with ReLU/ELU type activations:

(--1--) Slower convergence: There are three reasons for this to happen:

  • This could be a result of setting the learning rate too low (similar to SGD/vanilla momentum/Nesterov's acceleration).
  • This could be as a result of setting kappa to be too high.
  • The other reason could be that xi has been set to a small value and needs to be increased.

(--2--) Oscillatory behavior/Divergence: There are two reasons for this to happen:

  • This could be a result of setting the learning rate to be too high (similar to SGD/vanilla momentum/Nesterov's acceleration).
  • The other reason is that xi has been set to a large value and needs to be decreased.

For nets with Sigmoid activations:

Slower convergence after an initial rapid decrease in error: This is a sign of an over aggressive setting of parameters and must be treated in a similar manner as the oscillatory/divergence behavior (--2--) encountered in the ReLU/ELU activation case.

Slow convergence right from the start: This is more likely related to slower convergence (--1--) encountered in the ReLU/ELU case.

Citation:

If AccSGD is used in your paper/experiments, please cite the following papers.

@inproceedings{Kidambi2018Insufficiency,
  title={On the insufficiency of existing momentum schemes for Stochastic Optimization},
  author={Kidambi, Rahul and Netrapalli, Praneeth and Jain, Prateek and Kakade, Sham},
  booktitle={International Conference on Learning Representations},
  year={2018}
}

@Article{Jain2017Accelerating,
  title={Accelerating Stochastic Gradient Descent},
  author={Jain, Prateek and Kakade, Sham and Kidambi, Rahul and Netrapalli, Praneeth and Sidford, Aaron},
  journal={CoRR},
  volume = {abs/1704.08227},
  year={2017}
}
A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

169 Dec 23, 2022
A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

🤗 Accelerate was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.

Hugging Face 3.5k Jan 08, 2023
lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for pytorch

lookahead optimizer for pytorch PyTorch implement of Lookahead Optimizer: k steps forward, 1 step back Usage: base_opt = torch.optim.Adam(model.parame

Liam 318 Dec 09, 2022
An implementation of Performer, a linear attention-based transformer, in Pytorch

Performer - Pytorch An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random

Phil Wang 900 Dec 22, 2022
Tacotron 2 - PyTorch implementation with faster-than-realtime inference

Tacotron 2 (without wavenet) PyTorch implementation of Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions. This implementati

NVIDIA Corporation 4.1k Jan 03, 2023
PyTorch wrappers for using your model in audacity!

PyTorch wrappers for using your model in audacity!

130 Dec 14, 2022
Training PyTorch models with differential privacy

Opacus is a library that enables training PyTorch models with differential privacy. It supports training with minimal code changes required on the cli

1.3k Dec 29, 2022
A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).

Code release for "Bayesian Compression for Deep Learning" In "Bayesian Compression for Deep Learning" we adopt a Bayesian view for the compression of

Karen Ullrich 190 Dec 30, 2022
S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.

S3-plugin is a high performance PyTorch dataset library to efficiently access datasets stored in S3 buckets.

Amazon Web Services 138 Jan 03, 2023
Reformer, the efficient Transformer, in Pytorch

Reformer, the Efficient Transformer, in Pytorch This is a Pytorch implementation of Reformer https://openreview.net/pdf?id=rkgNKkHtvB It includes LSH

Phil Wang 1.8k Jan 06, 2023
A very simple and small path tracer written in pytorch meant to be run on the GPU

MentisOculi Pytorch Path Tracer A very simple and small path tracer written in pytorch meant to be run on the GPU Why use pytorch and not some other c

Matthew B. Mirman 222 Dec 01, 2022
TorchSSL: A PyTorch-based Toolbox for Semi-Supervised Learning

TorchSSL: A PyTorch-based Toolbox for Semi-Supervised Learning

1k Dec 28, 2022
PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

PyTorch implementation of [1611.06440 Pruning Convolutional Neural Networks for Resource Efficient Inference] This demonstrates pruning a VGG16 based

Jacob Gildenblat 836 Dec 26, 2022
Use Jax functions in Pytorch with DLPack

Use Jax functions in Pytorch with DLPack

Phil Wang 106 Dec 17, 2022
Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.

Pretrained models for Pytorch (Work in progress) The goal of this repo is: to help to reproduce research papers results (transfer learning setups for

Remi 8.7k Dec 31, 2022
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News March 3: v0.9.97 has various bug fixes and improvements: Bug fixes for NTXentLoss Efficiency improvement for AccuracyCalculator, by using torch i

Kevin Musgrave 5k Jan 02, 2023
PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations

PyTorch Sparse This package consists of a small extension library of optimized sparse matrix operations with autograd support. This package currently

Matthias Fey 757 Jan 04, 2023
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

micrograd A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural

Andrej 3.5k Jan 08, 2023
Riemannian Adaptive Optimization Methods with pytorch optim

geoopt Manifold aware pytorch.optim. Unofficial implementation for “Riemannian Adaptive Optimization Methods” ICLR2019 and more. Installation Make sur

642 Jan 03, 2023
A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

GRAAL/GRAIL 534 Dec 17, 2022