AdamW optimizer for bfloat16 models in pytorch.

Last update: Nov 20, 2022

Overview

Bfloat16 is currently an optimal tradeoff between range and relative error for deep networks: it keeps the 8-bit exponent of float32 but only 8 bits of precision.
Bfloat16 runs efficiently on Nvidia GPUs with the Ampere architecture (A100, A10, A30, RTX 3090, ...).

However, at the time of writing, neither PyTorch's AMP nor its optimizers are ready for bfloat16.

If you just convert all weights and inputs to bfloat16, you are likely to run into stale weights: updates are too small to change a bfloat16 weight (see the Gopher paper, section C2, for a large-scale example).
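The stale-weight effect can be reproduced without a GPU. The sketch below emulates bfloat16 rounding in pure Python; `to_bf16` and `apply_updates_bf16` are illustrative helpers written for this example (not part of this package), and the rounding ignores NaN and infinity handling.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Illustrative emulation: bfloat16 is the upper 16 bits of a float32.
    NaN, infinity and overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the dropped bits
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def apply_updates_bf16(weight: float, update: float, steps: int) -> float:
    """Add `update` to a bfloat16 weight `steps` times, rounding after each step."""
    weight = to_bf16(weight)
    for _ in range(steps):
        weight = to_bf16(weight + update)
    return weight


# Near 1.0 the spacing between bfloat16 values is 2**-7 (about 0.0078),
# so an update of 1e-3 is rounded away on every single step.
stale = apply_updates_bf16(1.0, 1e-3, 1000)
print(stale)  # the weight stays at 1.0
```

In float32 the same 1000 updates would move the weight to roughly 2.0, which is the gap the two remedies below are meant to close.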

There are two possible remedies:

keep two copies of the weights: float32 (precise) and bfloat16 (approximate)
keep the weights in bfloat16, and also keep a correction term in bfloat16

As a recent study has shown, both options are fully competitive in quality with float32 training.
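A minimal sketch of the second remedy, assuming the correction term works like Kahan-style compensated summation: the part of each update that rounding throws away is stored and re-added on the next step. The helpers `to_bf16` and `step_with_correction` are illustrative names for this example; the package's actual update rule may differ.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (illustrative, no NaN/inf handling)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def step_with_correction(weight: float, correction: float, update: float):
    """One update where both the weight and the correction are stored in bfloat16."""
    pending = to_bf16(correction + update)   # new update plus what was lost earlier
    new_weight = to_bf16(weight + pending)   # rounding may swallow part of `pending`
    applied = new_weight - weight            # what actually reached the weight
    new_correction = to_bf16(pending - applied)  # remember the remainder
    return new_weight, new_correction


weight, correction = to_bf16(1.0), 0.0
for _ in range(1000):
    weight, correction = step_with_correction(weight, correction, 1e-3)

# Unlike the naive bfloat16 loop, the weight now moves and ends close to 2.0.
print(weight, correction)
```

The correction costs one extra bfloat16 value per parameter, which is half the memory of keeping an additional float32 copy of the weights.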

Usage

Install:

pip install git+https://github.com/arogozhnikov/adamw_bfloat16.git

Use as a drop-in replacement for pytorch's AdamW:

import torch
from adamw_bfloat16 import LR, AdamW_BF16

# `model` is your existing torch.nn.Module
model = model.to(torch.bfloat16)

# default preheat and decay
optimizer = AdamW_BF16(model.parameters())

# or configure the LR schedule with the built-in scheduler
optimizer = AdamW_BF16(model.parameters(), lr_function=LR(lr=1e-4, preheat_steps=5000, decay_power=-0.25))
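To give a feel for what the `preheat_steps` and `decay_power` arguments might control, here is a hypothetical schedule: a linear warm-up to `lr`, followed by a power-law decay. This is an assumption made for illustration only. `sketch_lr` is not part of the package, and the formula used by `LR` may well be different, so check the source before relying on it.

```python
def sketch_lr(step: int, lr: float = 1e-4, preheat_steps: int = 5000,
              decay_power: float = -0.25) -> float:
    """Hypothetical schedule (an assumption, not the package's LR class)."""
    if step < preheat_steps:
        # linear warm-up from 0 to lr
        return lr * step / preheat_steps
    # power-law decay, continuous with the warm-up at step == preheat_steps
    return lr * (step / preheat_steps) ** decay_power


for step in (0, 2500, 5000, 80000):
    print(step, sketch_lr(step))
```

Under this assumed formula the rate peaks at `lr` when the warm-up ends and has halved by step 80000, since (80000 / 5000) ** -0.25 = 0.5.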

Releases (v0.1.0)

v0.1.0 (Dec 14, 2021)

Initial implementation of AdamW for PyTorch. It supports CUDA graphs and has a built-in mechanism for controlling the learning rate, because external schedulers are unlikely to work well with CUDA graphs.

Owner

Alex Rogozhnikov
