Sharpness-Aware Minimization for Efficiently Improving Generalization

Last update: Dec 08, 2022

Overview

Sharpness-Aware-Minimization-TensorFlow

This repository provides a minimal implementation of sharpness-aware minimization (SAM), introduced in the paper Sharpness-Aware Minimization for Efficiently Improving Generalization, in TensorFlow 2. SAM is motivated by the connection between the geometry of the loss landscape of deep neural networks and their generalization ability. It attempts to minimize the loss value and the loss curvature simultaneously, thereby seeking parameters that lie in neighborhoods with uniformly low loss. This differs from traditional SGD-based optimization, which seeks parameters that have a low loss value at a single point, regardless of the surrounding landscape. The figure below (taken from the original paper) demonstrates the effects of using SAM.
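As a rough sketch of the idea described above, the SAM objective from the original paper can be written as a min-max problem over a neighborhood of radius rho around the weights w. The version here is a simplification: it assumes the L2 norm, omits the weight-decay term, and uses L for the training loss on a batch.

```latex
\begin{aligned}
\min_{w} \; L^{\mathrm{SAM}}(w),
\qquad
L^{\mathrm{SAM}}(w) &= \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) \\
&= \underbrace{\Big[ \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) - L(w) \Big]}_{\text{sharpness}}
 \; + \; \underbrace{L(w)}_{\text{loss value}}
\end{aligned}
```

The bracketed term measures how much the loss can rise inside the neighborhood, and the second term is the ordinary training loss, so minimizing their sum pushes both down together.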

My goal with this repository is to be able to quickly train neural networks with and without SAM. All the experiments are shown in the SAM.ipynb notebook, which is end-to-end executable on Google Colab. Furthermore, it utilizes the free TPUs (TPUv2-8) that Google Colab provides, allowing readers to experiment very quickly.

Notes

Before moving to the findings, please be aware of the following notable differences in my implementation:

ResNet20 (attributed to this repository) is used as opposed to PyramidNet and WideResNet.
ShakeDrop regularization has not been used.
Two simple augmentation transformations (random crop and random brightness) have been used as opposed to Cutout and AutoAugment.
Adam has been used as the optimizer, with the default arguments provided by TensorFlow, together with a ReduceLROnPlateau callback. Table 1 of the original paper suggests using SGD with different configurations.
Instead of training for the full number of epochs, I used early stopping with a patience of 10.

SAM has only one hyperparameter, rho, which controls the size of the neighborhood in parameter space. In my experiments, it defaults to 0.05. For other details of the training configuration (network depth, learning rate, batch size, etc.), please refer to the notebooks.
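To make the role of rho concrete, here is a minimal, framework-agnostic sketch of one SAM update in plain Python. It is not the repository's TensorFlow code: the toy quadratic loss, the helper names (perturbation, sam_step), the learning rate, and the plain gradient-descent update are all assumptions chosen for illustration. The sketch follows the first-order approximation from the original paper: scale the current gradient to length rho, evaluate the gradient again at the perturbed weights, and apply that second gradient to the original weights.

```python
import math


def loss(w):
    # Toy quadratic loss (an assumption for illustration only).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]


def perturbation(g, rho, eps=1e-12):
    # Scale the gradient so that its L2 norm is (approximately) rho.
    # eps guards against division by zero when the gradient vanishes.
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = rho / (norm + eps)
    return [scale * gi for gi in g]


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    # Step 1: move to the (approximate) worst-case point in the rho-ball.
    e_hat = perturbation(grad_fn(w), rho)
    w_adv = [wi + ei for wi, ei in zip(w, e_hat)]
    # Step 2: take the gradient at the perturbed point ...
    g_sam = grad_fn(w_adv)
    # ... and apply it to the ORIGINAL weights, not the perturbed ones.
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, grad)
print("final loss:", loss(w))
```

In a real training loop both gradient evaluations would use the same mini-batch, which is why a SAM step costs roughly two forward-backward passes. Setting rho to 0 makes the perturbation vanish and reduces the update to ordinary gradient descent.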

Findings

	Number of Parameters (million)	Final Test Accuracy (%)
With SAM	0.575114	80.5
Without SAM	0.575114	83.1

Acknowledgements

David Samuel's PyTorch implementation

Owner

Sayak Paul
