Instance-based label smoothing for improving deep neural networks generalization and calibration

Last update: Aug 13, 2022

Overview

Instance-based Label Smoothing for Neural Networks

PyTorch implementation of the algorithm.
This repository implements a new method for instance-based label smoothing in neural networks, in which the smoothing probability mass is not distributed uniformly among the incorrect classes. Instead, each incorrect class is assigned a target probability proportional to its output score, relative to all the remaining classes, as produced by a network trained with vanilla cross-entropy loss on the hard target labels.
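The target construction described above can be sketched for a single training instance as follows. This is a minimal illustration, not the repository's code: the function name `instance_based_targets` and the smoothing factor `alpha` are hypothetical, and "output score" is assumed to mean the softmax probability of the vanilla cross-entropy network. Plain Python is used to keep the sketch dependency-free; the training scripts would perform the same computation with tensor operations.

```python
import math


def instance_based_targets(teacher_logits, true_class, alpha=0.1):
    """Build a smoothed target distribution for one instance.

    teacher_logits: output scores of a network trained with vanilla
        cross-entropy on hard labels (one score per class).
    true_class: index of the correct class.
    alpha: total probability mass moved to the incorrect classes
        (hypothetical hyperparameter, not taken from the repository).
    """
    wrong = [k for k in range(len(teacher_logits)) if k != true_class]

    # Softmax restricted to the incorrect classes, shifted by the
    # maximum score for numerical stability.
    shift = max(teacher_logits[k] for k in wrong)
    exp_scores = {k: math.exp(teacher_logits[k] - shift) for k in wrong}
    total = sum(exp_scores.values())

    targets = [0.0] * len(teacher_logits)
    targets[true_class] = 1.0 - alpha
    for k in wrong:
        # Each incorrect class receives a share of alpha proportional
        # to its score relative to the other incorrect classes.
        targets[k] = alpha * exp_scores[k] / total
    return targets


# Example: class 0 is correct, class 1 is a similar class, class 2 is not.
targets = instance_based_targets([4.0, 2.0, 0.0], true_class=0, alpha=0.1)
```

With these inputs the similar class (index 1) receives a larger share of the smoothing mass than the dissimilar class (index 2), whereas uniform smoothing would give both incorrect classes the same target.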

The following figure summarizes the idea of our instance-based label smoothing, which aims to preserve information about the class similarity structure while training with label smoothing.

Requirements

Python 3.x
pandas
numpy
pytorch

Usage

Datasets

CIFAR10 / CIFAR100 / FashionMNIST

Files Content

The project is structured as follows:

├── Vanilla-cross-entropy.py
├── Label-smoothing.py
├── Instance-based-smoothing.py
├── Models-evaluation.py
├── Network-distillation.py
├── utils
│   ├── data_loader.py
│   ├── utils.py
│   ├── evaluate.py
│   ├── params.json
├── models
│   ├── resnet.py
│   ├── densenet.py
│   ├── inception.py
│   ├── shallownet.py

Vanilla-cross-entropy.py trains the networks using cross-entropy without label smoothing.
Label-smoothing.py trains the networks using cross-entropy with standard label smoothing.
Instance-based-smoothing.py trains the networks using cross-entropy with instance-based label smoothing.
Models-evaluation.py evaluates the trained networks.
Network-distillation.py distills trained networks into a shallow convolutional network of 5 layers.
models/ contains the implementations of the architectures used in our evaluation (ResNet, DenseNet, Inception-V4), as well as the shallow CNN student network used in the distillation experiments.
utils/ contains the utility functions required for training and evaluating the different models.

Example

python Instance-based-smoothing.py --dataset cifar10 --model resnet18 --num_classes 10

List of arguments accepted by the training and evaluation scripts:

--lr type = float, default = 0.1, help = Starting learning rate (a weight decay of $10^{-4}$ is used).
--tr_size type = float, default = 0.8, help = Fraction of the full training set used for training (the remaining 0.2 is used for validation).
--batch_size type = int, default = 512, help = Batch size for mini-batch training.
--epochs type = int, default = 100, help = Number of training epochs.
--estop type = int, default = 10, help = Number of epochs without loss improvement before early stopping.
--ece_bins type = int, default = 10, help = Number of bins used to compute the expected calibration error.
--dataset type = str, help = Name of the dataset to be used (cifar10 / cifar100 / fashionmnist).
--num_classes type = int, default = 10, help = Number of classes in the dataset.
--model type = str, help = Name of the model to be trained, e.g. resnet18 / resnet50 / inceptionv4 / densenet (works for FashionMNIST only).
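The argument list above maps naturally onto Python's standard `argparse` module. The sketch below reproduces the listed names, types, and defaults. It is an illustration rather than the repository's actual parser: the function name `build_parser` is hypothetical, and `--dataset` and `--model` are left without defaults because the list does not give any.

```python
import argparse


def build_parser():
    """Parser mirroring the documented arguments (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Training and evaluation options (sketch)")
    parser.add_argument("--lr", type=float, default=0.1,
                        help="Starting learning rate")
    parser.add_argument("--tr_size", type=float, default=0.8,
                        help="Fraction of the training set used for training")
    parser.add_argument("--batch_size", type=int, default=512,
                        help="Batch size for mini-batch training")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--estop", type=int, default=10,
                        help="Epochs without loss improvement before stopping")
    parser.add_argument("--ece_bins", type=int, default=10,
                        help="Number of bins for expected calibration error")
    parser.add_argument("--dataset", type=str,
                        help="cifar10 / cifar100 / fashionmnist")
    parser.add_argument("--num_classes", type=int, default=10,
                        help="Number of classes in the dataset")
    parser.add_argument("--model", type=str,
                        help="Name of the model to be trained")
    return parser


# Same arguments as the command shown in the Example section.
args = build_parser().parse_args(
    ["--dataset", "cifar10", "--model", "resnet18", "--num_classes", "10"])
```

Any option that is not passed on the command line falls back to the default given in the list, so the example command trains with a learning rate of 0.1 and a batch size of 512.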

Results

Results of the comparison of different methods on 3 datasets using 4 different architectures are reported in the following table.
The experiments were repeated 3 times, and the mean $\pm$ standard deviation of log loss, expected calibration error (ECE), accuracy, distilled student network accuracy, and distilled student log loss are reported.
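For readers unfamiliar with ECE, the metric can be sketched as below. This is the common equal-width binning formulation, offered as an assumption about how the metric is computed and not as a copy of the code in utils/evaluate.py. The function name `expected_calibration_error` is hypothetical; `n_bins` plays the role of the `--ece_bins` argument.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over n_bins equal-width confidence bins on [0, 1].

    confidences: predicted probability of the predicted class, per sample.
    correct: 1 (or True) if the prediction was right, else 0, per sample.
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for conf, hit in zip(confidences, correct):
        # Bin b covers [b / n_bins, (b + 1) / n_bins); a confidence of
        # exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += float(hit)
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means that the predicted confidences track the observed accuracy more closely; a model whose confidence equals its accuracy in every bin has an ECE of zero.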

A t-SNE visualization of the logits of three different classes in CIFAR-10 is shown below:


Owner

Mohamed Maher
