Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Overview

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity
Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elena Mocanu, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

https://openreview.net/forum?id=RLtqs6pzj1-

Abstract: The success of deep ensembles on improving predictive performance, uncertainty, and out-of-distribution robustness has been extensively demonstrated in the machine learning literature. Albeit the promising results, naively training multiple deep neural networks and combining their predictions at test lead to prohibitive computational costs and memory requirements. Recently proposed efficient ensemble approaches reach the performance of the traditional deep ensembles with significantly lower costs. However, the training resources required by these approaches are still at least the same as training a single dense model. In this work, we draw a unique connection between sparse neural network training and deep ensembles, yielding a novel efficient ensemble learning framework called FreeTickets. Instead of training multiple dense networks and averaging them, we directly train sparse subnetworks from scratch and extract diverse yet accurate subnetworks during this efficient, sparse-to-sparse training. Our framework, FreeTickets, is defined as the ensemble of these relatively cheap sparse subnetworks. Despite being an ensemble method, FreeTickets has even fewer parameters and training FLOPs compared to a single dense model. This seemingly counter-intuitive outcome is due to the ultra training efficiency of dynamic sparse training. FreeTickets improves over the dense baseline in the following criteria: prediction accuracy, uncertainty estimation, out-of-distribution (OoD) robustness, and training/inference efficiency. Impressively, FreeTickets outperforms the naive deep ensemble with ResNet50 on ImageNet using around only 1/5 training FLOPs required by the latter.

This code base is created by Shiwei Liu [email protected] during his Ph.D. at Eindhoven University of Technology.

Requirements

Python 3.6, PyTorch v1.5.1, and CUDA v10.2.

How to Run Experiments

CIFAR-10/100 Experiments

To train Wide ResNet28-10 on CIFAR10/100 with DST ensemble at sparsity 0.8:

python main_DST.py --sparse --model wrn-28-10 --data cifar10 --seed 17 --sparse-init ERK \
--update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.8 \
--growth gradient --death magnitude --redistribution none --epochs 250 --density 0.2

To train Wide ResNet28-10 on CIFAR10/100 with EDST ensemble at sparsity 0.8:

python3 main_EDST.py --sparse --model wrn-28-10 --data cifar10 --nolrsche \
--decay-schedule constant --seed 17 --epochs-explo 150 --model-num 3 --sparse-init ERK \
--update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.8 \
--growth gradient --death magnitude --redistribution none --epochs 450 --density 0.2

[Training module] The training module is controlled by the following arguments:

  • --epochs-explo - An integer that controls the training epochs of the exploration phase.
  • --model-num - An integer, the number free tickets to produce.
  • --large-death-rate - A float, the ratio of parameters to explore for each refine phase.
  • --density - An float, the density (1-sparsity) level for each free ticket.

To train Wide ResNet28-10 on CIFAR10/100 with PF (prung and finetuning) ensemble at sparsity 0.8:

First, we need train a dense model with:

python3 main_individual.py  --model wrn-28-10 --data cifar10 --decay-schedule cosine --seed 18 \
--sparse-init ERK --update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.5 \
--growth gradient --death magnitude --redistribution none --epochs 250 --density 0.2

Then, perform pruning and finetuning with:

pretrain='results/wrn-28-10/cifar10/individual/dense/18.pt'
python3 main_PF.py --sparse --model wrn-28-10 --resume --pretrain $pretrain --lr 0.001 \
--fix --data cifar10 --nolrsche --decay-schedule constant --seed 18 
--epochs-fs 150 --model-num 3 --sparse-init pruning --update-frequency 1000 --batch-size 128 \
--death-rate 0.5 --large-death-rate 0.8 --growth gradient --death magnitude \
--redistribution none --epochs $epoch --density 0.2

After finish the training of various ensemble methods, run the following commands for test ensemble:

resume=results/wrn-28-10/cifar10/density_0.2/EDST/M=3/
python ensemble_freetickets.py --mode predict --resume $resume --dataset cifar10 --model wrn-28-10 \
--seed 18 --test-batch-size 128
  • --resume - An folder path that contains the all the free tickets obtained during training.
  • --mode - An str that control the evaluation mode, including: predict, disagreement, calibration, KD, and tsne.

ImageNet Experiments

cd ImageNet
python $1multiproc.py --nproc_per_node 2 $1main.py --sparse_init ERK --multiplier 1 --growth gradient --seed 17 --master_port 4545 -j5 -p 500 --arch resnet50 -c fanin --update_frequency 4000 --label-smoothing 0.1 -b 64 --lr 0.1 --warmup 5 --epochs 310 --density 0.2 $2 ../data/

Citation

if you find this repo is helpful, please cite

@inproceedings{
liu2022deep,
title={Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity},
author={Shiwei Liu and Tianlong Chen and Zahra Atashgahi and Xiaohan Chen and Ghada Sokar and Elena Mocanu and Mykola Pechenizkiy and Zhangyang Wang and Decebal Constantin Mocanu},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=RLtqs6pzj1-}
}
Owner
VITA
Visual Informatics Group @ University of Texas at Austin
VITA
Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

HackerMath for Machine Learning “Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard

Amit Kapoor 1.4k Dec 22, 2022
PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks

AttentionHTR PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks. Scene Text

Dmitrijs Kass 31 Dec 22, 2022
Projecting interval uncertainty through the discrete Fourier transform

Projecting interval uncertainty through the discrete Fourier transform This repo

1 Mar 02, 2022
This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer Capacitor domain using text similarity indexes: An experimental analysis "

kwd-extraction-study This repository is maintained for the scientific paper tittled " Study of keyword extraction techniques for Electric Double Layer

ping 543f 1 Dec 05, 2022
SANet: A Slice-Aware Network for Pulmonary Nodule Detection

SANet: A Slice-Aware Network for Pulmonary Nodule Detection This paper (SANet) has been accepted and early accessed in IEEE TPAMI 2021. This code and

Jie Mei 39 Dec 17, 2022
Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad to your characters in Modo.

Applicator Kit for Modo Applicator Kit for Modo allow you to apply Apple ARKit Face Tracking data from your iPhone or iPad with a TrueDepth camera to

Andrew Buttigieg 3 Aug 24, 2021
Repository for code and dataset for our EMNLP 2021 paper - “So You Think You’re Funny?”: Rating the Humour Quotient in Standup Comedy.

AI-OpenMic Dataset The dataset is available for download via the follwing link. Repository for code and dataset for our EMNLP 2021 paper - “So You Thi

6 Oct 26, 2022
Expressive Power of Invariant and Equivaraint Graph Neural Networks (ICLR 2021)

Expressive Power of Invariant and Equivaraint Graph Neural Networks In this repository, we show how to use powerful GNN (2-FGNN) to solve a graph alig

Marc Lelarge 36 Dec 12, 2022
Official implementation of the method ContIG, for self-supervised learning from medical imaging with genomics

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics This is the code implementation of the paper "ContIG: Self-s

Digital Health & Machine Learning 22 Dec 13, 2022
Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [OpenReview] [arXiv] [Code] The official implementation of GeoDiff: A Geome

Minkai Xu 155 Dec 26, 2022
Code samples for my book "Neural Networks and Deep Learning"

Code samples for "Neural Networks and Deep Learning" This repository contains code samples for my book on "Neural Networks and Deep Learning". The cod

Michael Nielsen 13.9k Dec 26, 2022
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
LSUN Dataset Documentation and Demo Code

LSUN Please check LSUN webpage for more information about the dataset. Data Release All the images in one category are stored in one lmdb database fil

Fisher Yu 426 Jan 02, 2023
Parris, the automated infrastructure setup tool for machine learning algorithms.

README Parris, the automated infrastructure setup tool for machine learning algorithms. What Is This Tool? Parris is a tool for automating the trainin

Joseph Greene 319 Aug 02, 2022
Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

polyaxon 3.2k Dec 23, 2022
Parameterising Simulated Annealing for the Travelling Salesman Problem

Parameterising Simulated Annealing for the Travelling Salesman Problem

Gary Sun 55 Jun 15, 2022
coldcuts is an R package to automatically generate and plot segmentation drawings in R

coldcuts coldcuts is an R package that allows you to draw and plot automatically segmentations from 3D voxel arrays. The name is inspired by one of It

2 Sep 03, 2022
Dogs classification with Deep Metric Learning using some popular losses

Tsinghua Dogs classification with Deep Metric Learning 1. Introduction Tsinghua Dogs dataset Tsinghua Dogs is a fine-grained classification dataset fo

QuocThangNguyen 45 Nov 09, 2022
[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Grounded Situation Recognition with Transformers Paper | Model Checkpoint This is the official PyTorch implementation of Grounded Situation Recognitio

Junhyeong Cho 18 Jul 19, 2022
Nodule Generation Algorithm Baseline and template code for node21 generation track

Nodule Generation Algorithm This codebase implements a simple baseline model, by following the main steps in the paper published by Litjens et al. for

node21challenge 10 Apr 21, 2022