A general framework for efficient CNN inference. Reduces the inference latency of MobileNet-V3 on an iPhone XS Max by 1.3x without sacrificing accuracy.

Last update: Oct 28, 2022

Overview

GFNet-Pytorch (NeurIPS 2020)

This repo contains the official code and pre-trained models for the glance and focus network (GFNet).

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Citation

@inproceedings{NeurIPS2020_7866,
        title = {Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification},
       author = {Wang, Yulin and Lv, Kangchen and Huang, Rui and Song, Shiji and Yang, Le and Huang, Gao},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
         year = {2020},
}

Update on 2020/10/08: Released pre-trained models and inference code for ImageNet.

Update on 2020/12/28: Released training code.

Introduction

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically cropped from the original image. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.
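To make the sequential, confidence-driven inference more concrete, here is a minimal pure-Python sketch. It is not the repository's code: `step_logits` is a hypothetical callback standing in for the encoders and classifier, and we assume that a prediction is accepted as soon as its softmax confidence reaches a per-step threshold, with the last step (the maximum sequence length T) always producing an answer.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    step_logits: Callable[[int], Sequence[float]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Process inputs one step at a time and stop once the prediction is confident.

    step_logits(t) is a hypothetical callback returning the classifier logits after
    the t-th input has been processed (step 0: the 'glance', later steps: cropped
    'focus' patches). thresholds[t] is the confidence needed to stop at step t.
    Returns (predicted_class, number_of_inputs_processed).
    """
    if not thresholds:
        raise ValueError("need at least one step")
    last = len(thresholds) - 1
    for t, threshold in enumerate(thresholds):
        probs = softmax(step_logits(t))
        confidence = max(probs)
        prediction = probs.index(confidence)
        # Exit when confident enough, or when the maximum length T is reached.
        if confidence >= threshold or t == last:
            return prediction, t + 1
    raise AssertionError("unreachable")
```

Easy images exit after one or two inputs, while harder ones use more of the T steps; this adaptive depth is where the average savings come from.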

Results

Top-1 accuracy on ImageNet vs. Multiply-Adds

Top-1 accuracy on ImageNet vs. inference latency (ms) on an iPhone XS Max

Visualization

Pre-trained Models

Backbone CNN	Patch Size	T (max. input sequence length)	Links
ResNet-50	96x96	5	Tsinghua Cloud / Google Drive
ResNet-50	128x128	5	Tsinghua Cloud / Google Drive
DenseNet-121	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-169	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-201	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-600MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-800MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-1.6GF	96x96	5	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	96x96	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	128x128	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.25)	128x128	3	Tsinghua Cloud / Google Drive
EfficientNet-B2	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	144x144	4	Tsinghua Cloud / Google Drive

Contents of each checkpoint:

*.pth.tar
├── model_name: name of the backbone CNN (e.g., resnet50, densenet121)
├── patch_size: size of the image patches (i.e., H' or W' in the paper)
├── model_prime_state_dict, model_state_dict, fc, policy: state dictionaries of the four components of GFNet (global encoder, local encoder, classifier, patch proposal network)
├── model_flops, policy_flops, fc_flops: Multiply-Adds of a single forward pass through the encoder, the patch proposal network, and the classifier, respectively
├── flops: a list of the Multiply-Adds corresponding to each length of the input sequence during inference
├── anytime_classification: results of anytime prediction (Top-1 accuracy)
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list; [0] and [1] hold the two coordinates of the curve)
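A small sketch of how these fields might be used once a checkpoint is loaded. The helper names are hypothetical, and `average_madds` assumes, based on the description of `flops`, that `flops[t]` is the total cost of an input sequence of length t + 1; it then weights those costs by the share of images that stop at each length.

```python
from typing import Any, Dict, List, Sequence

# Keys listed in the checkpoint description above.
EXPECTED_KEYS = (
    "model_name", "patch_size",
    "model_prime_state_dict", "model_state_dict", "fc", "policy",
    "model_flops", "policy_flops", "fc_flops", "flops",
    "anytime_classification", "dynamic_threshold",
    "budgeted_batch_classification",
)


def missing_keys(ckpt: Dict[str, Any]) -> List[str]:
    """Return the documented keys that are absent from a loaded checkpoint dict."""
    return [k for k in EXPECTED_KEYS if k not in ckpt]


def average_madds(flops: Sequence[float], exit_fractions: Sequence[float]) -> float:
    """Average Multiply-Adds per image given the share of images exiting at each step.

    Assumes flops[t] is the total cost of an input sequence of length t + 1
    and exit_fractions[t] is the share of images that stop after t + 1 inputs.
    """
    if len(flops) != len(exit_fractions):
        raise ValueError("flops and exit_fractions must have the same length")
    if abs(sum(exit_fractions) - 1.0) > 1e-6:
        raise ValueError("exit_fractions must sum to 1")
    return sum(f * p for f, p in zip(flops, exit_fractions))
```

With PyTorch installed, a checkpoint can be loaded on the CPU via `ckpt = torch.load("PATH_TO_CHECKPOINTS", map_location="cpu")` and then passed to `missing_keys(ckpt)`, or its `ckpt["flops"]` list to `average_madds`.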

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2
pyyaml 5.3.1 (for RegNets)

Evaluate Pre-trained Models

Read the evaluation results saved in a pre-trained checkpoint:

CUDA_VISIBLE_DEVICES=0 python inference.py --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 0

Read the confidence thresholds saved in a pre-trained checkpoint and evaluate the model on the validation set:

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 1

Determine the confidence thresholds on the training set and evaluate the model on the validation set:

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 2
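Mode 2 fits the thresholds before evaluating. One common way to do this for budgeted batch classification, sketched below as an assumption rather than a transcription of inference.py, is to fix a target share of samples that should exit at each step and, step by step, set the threshold to the confidence of the last sample that fits in that share. `fit_thresholds` and its inputs are hypothetical names.

```python
from typing import List, Sequence


def fit_thresholds(
    confidences: Sequence[Sequence[float]],
    exit_fractions: Sequence[float],
) -> List[float]:
    """Pick one confidence threshold per step from held-out (e.g. training) samples.

    confidences[i][t]: max softmax probability of sample i after step t.
    exit_fractions[t]: target share of all samples that should exit at step t.
    Uses the exit rule 'stop at step t if confidence >= thresholds[t]'.
    """
    steps = len(exit_fractions)
    if steps == 0:
        raise ValueError("need at least one step")
    n = len(confidences)
    remaining = list(range(n))
    thresholds: List[float] = []
    for t in range(steps - 1):
        k = min(round(exit_fractions[t] * n), len(remaining))
        if k <= 0:
            thresholds.append(float("inf"))  # nobody exits at this step
            continue
        # Rank the samples still "in flight" by their confidence at step t.
        ranked = sorted(remaining, key=lambda i: confidences[i][t], reverse=True)
        threshold = confidences[ranked[k - 1]][t]
        thresholds.append(threshold)
        # Drop every sample the exit rule would stop here (ties included).
        remaining = [i for i in remaining if confidences[i][t] < threshold]
    thresholds.append(float("-inf"))  # all remaining samples exit at the last step
    return thresholds
```

Removing samples with the same `>=` rule used at test time keeps the fitted thresholds consistent with how they are later applied.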

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   └── ...
└── val
    ├── folder 1 (class 1)
    ├── folder 2 (class 2)
    └── ...
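Before launching a long run, it can help to verify this layout. The sketch below uses only the standard library; `check_imagenet_layout` is a hypothetical helper, and it assumes each split holds one sub-folder per class, with the same class folders in train and val.

```python
from pathlib import Path
from typing import Dict, List, Set


def check_imagenet_layout(root: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks as expected."""
    problems: List[str] = []
    class_sets: Dict[str, Set[str]] = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # One sub-folder per class, e.g. train/n01440764/*.JPEG
        classes = {p.name for p in split_dir.iterdir() if p.is_dir()}
        if not classes:
            problems.append(f"no class folders in {split_dir}")
        class_sets[split] = classes
    if len(class_sets) == 2 and class_sets["train"] != class_sets["val"]:
        problems.append("train/ and val/ contain different class folders")
    return problems
```

For example, `check_imagenet_layout("PATH_TO_DATASET")` should return an empty list for a correctly prepared dataset.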

Training

Here we take training ResNet-50 (96x96, T=5) as an example. All the initialization models and stage-1/2 checkpoints used can be found in Tsinghua Cloud / Google Drive. Currently, this link includes ResNet and MobileNet-V3. We will update it as soon as possible. If you need any other help, feel free to contact us.
The results in the paper were obtained with 2 Tesla V100 GPUs. For most experiments, up to 4 Titan Xp GPUs should be sufficient.

Training stage 1 requires initializations of the global encoder (model_prime) and the local encoder (model):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 1 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --model_prime_path PATH_TO_CHECKPOINTS --model_path PATH_TO_CHECKPOINTS

Training stage 2 requires a stage-1 checkpoint:

CUDA_VISIBLE_DEVICES=0 python train.py --data_url PATH_TO_DATASET --train_stage 2 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS

Training stage 3 requires a stage-2 checkpoint:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 3 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS
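The three stages differ only in which checkpoints they take. As a convenience, a small hypothetical helper (`train_command`, not part of the repository) can assemble these command lines from the flags shown above. Paths are placeholders, and the GPU list is a parameter because stage 2 above runs on a single GPU.

```python
import shlex
from typing import Optional


def train_command(
    stage: int,
    data_url: str,
    model_arch: str = "resnet50",
    patch_size: int = 96,
    T: int = 5,
    print_freq: int = 10,
    gpus: str = "0,1,2,3",
    model_prime_path: Optional[str] = None,
    model_path: Optional[str] = None,
    checkpoint_path: Optional[str] = None,
) -> str:
    """Build the train.py command line for one stage, using the flags shown above."""
    args = [
        "python", "train.py",
        "--data_url", data_url,
        "--train_stage", str(stage),
        "--model_arch", model_arch,
        "--patch_size", str(patch_size),
        "--T", str(T),
        "--print_freq", str(print_freq),
    ]
    if stage == 1:
        # Stage 1 starts from initializations of the global and local encoders.
        if not (model_prime_path and model_path):
            raise ValueError("stage 1 needs model_prime_path and model_path")
        args += ["--model_prime_path", model_prime_path, "--model_path", model_path]
    elif stage in (2, 3):
        # Stages 2 and 3 resume from the checkpoint of the previous stage.
        if not checkpoint_path:
            raise ValueError(f"stage {stage} needs checkpoint_path")
        args += ["--checkpoint_path", checkpoint_path]
    else:
        raise ValueError("train_stage must be 1, 2 or 3")
    # shlex.quote (rather than shlex.join) keeps this compatible with Python 3.7.
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"CUDA_VISIBLE_DEVICES={gpus} {quoted}"
```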

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our MobileNet-V3 and EfficientNet code is taken from here, and our RegNet code is from here.

To Do

Update the code for visualization.
Update the code for mixed precision training.

Owner

Rainforest Wang