A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

Overview

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

This repository is the official PyTorch implementation of AAAI-21 paper Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks, which provides practical and effective tricks used in long-tailed image classification.

Trick gallery: trick_gallery.md

  • The tricks will be constantly updated. If you have or need any long-tail related trick newly proposed, please to open an issue or pull requests. Make sure to attach the results in corresponding md files if you pull a request with a new trick.
  • For any problem, such as bugs, feel free to open an issue.

Paper collection of long-tailed visual recognition

Awesome-of-Long-Tailed-Recognition

Long-Tailed-Classification-Leaderboard

Development log

Trick gallery and combinations

Brief inroduction

We divided the long-tail realted tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details of the above four trick families, see the original paper.

Detailed information :

  • Trick gallery:

    Tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.
  • Trick combinations:

    Combinations of different tricks, corresponding results, experimental settings, and running commands are listed in trick_combination.md.
  • These tricks and trick combinations, which provide the corresponding results in this repo, have been reorgnized and tested. We are trying our best to deal with the rest, which will be constantly updated.

Main requirements

torch >= 1.4.0
torchvision >= 0.5.0
tensorboardX >= 2.1
tensorflow >= 1.14.0 #convert long-tailed cifar datasets from tfrecords to jpgs
Python 3
apex
  • We provide the detailed requirements in requirements.txt. You can run pip install requirements.txt to create the same running environment as ours.
  • The apex is recommended to be installed for saving GPU memories:
pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  • If the apex is not installed, the Distributed training with DistributedDataParallel in our codes cannot be used.

Preparing the datasets

We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).

The detailed information of these datasets are shown as follows:

Datasets CIFAR-10-LT CIFAR-100-LT ImageNet-LT iNat18
Imbalance factor
100 50 100 50
Training images 12,406 13,996 10,847 12,608 11,5846 437,513
Classes 50 50 100 100 1,000 8,142
Max images 5,000 5,000 500 500 1,280 1,000
Min images 50 100 5 10 5 2
Imbalance factor 100 50 100 50 256 500
-  `Max images` and `Min images` represents the number of training images in the largest and smallest classes, respectively.

-  CIFAR-10-LT-100 means the long-tailed CIFAR-10 dataset with the imbalance factor $\beta = 100$.

-  Imbalance factor is defined as $\beta = \frac{\text{Max images}}{\text{Min images}}$.

  • Data format

The annotation of a dataset is a dict consisting of two field: annotations and num_classes. The field annotations is a list of dict with image_id, fpath, im_height, im_width and category_id.

Here is an example.

{
    'annotations': [
                    {
                        'image_id': 1,
                        'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                        'im_height': 600,
                        'im_width': 800,
                        'category_id': 7477
                    },
                    ...
                   ]
    'num_classes': 8142
}
  • CIFAR-LT

    There are two versions of CIFAR-LT.

    1. Cui et al., CVPR 2019 firstly proposed the CIFAR-LT. They provided the download link of CIFAR-LT, and also the codes to generate the data, which are in TensorFlow.

      You can follow the steps below to get this version of CIFAR-LT:

      1. Download the Cui's CIFAR-LT in GoogleDrive or Baidu Netdisk (password: 5rsq). Suppose you download the data and unzip them at path /downloaded/data/.
      2. Run tools/convert_from_tfrecords, and the converted CIFAR-LT and corresponding jsons will be generated at /downloaded/converted/.
    # Convert from the original format of CIFAR-LT
    python tools/convert_from_tfrecords.py  --input_path /downloaded/data/ --out_path /downloaded/converted/
    1. Cao et al., NeurIPS 2019 followed Cui et al., CVPR 2019's method to generate the CIFAR-LT randomly. They modify the CIFAR datasets provided by PyTorch as this file shows.
  • ImageNet-LT

    You can use the following steps to convert from the original images of ImageNet-LT.

    1. Download the original ILSVRC-2012. Suppose you have downloaded and reorgnized them at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
    2. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) in GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them at path /downloaded/ImageNet-LT/.
    3. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json.
    # Convert from the original format of ImageNet-LT
    python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaed/ImageNet/ --output_path ./
  • iNat18

    You can use the following steps to convert from the original format of iNaturalist 2018.

    1. The images and annotations should be downloaded at iNaturalist 2018 firstly. Suppose you have downloaded them at path /downloaded/iNat18/.
    2. Run tools/convert_from_iNat.py, and use the generated iNat18_train.json and iNat18_val.json to train.
    # Convert from the original format of iNaturalist
    # See tools/convert_from_iNat.py for more details of args 
    python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json
    
    python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json 

Usage

In this repo:

  • The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, are gotten by DataParallel training with apex.

  • The results of iNat18 (ResNet-50), which need more than one GPU to train, are gotten by DistributedDataParallel training with apex.

  • If more than one GPU is used, DistributedDataParallel training is efficient than DataParallel training, especially when the CPU calculation forces are limited.

Training

Parallel training with DataParallel

1, To train
# To train long-tailed CIFAR-10 with imbalanced ratio of 50. 
# `GPUs` are the GPUs you want to use, such as `0,4`.
bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs

Distributed training with DistributedDataParallel

1, Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to [your own socket name]. 
export NCCL_SOCKET_IFNAME = [your own socket name]

2, To train
# To train long-tailed CIFAR-10 with imbalanced ratio of 50. 
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
# `NUM_GPUs` are the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs

Validation

You can get the validation accuracy and the corresponding confusion matrix after running the following commands.

See main/valid.py for more details.

1, Change the TEST.MODEL_FILE in the yaml to your own path of the trained model firstly.
2, To do validation
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
python main/valid.py --cfg [Your yaml] --gpus GPUS

The comparison between the baseline results using our codes and the references [Cui, Kang]

  • We use Top-1 error rates as our evaluation metric.
  • From the results of two CIFAR-LT, we can see that the CIFAR-LT provided by Cao has much lower Top-1 error rates on CIFAR-10-LT, compared with the baseline results reported in his paper. So, in our experiments, we use the CIFAR-LT of Cui for fairness.
  • For the ImageNet-LT, we find that the color_jitter augmentation was not included in our experiments, which, however, is adopted by other methods. So, in this repo, we add the color_jitter augmentation on ImageNet-LT. The old baseline without color_jitter is 64.89, which is +1.15 points higher than the new baseline.
  • You can click the Baseline in the table below to see the experimental settings and corresponding running commands.
Datasets Cui et al., 2019 Cao et al., 2020 ImageNet-LT iNat18
CIFAR-10-LT CIFAR-100-LT CIFAR-10-LT CIFAR-100-LT
Imbalance factor Imbalance factor
100 50 100 50 100 50 100 50
Backbones ResNet-32 ResNet-32 ResNet-10 ResNet-50
Baselines using our codes
  1. CONFIG (from left to right):
    • configs/cui_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
    • configs/ImageNet_LT/imagenetlt_baseline.yaml
    • configs/iNat18/iNat18_baseline.yaml

  2. Running commands:
    • For CIFAR-LT and ImageNet-LT: bash data_parallel_train.sh CONFIG GPU
    • For iNat18: bash distributed_data_parallel_train.sh configs/iNat18/iNat18_baseline.yaml NUM_GPUs GPUs
30.12 24.81 61.76 57.65 28.05 23.55 62.27 56.22 63.74 40.55
Reference [Cui, Kang, Liu] 29.64 25.19 61.68 56.15 29.64 25.19 61.68 56.15 64.40 42.86

Citation

@inproceedings{zhang2020tricks,
  author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
  title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
  booktitle = {AAAI},
  year      = {2021},
}

Contacts

If you have any question about our work, please do not hesitate to contact us by emails provided in the paper.

Owner
Yong-Shun Zhang
Computer Vision
Yong-Shun Zhang
NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation (ACL-IJCNLP 2021)

NeuralWOZ This code is official implementation of "NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-based Simulation". Sungdong Kim, Mi

NAVER AI 31 Oct 25, 2022
Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

Virginia Tech Vision and Learning Lab 27 Nov 22, 2022
Official implementation for the paper: Permutation Invariant Graph Generation via Score-Based Generative Modeling

Permutation Invariant Graph Generation via Score-Based Generative Modeling This repo contains the official implementation for the paper Permutation In

64 Dec 29, 2022
Automatically creates genre collections for your Plex media

Plex Auto Genres Plex Auto Genres is a simple script that will add genre collection tags to your media making it much easier to search for genre speci

Shane Israel 63 Dec 31, 2022
PyTorch implementation of ENet

PyTorch-ENet PyTorch (v1.1.0) implementation of ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, ported from the lua-torc

David Silva 333 Dec 29, 2022
A library for hidden semi-Markov models with explicit durations

hsmmlearn hsmmlearn is a library for unsupervised learning of hidden semi-Markov models with explicit durations. It is a port of the hsmm package for

Joris Vankerschaver 69 Dec 20, 2022
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data Au

14 Nov 28, 2022
An easy-to-use app to visualise attentions of various VQA models.

Ask Me Anything: A tool for visualising Visual Question Answering (AMA) An easy-to-use app to visualise attentions of various VQA models. Please click

Apoorve 37 Nov 13, 2022
SBINN: Systems-biology informed neural network

SBINN: Systems-biology informed neural network The source code for the paper M. Daneker, Z. Zhang, G. E. Karniadakis, & L. Lu. Systems biology: Identi

Lu Group 15 Nov 19, 2022
PyTorch code for 'Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning'

Efficient Single Image Super-Resolution Using Dual Path Connections with Multiple Scale Learning This repository is for EMSRDPN introduced in the foll

7 Feb 10, 2022
Code for classifying international patents based on the text of their titles/abstracts

Patent Classification Goal: To train a machine learning classifier that can automatically classify international patents downloaded from the WIPO webs

Prashanth Rao 1 Nov 08, 2022
Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect"

Source code for the paper "Periodic Traveling Waves in an Integro-Difference Equation With Non-Monotonic Growth and Strong Allee Effect" by Michael Ne

M Nestor 1 Apr 19, 2022
Pre-Trained Image Processing Transformer (IPT)

Pre-Trained Image Processing Transformer (IPT) By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Cha

HUAWEI Noah's Ark Lab 332 Dec 18, 2022
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )

Yolo v4, v3 and v2 for Windows and Linux (neural networks for object detection) Paper YOLO v4: https://arxiv.org/abs/2004.10934 Paper Scaled YOLO v4:

Alexey 20.2k Jan 09, 2023
Hyperparameter Optimization for TensorFlow, Keras and PyTorch

Hyperparameter Optimization for Keras Talos • Key Features • Examples • Install • Support • Docs • Issues • License • Download Talos radically changes

Autonomio 1.6k Dec 15, 2022
Fermi Problems: A New Reasoning Challenge for AI

Fermi Problems: A New Reasoning Challenge for AI Fermi Problems are questions whose answer is a number that can only be reasonably estimated as a prec

AI2 15 May 28, 2022
Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Multiple-criteria decision-making (MCDM) with Electre, Promethee, Weighted Sum and Pareto

EasyMCDM - Quick Installation methods Install with PyPI Once you have created your Python environment (Python 3.6+) you can simply type: pip3 install

Labrak Yanis 6 Nov 22, 2022