PyTorch implementation of TailCalibX : Feature Generation for Long-tail Classification

Last update: Jan 02, 2023

Overview

TailCalibX : Feature Generation for Long-tail Classification

by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

🐣 Easy Usage (Recommended way to use our method)
- 💻 Installation
- 👨‍💻 Example Code
🧪 Advanced Usage
🏋️‍♂️ Trained weights
🪀 Results on a Toy Dataset
🌴 Directory Tree
📃 Citation
👁 Contributing
❤ About me
✨ Extras
📝 License

🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is just TailCalib employed multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. In order to mimic that, three things must be done at every epoch in the following order:

1. Collect all the features from your dataloader.
2. Use the tailcalib package to make the features balanced by generating samples.
3. Train the classifier.
4. Repeat.
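
The per-epoch loop above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `collect_features`, `balance_features`, and `train_classifier_one_epoch` are hypothetical helpers, and `balance_features` is a simple per-class Gaussian oversampler standing in for the call to the tailcalib package, not the TailCalib estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_features(n_per_class, dim=16):
    """Hypothetical stand-in for running a backbone over the dataloader."""
    feats, labels = [], []
    for cls, n in enumerate(n_per_class):
        feats.append(rng.normal(loc=cls, scale=1.0, size=(n, dim)))
        labels.append(np.full(n, cls))
    return np.concatenate(feats), np.concatenate(labels)

def balance_features(X, y):
    """Hypothetical placeholder for the tailcalib `generate` call.

    Pads every class up to the size of the largest class by sampling from a
    per-class Gaussian with diagonal covariance. NOT the TailCalib method.
    """
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    new_X, new_y = [X], [y]
    for cls, n in zip(classes, counts):
        if n == target:
            continue
        cls_feats = X[y == cls]
        mean = cls_feats.mean(axis=0)
        std = cls_feats.std(axis=0) + 1e-6
        new_X.append(rng.normal(mean, std, size=(target - n, X.shape[1])))
        new_y.append(np.full(target - n, cls))
    return np.concatenate(new_X), np.concatenate(new_y)

def train_classifier_one_epoch(W, X, y, lr=0.1):
    """One full-batch gradient step of a linear softmax classifier."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0  # gradient of cross-entropy w.r.t. logits
    return W - lr * (X.T @ probs) / len(y)

n_per_class = [100, 20, 5]  # long-tailed toy class sizes
W = np.zeros((16, len(n_per_class)))
for epoch in range(3):
    X, y = collect_features(n_per_class)              # 1. collect features
    X_bal, y_bal = balance_features(X, y)             # 2. balance by generating samples
    W = train_classifier_one_epoch(W, X_bal, y_bal)   # 3. train the classifier

balanced_counts = np.unique(y_bal, return_counts=True)[1]
print(balanced_counts)
```

In real use, step 2 would be replaced by the tailcalib call shown in the Example Code section, and step 3 by your own classifier training routine.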

💻 Installation

Use the package manager pip to install tailcalib.

pip install tailcalib

👨‍💻 Example Code

See the tailcalib package instructions for more detailed information about the Python package.

# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")   # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200,100)
y = np.random.randint(0,10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")

🧪 Advanced Usage

✔ Things to do before you run the code from this repo

Change the data_root for your dataset in main.py.
If you are using wandb logging (Weights & Biases), make sure to change the wandb.init in main.py accordingly.

📀 How to use?

For just the methods proposed in this paper :
- For CIFAR100-LT: run_TailCalibX_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_TailCalibX_mini-ImageNet-LT.sh
For all the results shown in the paper :
- For CIFAR100-LT: run_all_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_all_mini-ImageNet-LT.sh

📚 How to create the mini-ImageNet-LT dataset?

Check Notebooks/Create_mini-ImageNet-LT.ipynb for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.

⚙ Arguments

--seed : Random seed, fixed for reproducibility.
- Default : 1
--gpu : Select the GPUs to be used.
- Default : "0,1,2,3"
--experiment : Experiment number (Check 'libs/utils/experiments_maker.py').
- Default : 0.1
--dataset : Dataset number.
- Choices : 0 - CIFAR100, 1 - mini-imagenet
- Default : 0
--imbalance : Select Imbalance factor.
- Choices : 0: 1, 1: 100, 2: 50, 3: 10
- Default : 1
--type_of_val : Choose which dataset split to use.
- Choices: "vt": val_from_test, "vtr": val_from_train, "vit": val_is_test
- Default : "vit"
--cv1 to --cv9 : Custom variables for experiments; their purpose depends on the experiment.
- Default : "1"
--train : Run training sequence
- Default : False
--generate : Run generation sequence
- Default : False
--retraining : Run retraining sequence
- Default : False
--resume : Resume from 'latest_model_checkpoint.pth' and, if applicable, from wandb.
- Default : False
--save_features : Collect feature representations.
- Default : False
--save_features_phase : Dataset split of representations to collect.
- Choices : "train", "val", "test"
- Default : "train"
--config : If you have a yaml file with appropriate config, provide the path here. Will override the 'experiment_maker'.
- Default : None
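
As a rough illustration of how a few of the flags above fit together, here is a minimal argparse sketch. It is an assumption about how such flags could be declared, not a copy of the parser in main.py; `build_parser` and `IMBALANCE_FACTORS` are hypothetical names, and treating `--train` as a `store_true` switch is a guess based on its default of False.

```python
import argparse

# Index-to-factor mapping for --imbalance, as listed in the arguments above.
IMBALANCE_FACTORS = {0: 1, 1: 100, 2: 50, 3: 10}

def build_parser():
    """Declare an illustrative subset of the documented flags."""
    parser = argparse.ArgumentParser(description="Illustrative subset of the documented flags")
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--gpu", type=str, default="0,1,2,3")
    parser.add_argument("--dataset", type=int, default=0, choices=[0, 1])
    parser.add_argument("--imbalance", type=int, default=1, choices=sorted(IMBALANCE_FACTORS))
    parser.add_argument("--type_of_val", type=str, default="vit", choices=["vt", "vtr", "vit"])
    parser.add_argument("--train", action="store_true")  # assumed to be a boolean switch
    parser.add_argument("--config", type=str, default=None)
    return parser

# Parse an example command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--train", "--imbalance", "2"])
print(args.seed, args.dataset, IMBALANCE_FACTORS[args.imbalance], args.train)
```

Note that `--imbalance` takes the index on the left of each choice, so `--imbalance 2` selects an imbalance factor of 50.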

🏋️‍♂️ Trained weights

Experiment	CIFAR100-LT (ResNet32, seed 1, Imb 100)	mini-ImageNet-LT (ResNeXt50)
TailCalib	Git-LFS	Git-LFS
TailCalibX	Git-LFS	Git-LFS
CBD + TailCalibX	Git-LFS	Git-LFS

🪀 Results on a Toy Dataset

The higher the Imb ratio, the more imbalanced the dataset is. Imb ratio = maximum_sample_count / minimum_sample_count.
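
The Imb ratio defined above can be computed directly from a list of labels. The snippet below is a small sketch; `imbalance_ratio` is a hypothetical helper name and is not part of the repository or the tailcalib package.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return maximum_sample_count / minimum_sample_count over the classes present."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Toy long-tailed label list: 100, 10 and 1 samples for classes 0, 1 and 2.
labels = [0] * 100 + [1] * 10 + [2] * 1
ratio = imbalance_ratio(labels)
print(ratio)
```

A perfectly balanced dataset gives a ratio of 1, and the ratio grows as the rarest class shrinks relative to the most frequent one.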

Check Notebooks/toy_example.ipynb to play with the toy example from which the results plot was generated.

🌴 Directory Tree

TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh

The tailcalib_pip directory is omitted from the tree above, as it is only for the tailcalib pip package.

📃 Citation

@inproceedings{rahul2021tailcalibX,
    title   = {{Feature Generation for Long-tail Classification}},
    author  = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
    booktitle = {ICVGIP},
    year = {2021}
}

👁 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

❤ About me

Rahul Vigneswaran

✨ Extras

🐝 Long-tail buzz : If you are interested in deep learning research that involves long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about recent trending papers in this field.

📝 License

MIT
