Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Last update: Jan 03, 2023

Text-AutoAugment (TAA)

This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification (EMNLP 2021 main conference).

Overview

We present a learnable and compositional framework for data augmentation. Our proposed algorithm automatically searches for the optimal compositional policy, which improves the diversity and quality of augmented samples.
In low-resource and class-imbalanced regimes of six benchmark datasets, TAA significantly improves the generalization ability of deep neural networks like BERT and effectively boosts text classification performance.
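
To make "compositional policy" concrete, the toy sketch below assumes a policy is an ordered list of (operation, probability, magnitude) triples applied one after another. The two operations (`random_delete`, `random_swap`), the `OPS` registry, and `apply_policy` are illustrative stand-ins, not this repository's API and not the searched policies stored in archive.py.

```python
import random

def random_delete(tokens, magnitude, rng):
    """Drop each token with probability `magnitude`, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= magnitude]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, magnitude, rng):
    """Swap random token pairs; the number of swaps scales with `magnitude`."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    n_swaps = int(round(magnitude * len(tokens)))
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

# Hypothetical registry mapping operation names to functions.
OPS = {"random_delete": random_delete, "random_swap": random_swap}

def apply_policy(text, policy, seed=0):
    """Apply each (name, probability, magnitude) triple in order."""
    rng = random.Random(seed)
    tokens = text.split()
    for name, prob, magnitude in policy:
        # Each operation fires independently with its own probability.
        if tokens and rng.random() < prob:
            tokens = OPS[name](tokens, magnitude, rng)
    return " ".join(tokens)

# Illustrative values only; a policy search would tune these numbers.
example_policy = [("random_swap", 0.5, 0.2), ("random_delete", 0.8, 0.1)]
print(apply_policy("the movie was surprisingly good", example_policy, seed=1))
```

In this framing, searching for a policy amounts to choosing which operations to chain and with what probabilities and magnitudes.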

Getting Started

Prepare environment

```
conda create -n taa python=3.6
conda activate taa
conda install pytorch torchvision cudatoolkit=10.0 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
pip install -r requirements.txt
python -c "import nltk; nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')"
```

Modify the dataroot parameter in confs/*.yaml and the abspath parameter in script/*.sh:
- e.g., change dataroot: /home/renshuhuai/TextAutoAugment/data/aclImdb in confs/bert_imdb.yaml to dataroot: path-to-your-TextAutoAugment/data/aclImdb
- change --abspath '/home/renshuhuai/TextAutoAugment' in script/imdb_lowresource.sh to --abspath 'path-to-your-TextAutoAugment'
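
The path edits above can also be scripted. The sketch below assumes every config and script uses the same `/home/renshuhuai/TextAutoAugment` prefix shown in the two examples; `rewrite_prefix` and `rewrite_files` are hypothetical helpers, not part of the repository. It does plain text substitution, so review the changed files before running anything.

```python
from pathlib import Path

# Assumed old prefix, taken from the examples above.
OLD_ROOT = "/home/renshuhuai/TextAutoAugment"

def rewrite_prefix(text, new_root, old_root=OLD_ROOT):
    """Replace every occurrence of old_root in text with new_root."""
    return text.replace(old_root, new_root.rstrip("/"))

def rewrite_files(repo_dir, new_root, patterns=("confs/*.yaml", "script/*.sh")):
    """Rewrite matching files in place and return the paths that changed."""
    changed = []
    for pattern in patterns:
        for path in sorted(Path(repo_dir).glob(pattern)):
            old_text = path.read_text(encoding="utf-8")
            new_text = rewrite_prefix(old_text, new_root)
            if new_text != old_text:
                path.write_text(new_text, encoding="utf-8")
                changed.append(path)
    return changed
```

For example, calling `rewrite_files(".", "/path/to/your/TextAutoAugment")` from the repository root would update both the dataroot and --abspath values in one pass.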
Search for the best augmentation policy, e.g., in the low-resource regime for IMDB:
```
sh script/imdb_lowresource.sh
```
Scripts for policy search in the low-resource and class-imbalanced regimes for all datasets are provided in the script/ folder.
Train a model with a pre-searched policy from archive.py, e.g., in the low-resource regime for IMDB:
```
python train.py -c confs/bert_imdb.yaml 
```
Train a model on the full IMDB dataset:
```
python train.py -c confs/bert_imdb.yaml --train-npc -1 --valid-npc -1 --test-npc -1  
```
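
The `-1` values above appear to switch off subsampling. As a rough illustration, and assuming `npc` stands for "number of examples per class" (this README does not define it), per-class subsampling could look like the sketch below. `sample_per_class` is a hypothetical helper, not the code in train.py.

```python
import random
from collections import defaultdict

def sample_per_class(examples, labels, npc, seed=0):
    """Keep at most `npc` examples per label; npc == -1 keeps everything."""
    if npc == -1:
        return list(zip(examples, labels))
    # Group examples by label so each class can be sampled separately.
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        chosen = rng.sample(pool, min(npc, len(pool)))
        subset.extend((example, label) for example in chosen)
    return subset

texts = ["great", "awful", "fine", "boring", "superb", "dull"]
labels = [1, 0, 1, 0, 1, 0]
print(len(sample_per_class(texts, labels, npc=2)))   # two examples per class
print(len(sample_per_class(texts, labels, npc=-1)))  # full dataset
```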

Contact

If you have any questions related to the code or the paper, feel free to email Shuhuai (renshuhuai007 [AT] gmail [DOT] com).

Acknowledgments

The code references fast-autoaugment.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{ren2021taa,
  title={Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification},
  author={Shuhuai Ren and Jinchao Zhang and Lei Li and Xu Sun and Jie Zhou},
  booktitle={EMNLP},
  year={2021}
}

License

MIT

Owner

LancoPKU
