Prefix Tuning

Files:

.
├── gpt2                          # Code for GPT2 style autoregressive LM
│   ├── train_e2e.py              # high-level scripts to train.
│   ├── train_control.py          # code that implements prefix-tuning.
│   ├── trainer_prefix.py         # trainer code for the training loop. 
│   ├── run_language_modeling.py  # training code (contains data loading, model loading, and calls trainer)
│   ├── gen.py                    # high-level scripts to decode. 
│   └── run_generation.py         # decoding code. 
│
├── seq2seq                       # Code for encoder-decoder architecture
│   ├── train_bart.py             # high-level scripts to train.
│   ├── prefixTuning.py           # code that implements prefix-tuning.
│   ├── finetune.py               # training code (contains data loading, model loading, and calls trainer)   
│   ├── lightning_base.py         # helper code
│   ├── utils.py                  # helper code
│   └── callbacks.py              # helper code
└── ...

To run the code for GPT2 style autoregressive LM, the code is in gpt2/. This corresponds to the table-to-text experiments in the paper.

To run the code for encoder-decoder architecture like BART, the code is in seq2seq. This corresponds to the summarization experiments in the paper.

The two primary scripts I used to run my codes are gpt2/train_e2e.py (for table-to-text) and seq2seq/train_bart.py(for summarization). they are set to default of good hyperparameters, and can be used to tune hyperparameter :)

Setup:

cd transformer; pip install -e .

Train via prefix-tuning:

cd gpt2;

python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq; 

python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 16  --epoch 30  --gradient_accumulation_step 3 --learning_rate 0.00005  --mid_dim 800

Other baseline approaches

cd gpt2;

python train_e2e.py --tuning_mode {finetune/adaptertune} --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

cd seq2seq;

python train_e2e.py --tuning_mode finetune --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101

Decode:

cd gpt2;

python gen.py {data2text/webnlg/...} yes test {checkpoint_path} no

cd seq2seq; 

python train_bart.py --mode xsum --do_train no --prefix_model_path {checkpoint_path} --preseqlen {same as training} --mid_dim {same as training}

For details of the methods and results, please refer to our paper.

@misc{li2021prefixtuning,
      title={Prefix-Tuning: Optimizing Continuous Prompts for Generation}, 
      author={Xiang Lisa Li and Percy Liang},
      year={2021},
      eprint={2101.00190},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Related tags

Overview

Prefix Tuning

Files:

Setup:

Train via prefix-tuning:

Decode:

Owner

PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.

BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition 2022)

This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

Official code for On Path Integration of Grid Cells: Group Representation and Isotropic Scaling (NeurIPS 2021)

This is the official code for the paper "Tracker Meets Night: A Transformer Enhancer for UAV Tracking".

Pytorch implementation of various High Dynamic Range (HDR) Imaging algorithms

Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation

Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

The official implementation of the Hybrid Self-Attention NEAT algorithm

IA for recognising Traffic Signs using Keras [Tensorflow]

Pytorch implementation of our paper under review — Lottery Jackpots Exist in Pre-trained Models

A custom-designed Spider Robot trained to walk using Deep RL in a PyBullet Simulation

Code for our ACL 2021 paper "One2Set: Generating Diverse Keyphrases as a Set"

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

"Graph Neural Controlled Differential Equations for Traffic Forecasting", AAAI 2022

Self-Regulated Learning for Egocentric Video Activity Anticipation

Pytorch library for fast transformer implementations

Deep functional residue identification

Official PyTorch Implementation of Embedding Transfer with Label Relaxation for Improved Metric Learning, CVPR 2021

A vision library for performing sliced inference on large images/small objects