Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Last update: Dec 27, 2022

Overview

DART

Implementation of the ICLR 2022 paper Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners.

Environment

Use pip install -r requirements.txt to install dependencies.
A wandb account is required only if you want to search for the best hyper-parameter combinations.

Data source

The data is the 16-shot GLUE dataset from LM-BFF.
The generated data consists of 5 random splits (seeds 13/21/42/87/100) per task, each with 16 samples.
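For orientation, a seeded few-shot split of this kind can be drawn as sketched below. This is not the LM-BFF generation script: the helper name `sample_k_shot` and the toy data are hypothetical, and the sketch assumes that k examples are drawn per label, which the description above does not spell out. Adjust it if k is meant as a total per split.

```python
import random
from collections import defaultdict

SEEDS = (13, 21, 42, 87, 100)  # split seeds listed above


def sample_k_shot(examples, k, seed):
    """Return up to k examples per label, drawn reproducibly with `seed`.

    `examples` is a list of (text, label) pairs. Hypothetical helper,
    not part of the repository.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # local RNG, so global random state is untouched
    split = []
    for label in sorted(by_label):  # fixed label order keeps results reproducible
        pool = by_label[label]
        split.extend(rng.sample(pool, min(k, len(pool))))
    return split


# Toy data standing in for a real task's training set (assumption).
toy = [(f"sentence {i}", i % 2) for i in range(200)]
splits = {seed: sample_k_shot(toy, k=16, seed=seed) for seed in SEEDS}
```

Using a dedicated `random.Random(seed)` per split means that re-running with the same seed reproduces the same examples, which is what makes results comparable across the five splits.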

How to run

To run across all 5 splits of a task, use run.py:
- In the arguments, encoder="inner" selects the method proposed in the paper, where the verbalizers are additional trainable tokens; encoder="manual" uses fixed, manually selected verbalizer tokens; encoder="lstm" selects the P-Tuning method.

$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]
              [--output_dir_suffix OUTPUT_DIR_SUFFIX]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
  --output_dir_suffix OUTPUT_DIR_SUFFIX, -o OUTPUT_DIR_SUFFIX
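As a hedged illustration, the flags listed above can be assembled into a command line programmatically, for example to launch several encoders in turn. The helper `build_run_command` is hypothetical and uses only flags that appear in the help text. The task name "SST-2" is taken from the sweep.py choices and is assumed to be accepted by run.py as well; the exact meaning of --num_splits and --repeat is not documented here, so the values are placeholders.

```python
import shlex
import sys

ENCODERS = ("manual", "lstm", "inner", "inner2")  # choices from the help text


def build_run_command(task, encoder="inner", num_splits=None, repeat=None,
                      output_dir_suffix=None):
    """Assemble an argv list for run.py from the flags shown in its help.

    Hypothetical helper; optional flags are omitted when left as None so
    that run.py falls back to its own defaults.
    """
    if encoder not in ENCODERS:
        raise ValueError(f"unknown encoder: {encoder!r}")
    cmd = [sys.executable, "run.py", "--encoder", encoder, "--task", task]
    if num_splits is not None:
        cmd += ["--num_splits", str(num_splits)]
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    if output_dir_suffix is not None:
        cmd += ["--output_dir_suffix", output_dir_suffix]
    return cmd


# Task name assumed from the sweep.py choices; num_splits is a placeholder.
cmd = build_run_command("SST-2", encoder="inner", num_splits=5)
print(shlex.join(cmd[1:]))  # show the arguments after the interpreter path
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the run.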

To train and evaluate on a single split with detailed records, use inference.py.
- Before running, configure [task_name, label_list, prompt_type] in the code.
- prompt_type="none" selects training with a fixed verbalizer, while "inner" selects the method proposed in the paper. ("inner2" is a deprecated two-stage training mode.)

To find the optimal hyper-parameters for each task split and reproduce our results, use sweep.py:
- Please refer to the WandB documentation for more details.

$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --sweep_id SWEEP_ID
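The usage string above is consistent with an argparse parser along the following lines. This is a reconstruction from the help output only, not the repository's code: the integer types are inferred from the listed choices, and defaults are left unset because the help text does not show them.

```python
import argparse

TASKS = ["SST-2", "sst-5", "mr", "cr", "mpqa", "subj", "trec", "CoLA",
         "MNLI", "MNLI-mm", "SNLI", "QNLI", "RTE-glue", "MRPC", "QQP"]


def build_parser():
    """Rebuild a parser matching the sweep.py help text (types are assumed)."""
    parser = argparse.ArgumentParser(prog="sweep.py")
    parser.add_argument("--task", choices=TASKS)
    parser.add_argument("--encoder",
                        choices=["none", "mlp", "lstm", "inner", "inner2"])
    # nargs="+" matches the "{...} [{...} ...]" form: one or more values.
    parser.add_argument("--seed_split", type=int, nargs="+",
                        choices=[13, 21, 42, 87, 100])
    parser.add_argument("--batch_size", type=int, nargs="+",
                        choices=[4, 8, 16, 24, 32])
    parser.add_argument("--sweep_id")
    return parser


args = build_parser().parse_args(
    ["--task", "SST-2", "--encoder", "inner",
     "--seed_split", "13", "21", "--batch_size", "8"]
)
print(args.seed_split, args.batch_size)  # [13, 21] [8]
```

Both --seed_split and --batch_size accept several values at once, so a single invocation can cover multiple splits and batch sizes.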

To train and evaluate with more customized configurations, use cli.py.
To analyze and visualize the results produced by inference.py, use visualize.py and visualize_word_emb.py.

How to Cite

@article{DBLP:journals/corr/abs-2108-13161,
  author    = {Ningyu Zhang and
               Luoqiu Li and
               Xiang Chen and
               Shumin Deng and
               Zhen Bi and
               Chuanqi Tan and
               Fei Huang and
               Huajun Chen},
  title     = {Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
               Learners},
  journal   = {CoRR},
  volume    = {abs/2108.13161},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.13161},
  eprinttype = {arXiv},
  eprint    = {2108.13161},
  timestamp = {Thu, 13 Jan 2022 17:33:17 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-13161.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Owner

ZJUNLP
