Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Overview

Structured Super Lottery Tickets in BERT

This repo contains the code for our paper "Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization" (ACL 2021).


Getting Started

  1. Python 3.6
    Download and installation instructions: https://www.python.org/downloads/release/python-360/
  2. Install requirements
    > pip install -r requirements.txt

Data

  1. Download data
    > sh download.sh
    For details on the GLUE benchmark, please refer to https://gluebenchmark.com/
  2. Preprocess data
    > sh experiments/glue/prepro.sh
    For more data processing details, please refer to the MT-DNN repo (https://github.com/namisan/mt-dnn).

Verifying the Phase Transition Phenomenon

  1. Fine-tune a pre-trained BERT model on a single task, compute importance scores, and generate one-shot structured pruning masks at multiple sparsity levels (a code sketch of steps 1 and 2 follows this list). E.g., for MNLI, run

    ./scripts/train_mnli.sh GPUID
    
  2. Rewind and evaluate the winning, random, and losing tickets at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/rewind_mnli.sh GPUID
    

You may try tasks with smaller datasets (e.g., SST, MRPC, RTE) to see a more pronounced phase transition.
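For intuition, below is a minimal, self-contained sketch of steps 1 and 2, restricted to attention-head pruning (the paper prunes feed-forward modules as well). The function names, sparsity levels, and the random placeholder importance scores are illustrative assumptions, not the repo's actual API; in the repo, importance scores come from the fine-tuned model (e.g., gradient-based sensitivity on dev data).

```python
# Illustrative sketch only: build winning / random / losing head masks
# from importance scores, then "rewind" by pruning a freshly loaded
# pre-trained model. Head pruning only; the repo also prunes FFN layers.
import numpy as np
from transformers import BertForSequenceClassification

def build_mask(importance, sparsity, ticket="winning", seed=0):
    """Prune a `sparsity` fraction of heads, ranked by importance.
    importance: (num_layers, num_heads), higher = more important."""
    flat = importance.flatten()
    n_prune = int(sparsity * flat.size)
    if ticket == "winning":      # winning: drop the least important heads
        pruned = np.argsort(flat)[:n_prune]
    elif ticket == "losing":     # losing: drop the most important heads
        pruned = np.argsort(flat)[-n_prune:]
    else:                        # random: drop a uniformly random subset
        pruned = np.random.default_rng(seed).choice(flat.size, n_prune, replace=False)
    mask = np.ones(flat.size, dtype=bool)
    mask[pruned] = False
    return mask.reshape(importance.shape)

def rewind(mask, model_name="bert-base-uncased", num_labels=3):
    """Rewind: reload the pre-trained weights, then remove pruned heads.
    Fine-tuning the returned model evaluates this ticket."""
    model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    to_prune = {layer: np.flatnonzero(~row).tolist() for layer, row in enumerate(mask)}
    model.prune_heads(to_prune)
    return model

# Placeholder scores; the repo derives these during fine-tuning (step 1).
importance = np.random.rand(12, 12)        # BERT-base: 12 layers x 12 heads
for sparsity in [0.1, 0.25, 0.5]:          # illustrative sparsity levels
    for ticket in ["winning", "random", "losing"]:
        model = rewind(build_mask(importance, sparsity, ticket))
        # ... fine-tune `model` on the task and record dev performance
```

At low sparsity, winning tickets can match or even exceed the unpruned model (the "super tickets" regime); beyond a task-dependent threshold, performance drops sharply, which is the phase transition the rewind scripts expose.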


Multi-task Learning (MTL) with Tickets Sharing

  1. Identify a set of super tickets for each individual task.

    • Identify winning tickets at multiple sparsity levels for each individual task. E.g., for MTDNN-base, run

      ./scripts/prepare_mtdnn_base.sh GPUID
      

      We recommend using the same optimization settings (e.g., learning rate, optimizer, and random seed) in both the ticket identification procedure and the MTL. We empirically observe that super tickets perform better in MTL under matched settings.

    • [Optional] For each individual task, identify a set of super tickets from the winning tickets at multiple sparsity levels. You can skip this step if you wish to directly use the set of super tickets identified by us. If you wish to identify super tickets on your own (recommended if your optimization settings, e.g., learning rate, optimizer, or random seed, differ from those in our scripts, since these factors may affect which tickets qualify as super tickets), we provide the template scripts

      ./scripts/rewind_mnli_winning.sh GPUID
      ./scripts/rewind_qnli_winning.sh GPUID
      ./scripts/rewind_qqp_winning.sh GPUID
      ./scripts/rewind_sst_winning.sh GPUID
      ./scripts/rewind_mrpc_winning.sh GPUID
      ./scripts/rewind_cola_winning.sh GPUID
      ./scripts/rewind_stsb_winning.sh GPUID
      ./scripts/rewind_rte_winning.sh GPUID
      

      These scripts rewind the winning tickets at multiple sparsity levels. You can then manually identify the super tickets as the winning tickets that perform best across all sparsity levels.

  2. Construct multi-task super tickets by aggregating the identified super tickets of all tasks (see the aggregation sketch after this list). E.g., to use the super tickets identified by us, run

    python construct_mtl_mask.py
    

    You can modify the script to use super tickets you identified yourself.

  3. MTL with tickets sharing. Run

    ./scripts/train_mtdnn.sh GPUID
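
For intuition about step 2, here is a minimal sketch of the aggregation idea behind construct_mtl_mask.py. The file paths, the loader, and the element-wise union rule are assumptions for illustration, not necessarily the repo's exact implementation:

```python
# Illustrative sketch of multi-task mask aggregation; paths and the
# union rule are assumptions, not the repo's exact implementation.
import numpy as np

TASKS = ["mnli", "qnli", "qqp", "sst", "mrpc", "cola", "stsb", "rte"]

def load_super_ticket(task):
    """Hypothetical loader: each task's super ticket as a boolean
    (num_layers, num_heads) array, True = head kept."""
    return np.load(f"masks/{task}_super_ticket.npy")

# Union aggregation: the shared encoder keeps a head if at least one
# task's super ticket keeps it, so no task loses structure it relies on.
mtl_mask = np.logical_or.reduce([load_super_ticket(t) for t in TASKS])

print(f"heads kept: {mtl_mask.sum()} / {mtl_mask.size}")
np.save("masks/mtl_super_ticket.npy", mtl_mask)  # hypothetical output path
```

The resulting multi-task mask is then consumed during MTL training with ticket sharing (step 3).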
    

MTL Benchmark

MTL evaluation results on the GLUE dev set, averaged over 5 random seeds.

| Model | MNLI-m/mm (Acc) | QNLI (Acc) | QQP (Acc/F1) | SST-2 (Acc) | MRPC (Acc/F1) | CoLA (MCC) | STS-B (Pearson/Spearman) | RTE (Acc) | Avg Score | Avg Compression |
|-------|-----------------|------------|--------------|-------------|---------------|------------|--------------------------|-----------|-----------|-----------------|
| MTDNN, base | 84.6/84.2 | 90.5 | 90.6/87.4 | 92.2 | 80.6/86.2 | 54.0 | 86.2/86.4 | 79.0 | 82.4 | 100% |
| Tickets-Share, base | 84.5/84.1 | 91.0 | 90.7/87.5 | 92.7 | 87.0/90.5 | 52.0 | 87.7/87.5 | 81.2 | 83.3 | 92.9% |
| MTDNN, large | 86.5/86.0 | 92.2 | 91.2/88.1 | 93.5 | 85.2/89.4 | 56.2 | 87.2/86.9 | 83.0 | 84.4 | 100% |
| Tickets-Share, large | 86.7/86.0 | 92.1 | 91.3/88.4 | 93.2 | 88.4/91.5 | 61.8 | 89.2/89.1 | 80.5 | 85.4 | 83.3% |

Citation

@article{liang2021super,
  title={Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization},
  author={Liang, Chen and Zuo, Simiao and Chen, Minshuo and Jiang, Haoming and Liu, Xiaodong and He, Pengcheng and Zhao, Tuo and Chen, Weizhu},
  journal={arXiv preprint arXiv:2105.12002},
  year={2021}
}

@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang ([email protected]).
