The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

Last update: Dec 07, 2022

Related tags

Deep Learning YOCO-BERT

Overview

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper)

@misc{zhang2021compress,
      title={You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient}, 
      author={Shaokun Zhang and Xiawu Zheng and Chenyi Yang and Yuchao Li and Yan Wang and Fei Chao and Mengdi Wang and Shen Li and Jun Yang and Rongrong Ji},
      year={2021},
      eprint={2106.02435},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
      }

Overview

This repository is the official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

📋 We propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere. Compared with state of-the-art algorithms, YOCO-BERT provides more compact models, yet achieving superior average accuracy improvement on the GLUE.

Requirements

Python > 3.6
Pytorch = 1.7.0
transformers = 3.5.0

Training

To train the super-BERTs in the paper, run this command:

python train_superbert.py --cfg /path_to_superbert_training_config/config.yaml

Searching

To search the optimal sub-BERTs given any constraints in the paper, run this command:

python search_subbert.py --cfg /path_to_subbert_searching_config/config.yaml

Evaluation

The evaluation results will be reported after the searching process.

Config

We release all the traning and searching configs in config

Results

Our model achieves the following performance on :

GLUE

Results given various FlOPs and parameters.

Results under common constraints (compress to no more than 66M)

Datasets	SST-2	MRPC	CoLA	RTE	MNLI	QQP	QNLI
Results	92.8	90.3	59.8	72.9	82.6	90.5	87.2

📋 The detailed metrics used in this code are reported in the paper.

Licence

This repository is released under the MIT license. See LICENSE for more information.

Contact

Any problem regarding this code re-implementation, feel free to contact the first author: [email protected]

The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.

Related tags

Overview

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (paper)

Overview

Requirements

Training

Searching

Evaluation

Config

Results

GLUE

Results given various FlOPs and parameters.

Results under common constraints (compress to no more than 66M)

Licence

Contact

Owner

scikit-learn: machine learning in Python

Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors

PyTorch implementation of SIFT descriptor

Streamlit App For Product Analysis - Streamlit App For Product Analysis

A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

Detect roadway lanes using Python OpenCV for project during the 5th semester at DHBW Stuttgart for lecture in digital image processing.

Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers"

YOLOX Win10 Project

RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining

A Deep Learning Framework for Neural Derivative Hedging

Which Style Makes Me Attractive? Interpretable Control Discovery and Counterfactual Explanation on StyleGAN

Text to Image Generation with Semantic-Spatial Aware GAN

iNAS: Integral NAS for Device-Aware Salient Object Detection

Anchor-free Oriented Proposal Generator for Object Detection

Fast and Context-Aware Framework for Space-Time Video Super-Resolution (VCIP 2021)

50-days-of-Statistics-for-Data-Science - This repository consist of a 50-day program

This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

This is a file about Unet implemented in Pytorch

Code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge.