This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Last update: Nov 15, 2022

Overview

This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

It includes /bert, which is the original BERT repository modified to support weight pruning. It has also been modified to use gradient checkpointing, which can be disabled by setting the Unix environment variable DISABLE_GRAD_CHECKPOINT=True. This only works during fine-tuning, not during pre-training.
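If you launch fine-tuning from a Python wrapper rather than a shell, one way to set the variable for the child process only is sketched below. The helper names are hypothetical and not part of this repository, and the child command is a stand-in that just echoes the variable. How the modified BERT code parses the value is not documented here, so the sketch uses the literal value True given above.

```python
import os
import subprocess
import sys


def env_without_grad_checkpoint(base_env=None):
    """Return a copy of the environment with DISABLE_GRAD_CHECKPOINT set."""
    env = dict(os.environ if base_env is None else base_env)
    # "True" is the value given in the README; other spellings are untested assumptions.
    env["DISABLE_GRAD_CHECKPOINT"] = "True"
    return env


def run_with_flag(command):
    """Run `command` (a list of arguments) with the flag set in the child only."""
    return subprocess.run(
        command,
        env=env_without_grad_checkpoint(),
        check=True,
        capture_output=True,
        text=True,
    )


# Stand-in child process that echoes the variable.
# Replace this argument list with your own fine-tuning command.
result = run_with_flag(
    [sys.executable, "-c", "import os; print(os.environ['DISABLE_GRAD_CHECKPOINT'])"]
)
print(result.stdout.strip())
```

Copying the environment instead of mutating os.environ keeps the setting out of the parent process, so other jobs started from the same script are unaffected.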

I am currently in the process of converting these experiments into a ducttape workflow, so things are a little unstable right now.

Things that have not been converted to ducttape:

Anything in tables/
Anything in graphs/

If you need all the experiments from the paper, check out this commit. It's very messy, so be prepared to read the code. I will not be releasing a guide to run that code, since it will be made obsolete by the ducttape workflow.

Configuration

pip install -r requirements.txt

To pre-train, you will need a GPU with at least 12 GB of memory. I've been using Titan RTXs via Univa Grid Engine. If your setup differs, you will need to modify tapes/submitters.tape and/or main.tconf.

You'll also need the Wikipedia corpus and BookCorpus, which can be retrieved with scripts/download_wiki.sh and scripts/download_bookcorpus.sh, respectively. GLUE data can be retrieved by running scripts/get_glue.py.

You will need to update tapes/link_data.tape to point to dataset locations.

You will also need to update main.tconf to point to the location of your repository on disk (so ducttape knows where to find packages).
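Before editing the configuration, it can help to confirm that the files mentioned in this section are present in your checkout. The following is a minimal sketch, not part of the repository: the helper name is hypothetical, and it assumes every path is relative to the repository root (including main.tape and main.tconf, which is a guess based on the ducttape command used to run experiments).

```python
from pathlib import Path

# Files named in this README; locations relative to the repository root are an assumption.
EXPECTED_FILES = (
    "requirements.txt",
    "main.tape",
    "main.tconf",
    "tapes/submitters.tape",
    "tapes/link_data.tape",
    "scripts/download_wiki.sh",
    "scripts/download_bookcorpus.sh",
    "scripts/get_glue.py",
)


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if every file is found.
    for name in missing_files("."):
        print(f"missing: {name}")
```

This only checks that the files exist. It does not validate the dataset locations you enter in tapes/link_data.tape or the repository path in main.tconf.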

AFAIK, no one besides me has used this code. If you have trouble, please open an issue and I'll do what I can to help out.

Most experiments are run using

ducttape main.tape -C main.tconf -p main

Owner

Mitchell Gordon
