DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Last update: Dec 27, 2021

DSEE

Code for the preprint DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah

Overview

TBD

Requirements

We use conda to create virtual environments.

conda env create -f environment.yml
conda activate dsee

Command

Unstructured DSEE

Step 0. Install the bundled package

cd non-GPT-2
pip install -e .
cd ..

Step 1. Pre-training

Take SST-2 as an example:

OUTPUT_DIR='./sst2_rank16_s1_64'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12345 \
    non-GPT-2/examples/pytorch/text-classification/run_glue.py \
    --save_total_limit 10 \
    --model_name_or_path bert-base-uncased \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --evaluation_strategy steps
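
To get a feel for what --lora_r 16 and --num_sparse 64 imply, the sketch below counts trainable values for a single weight matrix. It assumes the update takes the form of a rank-r product plus a sparse residual with num_sparse nonzero entries. That decomposition, the 768 x 768 shape (the hidden size of bert-base-uncased), and the helper name trainable_params are assumptions for illustration only; the exact semantics of the flags are defined in run_glue.py.

```python
def trainable_params(d_out, d_in, rank, num_sparse):
    """Count trainable values in a low-rank-plus-sparse update of a d_out x d_in matrix.

    Assumes the update is U @ V + S, with U of shape (d_out, rank),
    V of shape (rank, d_in), and S holding `num_sparse` trainable entries.
    This is an illustrative assumption, not the repository's implementation.
    """
    low_rank = d_out * rank + rank * d_in
    return low_rank + num_sparse


if __name__ == "__main__":
    d = 768  # hidden size of bert-base-uncased
    full = d * d
    update = trainable_params(d, d, rank=16, num_sparse=64)
    print(f"full matrix: {full}, update: {update}, ratio: {update / full:.4f}")
```

Under these assumptions the update trains only a small fraction of the values in the full matrix, which is the motivation for combining a low rank with a small sparse budget.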

Step 2. Pruning & Fine-tuning

OUTPUT_DIR='./sst2_rank16_s1_64_prune_0.5'
num_gpus=4
python -m torch.distributed.launch \
    --nproc_per_node=$num_gpus \
    --master_port=12335 \
    non-GPT-2/examples/pytorch/text-classification/run_glue_prune_tune.py \
    --save_total_limit 10 \
    --model_name_or_path sst2_rank16_s1_64 \
    --task_name sst2 \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --num_train_epochs 3 \
    --save_steps 50 \
    --seed 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_seq_length 128 \
    --overwrite_output_dir \
    --logging_steps 50 \
    --load_best_model_at_end True \
    --metric_for_best_model eval_accuracy \
    --apply_lora \
    --lora_r 16 \
    --apply_sparse \
    --num_sparse 64 \
    --learning_rate 2e-4 \
    --pruning_ratio 0.5 \
    --evaluation_strategy steps
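
As a rough illustration of what a pruning ratio of 0.5 means, the sketch below zeroes out the smallest-magnitude half of a list of weights. It assumes that --pruning_ratio is the fraction of weights removed and that pruning is magnitude-based; both are assumptions here, and magnitude_prune is a hypothetical helper, not a function from this repository.

```python
def magnitude_prune(weights, ratio):
    """Zero out the fraction `ratio` of weights with the smallest absolute value.

    Returns (pruned_weights, mask), where mask[i] is 0 for removed weights
    and 1 for kept ones.
    """
    n_prune = int(len(weights) * ratio)
    # Indices sorted by ascending magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


if __name__ == "__main__":
    weights = [0.3, -0.05, 1.2, -0.7, 0.01, 0.4]
    pruned, mask = magnitude_prune(weights, ratio=0.5)
    print(mask)
    print(pruned)
```

In this toy run, the three entries with the smallest magnitudes are masked out and the other three are left unchanged.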

TODO

Code for Unstructured DSEE on GPT-2
Code for Structured DSEE

Acknowledgement

Hugging Face's Transformers (https://github.com/huggingface/transformers)

Owner

VITA