This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

Last update: Mar 24, 2022

Overview

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression


Requirements

Install NVIDIA Apex from source with its C++ and CUDA extensions:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
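To confirm that the Apex build is visible to the Python interpreter you plan to train with, a small check like the following may help. It is only a sketch: `find_missing_modules` is a hypothetical helper, and the extension module names `apex_C` and `amp_C` are assumptions about what the `--cpp_ext` and `--cuda_ext` options produce, so adjust them if your Apex version differs.

```python
import importlib.util


def find_missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "apex" is the package installed above. "apex_C" and "amp_C" are ASSUMED names
# for the compiled C++ and CUDA extensions; they are not stated in this README.
missing = find_missing_modules(["apex", "apex_C", "amp_C"])
print("missing modules:", missing if missing else "none")
```

An empty result only shows that the modules can be found, not that they match your CUDA and PyTorch versions.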

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.

Prepare dataset

Download the GLUE dataset (which contains MNLI) using the script from HERE, and put the files into ./dataset/glue/.
Download the Amazon Reviews dataset from HERE, and extract it into ./dataset/amazon_review/.
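Before starting training, it can be useful to check that the directories named in the download steps exist and are not empty. The sketch below does that. The helper name `missing_or_empty` is hypothetical, and the check looks only at the three directory paths given above, since the exact file names inside them are not listed in this README.

```python
from pathlib import Path

# Directory paths as given in the checkpoint and dataset steps above.
EXPECTED_DIRS = (
    "pretrained_ckpt",
    "dataset/glue",
    "dataset/amazon_review",
)


def missing_or_empty(root="."):
    """Return the expected directories under `root` that are absent or have no entries."""
    problems = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


# Run from the repository root; an empty list means all three directories are populated.
print("directories needing attention:", missing_or_empty("."))
```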

Train the teacher model (BERT$_{\rm B}$-single) on a single domain

bash train_domain.sh

Distill the student model (BERT$_{\rm S}$) with TinyBERT-KD on a single domain

bash finetune_domain.sh

Train the teacher model (HRKD-teacher) on multiple domains

bash train_multi_domain.sh

Then move the resulting checkpoints into the directories that the distillation step expects (see the beginning of finetune_multi_domain.py for details).

Distill the student model (BERT$_{\rm S}$) with HRKD on multiple domains

bash finetune_multi_domain.sh
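The shell scripts in the sections above are meant to be run one after another. If you prefer to drive them from Python, a sketch like the one below runs a list of scripts in order and stops at the first failure. `run_steps` is a hypothetical helper, not part of this repository, and it assumes the scripts are invoked from the repository root exactly as shown above. The final script, finetune_multi_domain.sh, is deliberately left out of the list because the multi-domain teacher checkpoints have to be placed by hand first.

```python
import subprocess

# Script names come from the sections above; the order follows this README.
# finetune_multi_domain.sh is omitted: it needs manually placed checkpoints.
STEPS = [
    "train_domain.sh",
    "finetune_domain.sh",
    "train_multi_domain.sh",
]


def run_steps(scripts, run=subprocess.run):
    """Run each script with bash in order and return the scripts that completed.

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError, so later scripts are not started.
    """
    completed = []
    for script in scripts:
        run(["bash", script], check=True)
        completed.append(script)
    return completed


# Example usage (left commented out so that importing this file starts nothing):
# run_steps(STEPS)
# ...place the checkpoints as described above, then:
# run_steps(["finetune_multi_domain.sh"])
```

The `run` parameter exists only so the loop can be exercised with a stand-in function instead of launching real training jobs.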

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021hrkd,
  title     = {{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression},
  author    = {Chenhe Dong and Yaliang Li and Ying Shen and Minghui Qiu},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}


Owner

Chenhe Dong
