LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Last update: Dec 29, 2022

Overview

LightHuBERT

| Github | Huggingface | SUPERB Leaderboard |

The authors' PyTorch implementation and pretrained models of LightHuBERT.

March 2022: released the preprint on arXiv and the checkpoints on Hugging Face.

Pre-Trained Models

Model	Pre-Training Dataset	Download Link
LightHuBERT Base	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_base.pt
LightHuBERT Small	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_small.pt
LightHuBERT Stage 1	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_stage1.pt

The pre-trained checkpoints were trained with common.fp16: true, so inference can also be run with fp16 weights.
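Because the checkpoints come from fp16 training, casting the weights to half precision for inference should cost little accuracy. In PyTorch this would typically be model.half() together with wav_input_16khz.half(); we have not benchmarked that here. The stdlib-only sketch below is illustrative and not part of the lighthubert package. It packs a few made-up weight values into IEEE 754 half precision using struct's 'e' format to show the two effects: storage is halved, and each value is rounded to roughly three significant decimal digits.

```python
import struct


def fp16_roundtrip(values):
    """Pack floats as IEEE 754 half precision ('e') and unpack them again.

    Returns the rounded values and the number of bytes the fp16 copy occupies.
    Values beyond the fp16 range (about +/-65504) raise OverflowError.
    """
    fmt = f"<{len(values)}e"
    packed = struct.pack(fmt, *values)
    return list(struct.unpack(fmt, packed)), len(packed)


def fp32_nbytes(values):
    """Bytes needed to store the same values in single precision ('f')."""
    return len(struct.pack(f"<{len(values)}f", *values))


# Made-up example weights, not taken from any checkpoint.
weights = [0.1, -0.25, 1.0, 3.14159]
half, half_bytes = fp16_roundtrip(weights)

print(fp32_nbytes(weights), half_bytes)  # 16 8
print(max(abs(w - h) for w, h in zip(weights, half)))  # worst-case rounding error
```

Values that are exact in binary (such as -0.25 and 1.0) survive unchanged, while the others pick up a small rounding error.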

Requirements and Installation

PyTorch version >= 1.8.1
Python version >= 3.6
numpy version >= 1.19.3
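These minimum versions can be checked before installing. The helper below is a hypothetical stdlib sketch: parse_version, meets_minimum and check_requirements are our names, not part of lighthubert. It compares only the leading numeric part of a version string, so local tags such as +cu111 are ignored. It also relies on importlib.metadata, which needs Python 3.8 or newer even though the package itself lists 3.6.

```python
import platform
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Minimum versions listed in the requirements above.
MINIMUMS = {"torch": "1.8.1", "numpy": "1.19.3"}
PYTHON_MINIMUM = "3.6"


def parse_version(text):
    """Turn '1.8.1+cu111' into (1, 8, 1), keeping only the leading dotted numbers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements():
    """Map each requirement to True/False; missing packages count as False."""
    report = {"python": meets_minimum(platform.python_version(), PYTHON_MINIMUM)}
    for package, minimum in MINIMUMS.items():
        try:
            report[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            report[package] = False
    return report


if __name__ == "__main__":
    print(check_requirements())
```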
To install lighthubert:

git clone git@github.com:mechanicalsea/lighthubert.git
cd lighthubert
pip install --editable .

Load Pre-Trained Models for Inference

import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

wav_input_16khz = torch.randn(1,10000).cuda()

# load the pre-trained checkpoints
checkpoint = torch.load('/path/to/lighthubert.pt')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'  # use 'small' when loading lighthubert_small.pt
model = LightHuBERT(cfg)
model = model.cuda()
model = model.eval()
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) set a subnet
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

# extract the representation of the last layer
rep = model.extract_features(wav_input_16khz)[0]

# extract the representation of each layer
hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

print(f"Last-layer representation equals hs[-1]: {torch.allclose(rep, hs[-1])}")
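The random input above has 10,000 samples, i.e. 0.625 s at 16 kHz. The sketch below predicts how many frames rep will contain, assuming LightHuBERT keeps the default HuBERT / wav2vec 2.0 convolutional feature extractor: seven layers with kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2. Please confirm this against checkpoint['cfg']['model']. num_frames is a hypothetical helper, not a lighthubert function.

```python
# Assumed (kernel, stride) pairs of the HuBERT-style convolutional front end.
# This is the fairseq default; it is an assumption for LightHuBERT checkpoints.
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def num_frames(num_samples, layers=CONV_LAYERS):
    """Output length after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in layers:
        if length < kernel:
            raise ValueError(f"input of {num_samples} samples is too short")
        length = (length - kernel) // stride + 1
    return length


for n in (10000, 16000):
    print(n, num_frames(n))
# 10000 31
# 16000 49
```

The stack has a total stride of 5 x 2^6 = 320 samples (20 ms) and a receptive field of 400 samples (25 ms), so the count equals (num_samples - 400) // 320 + 1. Under this assumption, rep for the example input would have shape (1, 31, D), where D is the model's output feature dimension.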

More examples can be found in our tutorials.
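The optional subnet step in the example follows the once-for-all idea: a single supernet is trained, and smaller Transformer configurations are sampled from it at inference time. The stdlib sketch below makes this concrete by sampling a configuration and estimating its encoder size. The search space, the field names and the 64-dimensional attention heads are all hypothetical stand-ins chosen for illustration; the actual choices live in the repository's supernet definition. The estimate counts only the Transformer encoder layers, so it will not match calc_sampled_param_num, which presumably also counts other modules such as the convolutional front end.

```python
import random

# Hypothetical search space for illustration only; the real LightHuBERT
# supernet defines its own dimensions and choices.
SEARCH_SPACE = {
    "embed_dim": [512, 640, 768],
    "layer_num": [12],
    "heads_num": [8, 10, 12],
    "ffn_ratio": [3.0, 3.5, 4.0],
}
HEAD_DIM = 64  # assumed per-head width


def sample_subnet(space, rng):
    """Pick one embedding size, a depth, and per-layer heads / FFN ratios."""
    layers = rng.choice(space["layer_num"])
    return {
        "embed_dim": rng.choice(space["embed_dim"]),
        "layer_num": layers,
        "heads_num": [rng.choice(space["heads_num"]) for _ in range(layers)],
        "ffn_ratio": [rng.choice(space["ffn_ratio"]) for _ in range(layers)],
    }


def encoder_params(cfg, head_dim=HEAD_DIM):
    """Count weights and biases of the Transformer encoder layers only."""
    d = cfg["embed_dim"]
    total = 0
    for heads, ratio in zip(cfg["heads_num"], cfg["ffn_ratio"]):
        attn = heads * head_dim
        ffn = int(d * ratio)
        total += 3 * (d * attn + attn)        # query, key, value projections
        total += attn * d + d                 # attention output projection
        total += d * ffn + ffn + ffn * d + d  # two feed-forward linear layers
        total += 2 * (2 * d)                  # two LayerNorms (scale and shift)
    return total


rng = random.Random(0)
subnet = sample_subnet(SEARCH_SPACE, rng)
print(f"subnet (Params {encoder_params(subnet) / 1e6:.0f}M) | {subnet}")
```

When the largest choice is used everywhere (768 dimensions, 12 heads and FFN ratio 4.0 in all 12 layers), the estimate comes to 85,054,464 parameters. That is 7,087,872 per layer, the familiar BERT-Base encoder layer size. Any sampled subnet is no larger than this.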

Universal Representation Evaluation on SUPERB

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Reference

If you find our work useful in your research, please cite the following paper:

@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}

Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang ([email protected]).
