PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

Related tags

Deep Learningpika
Overview

PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

PIKA is a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi. The first release focuses on end-to-end speech recognition. We use Pytorch as deep learning engine, Kaldi for data formatting and feature extraction.

Key Features

  • On-the-fly data augmentation and feature extraction loader

  • TDNN Transformer encoder and convolution and transformer based decoder model structure

  • RNNT training and batch decoding

  • RNNT decoding with external Ngram FSTs (on-the-fly rescoring, aka, shallow fusion)

  • RNNT Minimum Bayes Risk (MBR) training

  • LAS forward and backward rescorer for RNNT

  • Efficient BMUF (Block model update filtering) based distributed training

Installation and Dependencies

In general, we recommend Anaconda since it comes with most dependencies. Other major dependencies include,

Pytorch

Please go to https://pytorch.org/ for pytorch installation, codes and scripts should be able to run against pytorch 0.4.0 and above. But we recommend 1.0.0 above for compatibility with RNNT loss module (see below)

Pykaldi and Kaldi

We use Kaldi (https://github.com/kaldi-asr/kaldi)) and PyKaldi (a python wrapper for Kaldi) for data processing, feature extraction and FST manipulations. Please go to Pykaldi website https://github.com/pykaldi/pykaldi for installation and make sure to build Pykaldi with ninja for efficiency. After following the installation process of pykaldi, you should have both Kaldi and Pykaldi dependencies ready.

CUDA-Warp RNN-Transducer

For RNNT loss module, we adopt the pytorch binding at https://github.com/1ytic/warp-rnnt

Others

Check requirements.txt for other dependencies.

Get Started

To get started, check all the training and decoding scripts located in egs directory.

I. Data preparation and RNNT training

egs/train_transducer_bmuf_otfaug.sh contains data preparation and RNNT training. One need to prepare training data and specify the training data directory,

#training data dir must contain wav.scp and label.txt files
#wav.scp: standard kaldi wav.scp file, see https://kaldi-asr.org/doc/data_prep.html 
#label.txt: label text file, the format is, uttid sequence-of-integer, where integer
#           is one-based indexing mapped label, note that zero is reserved for blank,  
#           ,eg., utt_id_1 3 5 7 10 23 
train_data_dir=

II. Continue with MBR training

With RNNT trained model, one can continued MBR training with egs/train_transducer_mbr_bmuf_otfaug.sh (assuming using the same training data, therefore data preparation is omitted). Make sure to specify the initial model,

--verbose \
--optim sgd \
--init_model $exp_dir/init.model \
--rnnt_scale 1.0 \
--sm_scale 0.8 \

III. Training LAS forward and backward rescorer

One can train a forward and backward LAS rescorer for your RNN-T model using egs/train_las_rescorer_bmuf_otfaug.sh. The LAS rescorer will share the encoder part with RNNT model, and has extra two-layer LSTM as additional encoder, make sure to specify the encoder sharing as,

--num_batches_per_epoch 526264 \
--shared_encoder_model $exp_dir/final.model \
--num_epochs 5 \

We support bi-directional LAS rescoring, i.e., forward and backward rescoring. Backward (right-to-left) rescoring is achieved by reversing sequential labels when conducting LAS model training. One can easily perform a backward LAS rescorer training by specifying,

--reverse_labels

IV. Decoding

egs/eval_transducer.sh is the main evluation script, which contains the decoding pipeline. Forward and backward LAS rescoring can be enabled by specifying these two models,

##########configs#############
#rnn transducer model
rnnt_model=
#forward and backward las rescorer model
lasrescorer_fw=
lasrescorer_bw=

Caveats

All the training and decoding hyper-parameters are adopted based on large-scale (e.g., 60khrs) training and internal evaluation data. One might need to re-tune hyper-parameters to acheive optimal performances. Also the WER (CER) scoring script is based on a Mandarin task, we recommend those who work on different languages rewrite scoring scripts.

References

[1] Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition, Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu, InterSpeech 2018

[2] Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition, Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu, InterSpeech 2020

Citations

@inproceedings{Weng2020,
  author={Chao Weng and Chengzhu Yu and Jia Cui and Chunlei Zhang and Dong Yu},
  title={{Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={966--970},
  doi={10.21437/Interspeech.2020-1221},
  url={http://dx.doi.org/10.21437/Interspeech.2020-1221}
}

@inproceedings{Weng2018,
  author={Chao Weng and Jia Cui and Guangsen Wang and Jun Wang and Chengzhu Yu and Dan Su and Dong Yu},
  title={Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={761--765},
  doi={10.21437/Interspeech.2018-1030},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1030}
}

Disclaimer

This is not an officially supported Tencent product

Owner
Research repositories.
I3-master-layout - Simple master and stack layout script

Simple master and stack layout script | ------ | ----- | | | | | Ma

Tobias S 18 Dec 05, 2022
NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

100 Sep 28, 2022
Using pretrained GROVER to extract the atomic fingerprints from molecule

Extracting atomic fingerprints from molecules using pretrained Graph Neural Network models (GROVER).

Xuan Vu Nguyen 1 Jan 28, 2022
Implementing DropPath/StochasticDepth in PyTorch

%load_ext memory_profiler Implementing Stochastic Depth/Drop Path In PyTorch DropPath is available on glasses my computer vision library! Introduction

Francesco Saverio Zuppichini 13 Jan 05, 2023
Semi-automated OpenVINO benchmark_app with variable parameters

Semi-automated OpenVINO benchmark_app with variable parameters. User can specify multiple options for any parameters in the benchmark_app and the progam runs the benchmark with all combinations of gi

Yasunori Shimura 8 Apr 11, 2022
High-Resolution Image Synthesis with Latent Diffusion Models

Latent Diffusion Models Requirements A suitable conda environment named ldm can be created and activated with: conda env create -f environment.yaml co

CompVis Heidelberg 5.6k Jan 04, 2023
OneFlow is a performance-centered and open-source deep learning framework.

OneFlow OneFlow is a performance-centered and open-source deep learning framework. Latest News Version 0.5.0 is out! First class support for eager exe

OneFlow 4.2k Jan 07, 2023
Code for "Multi-Compound Transformer for Accurate Biomedical Image Segmentation"

News The code of MCTrans has been released. if you are interested in contributing to the standardization of the medical image analysis community, plea

97 Jan 05, 2023
Keras-retinanet - Keras implementation of RetinaNet object detection.

Keras RetinaNet Keras implementation of RetinaNet object detection as described in Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal,

Fizyr 4.3k Jan 01, 2023
Run PowerShell command without invoking powershell.exe

PowerLessShell PowerLessShell rely on MSBuild.exe to remotely execute PowerShell scripts and commands without spawning powershell.exe. You can also ex

Mr.Un1k0d3r 1.2k Jan 03, 2023
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Dec 31, 2022
Human4D Dataset tools for processing and visualization

HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media HUMAN4D constitutes a large and multimodal 4D dataset that contains a variet

tofis 15 Nov 09, 2022
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning This is the code for implementing the MADDPG algorithm presented in

97 Dec 21, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 06, 2023
Tf alloc - Simplication of GPU allocation for Tensorflow2

tf_alloc Simpliying GPU allocation for Tensorflow Developer: korkite (Junseo Ko)

Junseo Ko 3 Feb 10, 2022
Continuous Time LiDAR odometry

CT-ICP: Elastic SLAM for LiDAR sensors This repository implements the SLAM CT-ICP (see our article), a lightweight, precise and versatile pure LiDAR o

385 Dec 29, 2022
The code uses SegFormer for Semantic Segmentation on Drone Dataset.

SegFormer_Segmentation The code uses SegFormer for Semantic Segmentation on Drone Dataset. The details for the SegFormer can be obtained from the foll

Dr. Sander Ali Khowaja 1 May 08, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
Official repository for Hierarchical Opacity Propagation for Image Matting

HOP-Matting Official repository for Hierarchical Opacity Propagation for Image Matting 🚧 🚧 🚧 Under Construction 🚧 🚧 🚧 🚧 🚧 🚧   Coming Soon   

Li Yaoyi 54 Dec 30, 2021
Tool for live presentations using manim

manim-presentation Tool for live presentations using manim Install pip install manim-presentation opencv-python Usage Use the class Slide as your sce

Federico Galatolo 146 Jan 06, 2023