Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

Last update: Sep 28, 2022

Overview

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo

Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in AR mode within each phoneme and is shared for all phonemes.
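
The caption describes a decoder that runs autoregressively inside each phoneme and is shared across phonemes. The toy sketch below only illustrates that control flow in pure Python; it is not the repository's network. `toy_decoder_step`, the zero start frame for each phoneme, and the use of known per-phoneme frame counts are all assumptions made for illustration.

```python
def toy_decoder_step(prev_frame, phoneme_vec):
    # Hypothetical stand-in for the shared decoder: the next frame is the
    # average of the previous frame and the phoneme representation.
    return [0.5 * p + 0.5 * v for p, v in zip(prev_frame, phoneme_vec)]


def decode_phoneme(phoneme_vec, num_frames, step=toy_decoder_step):
    # Autoregressive loop within ONE phoneme: each frame depends on the
    # previous frame of the same phoneme only (assumed zero start frame).
    frame = [0.0] * len(phoneme_vec)
    frames = []
    for _ in range(num_frames):
        frame = step(frame, phoneme_vec)
        frames.append(frame)
    return frames


def decode_utterance(phoneme_vecs, durations, step=toy_decoder_step):
    # The same `step` function is reused for every phoneme. No state is
    # carried across phoneme boundaries, so the per-phoneme loops are
    # independent of each other and could be batched.
    mel = []
    for vec, dur in zip(phoneme_vecs, durations):
        mel.extend(decode_phoneme(vec, dur, step))
    return mel
```

For example, `decode_utterance([[1.0, 2.0], [4.0, 0.0]], [2, 1])` yields three frames: two for the first phoneme and one for the second.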

💬 Huawei Noah's Ark Lab is recruiting interns in speech processing. If you're interested, you're welcome to contact Dr. Deng: [email protected]

Training and inference scripts for FCL-taco2

Environment

python 3.6.10
torch 1.3.1
chainer 6.0.0
espnet 8.0.0
apex 0.1
numpy 1.19.1
kaldiio 2.15.1
librosa 0.8.0
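
Before training, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository. It checks only a subset of the list (packages assumed to expose a `__version__` attribute), and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
import importlib

# Subset of the pinned versions listed above.
PINNED = {
    "torch": "1.3.1",
    "chainer": "6.0.0",
    "numpy": "1.19.1",
    "librosa": "0.8.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Missing or broken installation: report it as "not found".
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned, lookup=installed_version):
    """Map package name -> (wanted, found) for every package that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Drop a local build suffix such as "+cu100" before comparing.
        normalized = found.split("+")[0] if found else None
        if normalized != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when every checked package matches its pinned version.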

Training and inference:

Step1. Data preparation & preprocessing

Download LJSpeech
Unpack downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1
Obtain the forced alignment information using the Montreal Forced Aligner. Alternatively, download our alignment results and unpack them to /xx/TextGrid
Preprocess the dataset to extract mel-spectrograms, phoneme durations, pitch, energy and phoneme sequences by:
```
 python preprocessing.py --data-root /xx/LJSpeech-1.1 --textgrid-root /xx/TextGrid
```
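
To make the duration and prosody features more concrete, here is a minimal sketch of how phoneme intervals from an alignment could be turned into per-phoneme frame counts, and how frame-level pitch or energy could be averaged per phoneme. It is not the logic of `preprocessing.py`. The hop length of 256 samples and the per-phoneme averaging are assumptions; 22050 Hz is the LJSpeech sampling rate. Both function names are hypothetical.

```python
def intervals_to_frame_durations(intervals, sample_rate=22050, hop_length=256):
    """Convert (start_sec, end_sec) phoneme intervals to frame counts.

    Boundaries are rounded to the nearest frame index, so for contiguous
    intervals the durations sum to the rounded final boundary.
    """
    durations = []
    for start, end in intervals:
        start_frame = int(round(start * sample_rate / hop_length))
        end_frame = int(round(end * sample_rate / hop_length))
        durations.append(end_frame - start_frame)
    return durations


def phoneme_level_average(frame_values, durations):
    """Average a frame-level track (e.g. pitch or energy) over each phoneme."""
    averages = []
    position = 0
    for dur in durations:
        segment = frame_values[position:position + dur]
        # A phoneme with zero frames gets 0.0 rather than dividing by zero.
        averages.append(sum(segment) / len(segment) if segment else 0.0)
        position += dur
    return averages
```

With these defaults, `intervals_to_frame_durations([(0.0, 0.1), (0.1, 0.25), (0.25, 0.5)])` gives `[9, 13, 21]`, which sums to the 43 frames covering 0.5 seconds of audio.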

Step2. Model training

Training teacher model FCL-taco2-T:
```
 ./teacher_model_training.sh
```
Training student model FCL-taco2-S:
```
 ./student_model_training.sh
```
Parallel-WaveGAN vocoder training: follow the instructions here. You can also download the pre-trained PWG vocoder and put the PWG model under the "vocoder" directory.

Step3. Model evaluation

FCL-taco2-T evaluation:
```
 ./inference_teacher.sh
```
FCL-taco2-S evaluation:
```
 ./inference_student.sh
```

Citation

If the code is used in your research, please star our repo and cite our paper:

@inproceedings{wang2021fcl,
  title={Fcl-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis},
  author={Wang, Disong and Deng, Liqun and Zhang, Yang and Zheng, Nianzu and Yeung, Yu Ting and Chen, Xiao and Liu, Xunying and Meng, Helen},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5714--5718},
  year={2021},
  organization={IEEE}
}

