Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Last update: Oct 14, 2022

This repository implements the paper and provides a training pipeline and a pretrained model built on an open dataset. Audio samples for the paper are available here.

Recipe

This open pipeline uses the Databaker dataset. Please refer to our previous pipeline for dataset preprocessing; only the Databaker dataset is needed here. In addition, you need to run lexicon/build_databaker.py to build the vocabulary, download the lexicon from zdic.net, and encode the lexicon entries with XLM-R. The target directories for the saved data are specified in build_databaker.py and lexicon_utils.py; feel free to change them.

Below are the commands for training and evaluation. They use the default target directories from the preprocessing scripts, so please substitute your own paths. The evaluation script can run concurrently with the training script, and it can also synthesize samples from pretrained models. See the help text of each argument for its meaning.

python -m torch.distributed.launch --nproc_per_node=NGPU train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=D:\free_corpus\packed\ --training_languages=zh-cn --eval_languages=zh-cn --training_speakers=databaker --eval_steps=100000:150000 --hparams="input_method=char,multi_speaker=True,use_knowledge_attention=True,remove_space=True,data_format=nlti" --external_embed=D:\free_corpus\packed\embed.zip --vocab=D:\free_corpus\packed\db_vocab.json

python eval.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=D:\free_corpus\packed\ --eval_languages=zh-cn --eval_meta=D:\free_corpus\packed\metadata.eval.txt --hparams="input_method=char,multi_speaker=True,use_knowledge_attention=True,remove_space=True,data_format=nlti" --start_step=100000 --vocab=D:\free_corpus\packed\db_vocab.json --external_embed=D:\free_corpus\packed\embed.zip --eval_speakers=databaker
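Since the paths in the commands above have to be replaced with your own, it may be convenient to assemble the argument list programmatically. The following is a minimal sketch, not part of the repository: build_eval_command and format_hparams are hypothetical helper names, and the sketch assumes that the metadata, vocabulary, and embedding files all sit directly under one data directory, as they do in the example command.

```python
from pathlib import Path

# Hyperparameters copied from the commands above.
HPARAMS = {
    "input_method": "char",
    "multi_speaker": True,
    "use_knowledge_attention": True,
    "remove_space": True,
    "data_format": "nlti",
}


def format_hparams(hparams):
    """Join a dict into the comma-separated key=value string passed to --hparams."""
    return ",".join(f"{key}={value}" for key, value in hparams.items())


def build_eval_command(data_dir, model_dir, log_dir, start_step=100000):
    """Assemble the eval.py argument list for one data directory (hypothetical helper)."""
    data_dir = Path(data_dir)
    return [
        "python", "eval.py",
        f"--model-dir={model_dir}",
        f"--log-dir={log_dir}",
        f"--data-dir={data_dir}",
        "--eval_languages=zh-cn",
        f"--eval_meta={data_dir / 'metadata.eval.txt'}",
        f"--hparams={format_hparams(HPARAMS)}",
        f"--start_step={start_step}",
        f"--vocab={data_dir / 'db_vocab.json'}",
        f"--external_embed={data_dir / 'embed.zip'}",
        "--eval_speakers=databaker",
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Note that Path drops the trailing separator shown in the original --data-dir value; whether eval.py depends on it is not stated here, so check before relying on this.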

To report CER, you also need to create azure_key.json with your own Azure STT subscription, containing {"subscription": "YOUR_KEY", "region": "YOUR_REGION"}; see utils/transcribe.py. Because the datasets used differ significantly from those in the paper, this implementation is for demonstration only and cannot fully reproduce the results in the paper.
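The key file can be written by hand, but a small script avoids JSON typos. This is a sketch under assumptions: write_azure_key and load_azure_key are hypothetical helpers, and the file is assumed to live in the working directory; check utils/transcribe.py for the location it actually reads from.

```python
import json
from pathlib import Path


def write_azure_key(subscription, region, path="azure_key.json"):
    """Write the key file in the {"subscription": ..., "region": ...} layout described above."""
    payload = {"subscription": subscription, "region": region}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


def load_azure_key(path="azure_key.json"):
    """Read the key file back and check that both expected fields are present."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = {"subscription", "region"} - data.keys()
    if missing:
        raise KeyError(f"{path} is missing fields: {sorted(missing)}")
    return data
```

Calling write_azure_key("YOUR_KEY", "YOUR_REGION") once creates the file; load_azure_key() can then serve as a quick sanity check before starting an evaluation run.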

Pretrained Model

The models pretrained on Databaker are available at OneDrive Link and reach a CER of 4.19%. The files needed to generate speech, including the lexicon texts, lexicon embeddings, vocabulary file, and evaluation scripts, are also included to aid fast reproduction.
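For readers unfamiliar with the metric, CER (character error rate) is commonly computed as the character-level edit distance between the reference text and the STT transcript of the synthesized audio, divided by the number of reference characters. The sketch below shows that common definition only; it is an assumption that the repository's scoring follows it, and edit_distance and corpus_cer are illustrative names, not functions from utils/transcribe.py.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def corpus_cer(pairs):
    """Total character edits over total reference characters for (reference, hypothesis) pairs."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars
```

For example, corpus_cer([("你好世界", "你号世界")]) gives 0.25, since one of the four reference characters is substituted.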

Owner

Mutian He
