FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

Overview

LST-TTS

Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audio samples/demo for our system can be accessed here

Setting up submodules

git submodule update --init --recursive

Get the waveglow vocoder checkpoint from here (This is from the NVIDIA official WaveGlow repo).

Setup environment

See docker/Dockerfile for the packages need to be installed.

Dataset preprocessing

LJSpeech

python preprocess_LJSpeech.py --datadir LJSpeechDir --outputdir OutputDir

VCTK

Get the leading and trailing scilence marks from this repo, and put vctk-silences.0.92.txt in your VCTK dataset directory.

python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Train_Dir
python preprocess_VCTK.py --datadir VCTKDir --outputdir Output_Test_Dir --make_test_set
  • --make_test_set: specify this flag to process the speakers in the test set, otherwise only process training speakers.

Training

LJSpeech

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 128 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 250000 \
                    --n_guided_steps 250000 \
                    --saving_path Output_CKPT_DIR \
                    --datatype LJSpeech \
                    [--distributed]
  • --distributed: enable DDP multi-GPU training
  • --batch_size: batch size per GPU, scale down if you train with multi-GPU and want to keep the same batch size
  • --check_val_every_n_epoch: sample and validate every n epoch
  • --datadir: output directory of the preprocess scripts

VCTK

python train_TTS.py --precision 16 \
                    --datadir FeatureDir \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH \
                    --sampledir SampleDir \
                    --batch_size 64 \
                    --check_val_every_n_epoch 50 \
                    --use_guided_attn \
                    --training_step 150000 \
                    --n_guided_steps 150000 \
                    --etts_checkpoint LJSpeech_Model_CKPT \
                    --saving_path Output_CKPT_DIR \
                    --datatype VCTK \
                    [--distributed]
  • --etts_checkpoint: the checkpoint path of pretrained model (on LJ Speech)

Synthesis

We provide examples for synthesis of the system in synthesis.py, you can adjust this script to your own usage. Example to run synthesis.py:

python synthesis.py --etts_checkpoint VCTK_Model_CKPT \
                    --sampledir SampleDir \
                    --datatype VCTK \
                    --vocoder_ckpt_path WaveGlowCKPT_PATH
Owner
Li-Wei Chen
Li-Wei Chen
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition How Fast Compare to Other Zero-Shot NAS Proxies on CIFAR-10/100 Pre-trained Model

190 Dec 29, 2022
Official Implementation of "Transformers Can Do Bayesian Inference"

Official Code for the Paper "Transformers Can Do Bayesian Inference" We train Transformers to do Bayesian Prediction on novel datasets for a large var

AutoML-Freiburg-Hannover 103 Dec 25, 2022
Measures input lag without dedicated hardware, performing motion detection on recorded or live video

What is InputLagTimer? This tool can measure input lag by analyzing a video where both the game controller and the game screen can be seen on a webcam

Bruno Gonzalez 4 Aug 18, 2022
Predicting Student Attentiveness using OpenCV

Predicting-Student-Attentiveness-using-OpenCV The model will predict if a student is attentive or not through facial parameter received through the st

Johann Pinto 2 Aug 20, 2022
PyTorch implementation of "Optimization Planning for 3D ConvNets"

Optimization-Planning-for-3D-ConvNets Code for the ICML 2021 paper: Optimization Planning for 3D ConvNets. Authors: Zhaofan Qiu, Ting Yao, Chong-Wah N

Zhaofan Qiu 2 Jan 12, 2022
MCMC samplers for Bayesian estimation in Python, including Metropolis-Hastings, NUTS, and Slice

Sampyl May 29, 2018: version 0.3 Sampyl is a package for sampling from probability distributions using MCMC methods. Similar to PyMC3 using theano to

Mat Leonard 304 Dec 25, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
Mixed Neural Likelihood Estimation for models of decision-making

Mixed neural likelihood estimation for models of decision-making Mixed neural likelihood estimation (MNLE) enables Bayesian parameter inference for mo

mackelab 9 Dec 22, 2022
ONNX Command-Line Toolbox

ONNX Command Line Toolbox Aims to improve your experience of investigating ONNX models. Use it like onnx infershape /path/to/model.onnx. (See the usag

黎明灰烬 (王振华 Zhenhua WANG) 23 Nov 13, 2022
Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael

Niantic Labs 496 Dec 30, 2022
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"

FAME: Feature-based Adversarial Meta-Embeddings This is the companion code for the experiments reported in the paper "FAME: Feature-Based Adversarial

Bosch Research 11 Nov 27, 2022
Compute FID scores with PyTorch.

FID score for PyTorch This is a port of the official implementation of Fréchet Inception Distance to PyTorch. See https://github.com/bioinf-jku/TTUR f

2.1k Jan 06, 2023
[PNAS2021] The neural architecture of language: Integrative modeling converges on predictive processing

The neural architecture of language: Integrative modeling converges on predictive processing Code accompanying the paper The neural architecture of la

Martin Schrimpf 36 Dec 01, 2022
DenseNet Implementation in Keras with ImageNet Pretrained Models

DenseNet-Keras with ImageNet Pretrained Models This is an Keras implementation of DenseNet with ImageNet pretrained weights. The weights are converted

Felix Yu 568 Oct 31, 2022
LegoDNN: a block-grained scaling tool for mobile vision systems

Table of contents 1 Introduction 1.1 Major features 1.2 Architecture 2 Code and Installation 2.1 Code 2.2 Installation 3 Repository of DNNs in vision

41 Dec 24, 2022
Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging, ICCV2021 [PyTorch Code]

Dense Deep Unfolding Network with 3D-CNN Prior for Snapshot Compressive Imaging, ICCV2021 [PyTorch Code]

Jian Zhang 20 Oct 24, 2022
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022
A code implementation of AC-GC: Activation Compression with Guaranteed Convergence, in NeurIPS 2021.

Code For AC-GC: Lossy Activation Compression with Guaranteed Convergence This code is intended to be used as a supplemental material for submission to

Dave Evans 2 Nov 01, 2022
Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis.

ID-Unet: Iterative-view-synthesis(CVPR2021 Oral) Tensorflow implementation of ID-Unet: Iterative Soft and Hard Deformation for View Synthesis. Overvie

17 Aug 23, 2022