PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

PyTorch implementation of Tacotron speech synthesis model.

Inspired from keithito/tacotron. Currently not as much good speech quality as keithito/tacotron can generate, but it seems to be basically working. You can find some generated speech examples trained on LJ Speech Dataset at here.

If you are comfortable working with TensorFlow, I'd recommend you to try https://github.com/keithito/tacotron instead. The reason to rewrite it in PyTorch is that it's easier to debug and extend (multi-speaker architecture, etc) at least to me.

Requirements

PyTorch
TensorFlow (if you want to run the training script. This definitely can be optional, but for now required.)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relis on keithito/tacotron for text processing, audio preprocessing and audio reconstruction (added as a submodule). Please follows the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

If you have your data prepared, assuming your data is in "~/tacotron/training" (which is the default), then you can train your model by:

python train.py

Alignment, predicted spectrogram, target spectrogram, predicted waveform and checkpoint (model and optimizer states) are saved per 1000 global step in checkpoints directory. Training progress can be monitored by:

tensorboard --logdir=log

Testing model

Open the notebook in notebooks directory and change checkpoint_path to your model.

PyTorch implementation of Tacotron speech synthesis model.

Related tags

Overview

tacotron_pytorch

Requirements

Installation

Training

Testing model

Owner

Ryuichi Yamamoto

code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

This repository comes with the paper "On the Robustness of Counterfactual Explanations to Adverse Perturbations"

Differentiable Abundance Matching With Python

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Neural Caption Generator with Attention

Einshape: DSL-based reshaping library for JAX and other frameworks.

Yolo Traffic Light Detection With Python

Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

Code of the paper "Part Detector Discovery in Deep Convolutional Neural Networks" by Marcel Simon, Erik Rodner and Joachim Denzler

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

Code for paper: "Spinning Language Models for Propaganda-As-A-Service"

Apply our monocular depth boosting to your own network!

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

Dictionary Learning with Uniform Sparse Representations for Anomaly Detection