Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Last update: Sep 07, 2022

Overview

Multi-speaker DGP

This repository provides the official PyTorch implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis.

Our paper: Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation

Test environment

This repository has been tested in the following environment:

Ubuntu 18.04
NVIDIA GeForce RTX 2080 Ti
Python 3.7.3
CUDA 11.1
cuDNN 8.1.1

Setup

You can complete the setup by sourcing setup.sh in your current shell:

$ . ./setup.sh

*Please make sure that the installed PyTorch build is compatible with your CUDA version (see https://pytorch.org/ for more info). Otherwise, CUDA errors will occur during training.
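One way to check this after setup is to ask PyTorch which CUDA version it was built against and whether it can actually reach the GPU. The shell function below is a minimal sketch of such a check; the function name check_torch_cuda and the PYTHON variable (defaulting to python3) are hypothetical and are not part of setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report the installed
# PyTorch build and whether it can use CUDA.
# Exit status: 0 = CUDA usable, 1 = Python or PyTorch missing, 2 = no usable CUDA.
check_torch_cuda() {
  local py="${PYTHON:-python3}"   # assumption: the environment's Python is python3
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "python: '$py' not found" >&2
    return 1
  fi
  # Feed a short Python script to the interpreter on stdin.
  "$py" - <<'EOF'
import sys
try:
    import torch
except ImportError:
    print("torch: not installed")
    sys.exit(1)
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None for CPU-only builds
print("CUDA available:", torch.cuda.is_available())
sys.exit(0 if torch.cuda.is_available() else 2)
EOF
}
```

Run check_torch_cuda in the same environment you will train in. A "built with CUDA" value of None means a CPU-only PyTorch build was installed, and "CUDA available: False" on a GPU machine usually points to a driver or build mismatch; compare the reported version with the CUDA version listed under Test environment.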

How to use

This repository follows the Kaldi-style recipe format. To run the scripts, please follow the instructions below. The JVS corpus [Takamichi et al., 2020] can be downloaded from here.

# Move to the recipe directory
$ cd egs/jvs

# Download the corpus. The directory structure will then be as follows:

├── conf/     # directory containing YAML format configuration files
├── jvs_ver1/ # downloaded data
├── local/    # directory containing corpus-dependent scripts
└── run.sh    # main script

# Run the recipe from scratch
$ ./run.sh

# Or you can run the recipe step by step
$ ./run.sh --stage 0 --stop-stage 0  # train/dev/eval split
$ ./run.sh --stage 1 --stop-stage 1  # preprocessing
$ ./run.sh --stage 2 --stop-stage 2  # train phoneme duration model
$ ./run.sh --stage 3 --stop-stage 3  # train acoustic model
$ ./run.sh --stage 4 --stop-stage 4  # decoding

# During stages 2 and 3, you can monitor the logs with TensorBoard, for example:
$ tensorboard --logdir exp/dgp
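The --stage and --stop-stage options follow the usual Kaldi convention: a step numbered N runs only when stage <= N <= stop_stage, so, assuming run.sh follows that convention, --stage 2 --stop-stage 3 runs both training steps and skips the rest. The following is a minimal sketch of that gating pattern, not the actual contents of run.sh; the function name run_recipe is hypothetical, and each step only echoes its description from the list above instead of doing real work.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Kaldi-style stage gating; NOT the actual run.sh.
run_recipe() {
  local stage=0 stop_stage=100   # defaults: run every step

  # Self-contained stand-in for the option parsing Kaldi recipes usually
  # delegate to utils/parse_options.sh.
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --stage|--stop-stage)
        if [[ $# -lt 2 ]]; then
          echo "missing value for $1" >&2
          return 1
        fi
        if [[ "$1" == --stage ]]; then stage="$2"; else stop_stage="$2"; fi
        shift 2 ;;
      *)
        echo "unknown option: $1" >&2
        return 1 ;;
    esac
  done

  # Step descriptions from the README; array index = stage number.
  local steps=(
    "train/dev/eval split"
    "preprocessing"
    "train phoneme duration model"
    "train acoustic model"
    "decoding"
  )
  local n
  for n in "${!steps[@]}"; do
    # A step runs only if stage <= n <= stop_stage.
    if (( stage <= n && n <= stop_stage )); then
      echo "Stage ${n}: ${steps[$n]}"   # a real recipe would do the work here
    fi
  done
}
```

For example, run_recipe --stage 2 --stop-stage 3 prints only the lines for stages 2 and 3. Gating every step this way is what makes it cheap to resume a recipe from the middle after a failure without repeating earlier, already finished steps.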

How to customize

The conf/*.yaml files contain all settings for data preparation, preprocessing, training, and decoding. We provide two configuration files, dgp.yaml and dgplvm.yaml; you can change the experimental conditions by editing these files.
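When editing these files, it can help to keep an untouched copy so that your changes can be reviewed or reverted later. The two shell functions below are a minimal sketch of that workflow; backup_config, config_changes, and the .orig suffix are hypothetical names, not part of this repository, and they make no assumptions about the keys inside the YAML files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for editing conf/*.yaml
# while keeping a pristine copy to compare against.

# Save a pristine copy as <file>.orig, but never overwrite an existing one.
backup_config() {
  local cfg="$1"
  if [[ ! -f "$cfg" ]]; then
    echo "no such config: $cfg" >&2
    return 1
  fi
  [[ -e "${cfg}.orig" ]] || cp -- "$cfg" "${cfg}.orig"
}

# Show your edits as a unified diff against the saved copy.
# diff exits with 1 when the files differ, which is not an error here.
config_changes() {
  local cfg="$1"
  diff -u -- "${cfg}.orig" "$cfg" || [[ $? -eq 1 ]]
}
```

Typical use from egs/jvs: run backup_config conf/dgp.yaml, edit conf/dgp.yaml, then run config_changes conf/dgp.yaml to review what you changed. Likewise, diff -u conf/dgp.yaml conf/dgplvm.yaml shows how the two prepared setups differ.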

Owner

sarulab-speech