ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice even when no expressive speech corpus is available for that speaker. The term ERISHA means speech in Sanskrit. The framework includes several deep learning architectures for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.

The library is currently at an early stage of development and will be updated frequently.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for initial setup.

Available recipes

Available Features

Resampling of speech waveforms to a target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems
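
As a rough illustration of the resampling feature listed above, the sketch below converts a waveform from one sampling rate to another using plain linear interpolation. It is not ERISHA's own code: the function name `resample_linear` and its parameters are hypothetical, and a real recipe would more likely rely on an audio library with proper anti-aliasing filters, which this naive version lacks.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Naively resample a mono waveform by linear interpolation.

    Hypothetical helper for illustration only: it applies no
    anti-aliasing filter, so downsampling real speech this way
    can introduce aliasing artifacts.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []

    # Number of output samples covering the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    # Distance between output samples, measured in input samples.
    step = orig_sr / target_sr
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the end
        hi = min(lo + 1, last)     # right neighbour, clamped to the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Usage: upsample a four-sample ramp from 4 Hz to 8 Hz.
ramp = [0.0, 1.0, 2.0, 3.0]
upsampled = resample_linear(ramp, orig_sr=4, target_sr=8)
```

Positions that fall past the final input sample are clamped to that sample, so the output keeps the duration of the input without reading out of range.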

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)
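
For readers unfamiliar with the multiclass N-pair loss listed among the upcoming updates, here is a minimal sketch of its commonly published form. For each anchor embedding it penalizes `log(1 + sum over negatives of exp(s_neg - s_pos))`, where `s` is a dot-product similarity and the positives of the other pairs in the batch act as negatives. This is an assumption about the general technique, not ERISHA's planned implementation: the function names are hypothetical, and the embedding regularization term often used alongside this loss is omitted.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def n_pair_loss(anchors, positives):
    """Mean multiclass N-pair loss over a batch of (anchor, positive) pairs.

    anchors[i] and positives[i] belong to the same class; every
    positives[j] with j != i serves as a negative for anchors[i].
    Hypothetical helper for illustration only.
    """
    n = len(anchors)
    if n == 0 or n != len(positives):
        raise ValueError("need the same non-zero number of anchors and positives")

    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], positives[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all similarities.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        # lse - logits[i] equals log(1 + sum_{j != i} exp(s_ij - s_ii)).
        total += lse - logits[i]
    return total / n


# Usage: well-separated pairs give a lower loss than mismatched pairs.
separated = n_pair_loss([[5.0, 0.0], [0.0, 5.0]], [[5.0, 0.0], [0.0, 5.0]])
confused = n_pair_loss([[5.0, 0.0], [0.0, 5.0]], [[0.0, 5.0], [5.0, 0.0]])
```

Written this way, the loss is the same as a softmax cross-entropy over the similarity matrix with the matching pair as the target class, which is how it is typically computed on a whole batch at once.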

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, and Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni
