A 16kHz implementation of HiFi-GAN for soft-vc.

Last update: Dec 27, 2022

Overview

HiFi-GAN

A 16kHz implementation of HiFi-GAN for soft-vc.

Example Usage

import torch
import numpy as np

# Load checkpoint
hifigan = torch.hub.load("bshall/hifigan:main", "hifigan_hubert_soft").cuda()
# Load mel-spectrogram
mel = torch.from_numpy(np.load("path/to/mel")).unsqueeze(0).cuda()
# Generate
wav, sr = hifigan.generate(mel)
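
To listen to the result, the waveform has to be written to disk. The following is a minimal sketch using only the Python standard library. It assumes the waveform has already been converted to a flat sequence of floats in [-1, 1] (for example with `wav.squeeze().cpu().tolist()`, which is an assumption about the shape of the tensor returned by `generate`). The helper name `save_wav` is hypothetical and not part of this repository.

```python
import struct
import wave


def save_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of the hifigan package.
    """
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16 bits
        f.setframerate(sample_rate)  # pass the sr returned by generate()
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

With the variables from the example above, a call might look like `save_wav("out.wav", wav.squeeze().cpu().tolist(), sr)`.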

Train

Step 1: Download and extract the LJ-Speech dataset

Step 2: Resample the audio to 16kHz:

usage: resample.py [-h] [--sample-rate SAMPLE_RATE] in-dir out-dir

Resample an audio dataset.

positional arguments:
  in-dir                path to the dataset directory
  out-dir               path to the output directory

optional arguments:
  -h, --help            show this help message and exit
  --sample-rate SAMPLE_RATE
                        target sample rate (default 16kHz)
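
As a rough illustration of what this step does to the data, the sketch below computes the resampling ratio and the approximate output length of a clip. It assumes the input is LJ-Speech at its native 22.05kHz. The function name `resampled_length` is hypothetical, and the exact length produced by resample.py may differ by a sample depending on how the underlying resampler rounds.

```python
from fractions import Fraction

SOURCE_RATE = 22050  # native LJ-Speech sample rate (assumed input)
TARGET_RATE = 16000  # default target rate of resample.py

# Exact resampling ratio; reduces to 320/441.
RATIO = Fraction(TARGET_RATE, SOURCE_RATE)


def resampled_length(num_samples, ratio=RATIO):
    """Approximate number of samples after resampling (rounded to nearest)."""
    return round(num_samples * ratio)
```

One second of source audio (22050 samples) maps to 16000 samples, so the dataset shrinks to roughly 73% of its original sample count.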

Step 3: Download the dataset splits and move them into the root of the dataset directory. After steps 2 and 3 your dataset directory should look like this:

LJSpeech-1.1
│   test.txt
│   train.txt
│   validation.txt
├───mels
└───wavs

Note: the mels directory is optional. It is only needed for fine-tuning HiFi-GAN, in which case it should contain ground-truth aligned spectrograms from an acoustic model.
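
Before starting a long training run, it can help to confirm that the directory matches the layout above. The sketch below is one way to do that with the standard library. The function name `check_dataset_dir` is hypothetical and not part of this repository; the file and directory names are taken from the tree shown above.

```python
from pathlib import Path

# Entries that must exist in the dataset root (from the tree above).
REQUIRED = ["train.txt", "validation.txt", "test.txt", "wavs"]


def check_dataset_dir(root):
    """Return (missing, has_mels) for a dataset root directory.

    missing  -- required entries that were not found, in REQUIRED order
    has_mels -- True if the optional mels directory is present
    """
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    has_mels = (root / "mels").is_dir()
    return missing, has_mels
```

An empty `missing` list means the directory is ready for regular training; `has_mels` indicates whether it is also set up for fine-tuning.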

Step 4: Train HiFi-GAN:

usage: train.py [-h] [--resume RESUME] [--finetune] dataset-dir checkpoint-dir

Train or finetune HiFi-GAN.

positional arguments:
  dataset-dir      path to the preprocessed data directory
  checkpoint-dir   path to the checkpoint directory

optional arguments:
  -h, --help       show this help message and exit
  --resume RESUME  path to the checkpoint to resume from
  --finetune       whether to finetune (note that a resume path must be given)
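
The help text notes that --finetune needs a --resume path. The sketch below mirrors the interface shown above with argparse and makes that dependency explicit. It is an illustration of the constraint, not the repository's actual train.py; the function names `build_parser` and `parse_args` and the error message are assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that mirrors the train.py usage shown above."""
    parser = argparse.ArgumentParser(description="Train or finetune HiFi-GAN.")
    parser.add_argument("dataset_dir", metavar="dataset-dir", type=Path,
                        help="path to the preprocessed data directory")
    parser.add_argument("checkpoint_dir", metavar="checkpoint-dir", type=Path,
                        help="path to the checkpoint directory")
    parser.add_argument("--resume", type=Path,
                        help="path to the checkpoint to resume from")
    parser.add_argument("--finetune", action="store_true",
                        help="whether to finetune (a resume path must be given)")
    return parser


def parse_args(argv=None):
    """Parse arguments and reject --finetune without --resume."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.finetune and args.resume is None:
        # parser.error prints the usage message and exits with status 2.
        parser.error("--finetune requires --resume")
    return args
```

Checking the combination up front fails fast, before any data loading or model construction takes place.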

Generate

To generate using the trained HiFi-GAN models, see Example Usage or use the generate.py script:

usage: generate.py [-h] [--model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}] in-dir out-dir

Generate audio for a directory of mel-spectrograms using HiFi-GAN.

positional arguments:
  in-dir                path to directory containing the mel-spectrograms
  out-dir               path to output directory

optional arguments:
  -h, --help            show this help message and exit
  --model-name {hifigan,hifigan-hubert-soft,hifigan-hubert-discrete}
                        available models
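
The model names accepted by generate.py use hyphens, while the torch.hub entry point in Example Usage uses underscores ("hifigan_hubert_soft"). The sketch below converts between the two. It assumes that the other two models follow the same naming convention on torch.hub, which the text above does not confirm; the helper name `hub_entry_point` is hypothetical.

```python
# Model names accepted by generate.py (from the usage text above).
MODEL_NAMES = ("hifigan", "hifigan-hubert-soft", "hifigan-hubert-discrete")


def hub_entry_point(model_name):
    """Map a generate.py model name to a torch.hub entry point name.

    Assumption: hub entry points are the CLI names with '-' replaced by '_'.
    Only "hifigan_hubert_soft" is confirmed by the Example Usage section.
    """
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    return model_name.replace("-", "_")
```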

Acknowledgements

This repo is based heavily on https://github.com/jik876/hifi-gan.

Comments

Are the pretrained discriminator weights of the base model available?

Thanks for the nice work, @bshall.

I'm trying to train HiFi-GAN now, but training it from scratch on another dataset takes a long time.

If the discriminator of the base model were also available, I could start fine-tuning from that vocoder. It seems that you released only the generator. Could you also release the discriminator weights?

opened by seastar105 3

NaN during training when using own dataset

While fine-tuning works as expected, regular training with a dataset that isn't LJSpeech eventually produces a NaN loss. The culprit appears to be the following line, which causes a division by zero if wav happens to contain perfect silence:

https://github.com/bshall/hifigan/blob/374a4569eae5437e2c80d27790ff6fede9fc1c46/hifigan/dataset.py#L106

I'm not sure what the best solution for this would be. As a quick fix, I simply clipped the divisor so it can't reach zero:

wav = flip * gain * wav / max([wav.abs().max(), 0.001])
opened by cjay42 0
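
The failure mode and the workaround described in this issue can be reproduced without the repository. The sketch below uses plain Python lists rather than tensors and leaves out the `flip` and `gain` factors from the quoted line; the function name `normalize_peak` and the `eps` parameter are hypothetical. With tensors, an all-zero clip yields 0 / 0 = NaN, which then propagates into the loss.

```python
def normalize_peak(wav, eps=1e-3):
    """Scale samples by the peak magnitude, with a floor on the divisor.

    Hypothetical helper for illustration; wav is a list of floats.
    """
    peak = max((abs(s) for s in wav), default=0.0)
    # Without the floor, a perfectly silent clip would divide by zero.
    divisor = max(peak, eps)
    return [s / divisor for s in wav]
```

A side effect of the floor is that clips whose peak is below `eps` are no longer scaled to full range, which is usually acceptable for near-silent audio.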

How to use this vocoder with your Tacotron?

Thank you for your work. I used your Tacotron from your Universal Vocoding repository. The quality of the speech is excellent. However, the inference speed is slow. For that reason, I would like to use this HiFi-GAN as a vocoder. But Tacotron's n_mel is 80, while HiFi-GAN's n_mel is 128. How can I use HiFi-GAN with Tacotron?

opened by gheyret 0

Releases (v0.1)

v0.1 (Oct 17, 2021)

HiFi-GAN vocoders fine-tuned on the HuBERT-Soft and HuBERT-Discrete voice conversion systems.
Source code (tar.gz)
Source code (zip)
dev.txt (1.17 KB)
hifigan-67926ec6.pt (54.89 MB)
hifigan-hubert-discrete-bbad3043.pt (54.89 MB)
hifigan-hubert-soft-65f03469.pt (54.89 MB)
test.txt (1.17 KB)
train.txt (151.17 KB)

Owner

Benjamin van Niekerk

PhD student at Stellenbosch University. Interested in speech and audio technology.

GitHub Repository https://bshall.github.io/soft-vc/