Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

Overview

HiFi-GAN+

This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All You Need by Jiaqi Su, Yunyun Wang, Adam Finkelstein, and Zeyu Jin.

The model takes a band-limited audio signal (usually 8/16/24kHz) and attempts to reconstruct the high frequency components needed to restore a full-band signal at 48kHz. This is useful for upsampling low-rate outputs from upstream tasks such as text-to-speech or voice conversion, or for enhancing audio that was filtered to remove high frequency noise. For more information, please see this blog post.
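
As a rough illustration of what "band-limited" means here: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist frequency), so everything between the source's Nyquist frequency and 24kHz has to be reconstructed. The sketch below just does that arithmetic; the function name `missing_band` is hypothetical and not part of the library.

```python
TARGET_RATE_HZ = 48_000  # full-band output rate described above


def missing_band(source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Return the (low, high) edges in Hz of the band absent from the source.

    A signal sampled at rate fs carries content only up to fs / 2.
    """
    if source_rate_hz >= target_rate_hz:
        raise ValueError("source rate must be below the target rate")
    return source_rate_hz / 2, target_rate_hz / 2


for rate in (8_000, 16_000, 24_000):
    low, high = missing_band(rate)
    print(f"{rate} Hz source: reconstruct {low:.0f}-{high:.0f} Hz")
```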

Usage

The example below uses a pretrained HiFi-GAN+ model to upsample a 1 second 24kHz sawtooth to 48kHz.

import torch
from hifi_gan_bwe import BandwidthExtender

model = BandwidthExtender.from_pretrained("hifi-gan-bwe-10-42890e3-vctk-48kHz")

fs = 24000
x = torch.full([fs], 261.63 / fs).cumsum(-1) % 1.0 - 0.5
y = model(x, fs)

There is a Gradio demo on Hugging Face Spaces where you can upload audio clips and run the model. You can also run the model on Colab with this notebook.
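
The test signal in the usage example is built with a phase accumulator: each sample advances the phase by 261.63 / fs cycles (261.63 Hz is roughly middle C), the phase is wrapped to [0, 1), and then shifted to be centered on zero. The following is a minimal pure-Python sketch of the same idea, for readers who want to see it without PyTorch. The function name `sawtooth` is hypothetical, and it multiplies instead of accumulating, so its values will differ slightly from the float32 `cumsum` version above.

```python
def sawtooth(freq_hz, sample_rate_hz, num_samples):
    """Generate a sawtooth in [-0.5, 0.5) by wrapping a linearly growing phase."""
    step = freq_hz / sample_rate_hz  # phase advance per sample, in cycles
    # (n + 1) mirrors cumsum, whose first output already includes one step
    return [((n + 1) * step) % 1.0 - 0.5 for n in range(num_samples)]


fs = 24000
x = sawtooth(261.63, fs, fs)  # 1 second of audio

# Each wrap of the phase shows up as a sudden drop in the waveform.
wraps = sum(1 for a, b in zip(x, x[1:]) if b < a)
print(len(x), min(x), max(x), wraps)
```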

Running with pipx

The HiFi-GAN+ library can be run directly from PyPI if you have the pipx application installed. The following script uses a hosted pretrained model to upsample an MP3 file to 48kHz. The input audio can be in any format supported by the audioread library, and the output can be in any format supported by soundfile.

pipx run --python=python3.9 hifi-gan-bwe \
  hifi-gan-bwe-10-42890e3-vctk-48kHz \
  input.mp3 \
  output.wav

Running in a Virtual Environment

If you have a Python 3.9 virtual environment set up and activated, you can install the HiFi-GAN+ library into it and use it to run synthesis, training, etc.

pip install hifi-gan-bwe

hifi-synth hifi-gan-bwe-10-42890e3-vctk-48kHz input.mp3 output.wav
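
If you have several files to upsample, one option is to build the same command line for each file from a small script. The sketch below only assembles the arguments in the order shown above (model name, input, output); it assumes `hifi-synth` is on your PATH, and the helper name `synth_command` and the file names are hypothetical.

```python
import shlex
from pathlib import Path

MODEL = "hifi-gan-bwe-10-42890e3-vctk-48kHz"


def synth_command(source, out_dir, model=MODEL):
    """Build the hifi-synth argument list for one input file."""
    # Write a .wav with the same base name into the output directory.
    target = Path(out_dir) / (Path(source).stem + ".wav")
    return ["hifi-synth", model, str(source), str(target)]


for source in ["a.mp3", "b.mp3"]:  # placeholder file names
    print(shlex.join(synth_command(source, "upsampled")))
```

To actually run each command instead of printing it, the argument list can be passed to `subprocess.run(cmd, check=True)`.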

Pretrained Models

The following models can be loaded with BandwidthExtender.from_pretrained and used for audio upsampling. You can also download the model file from the link and use it offline.

| Name | Sample Rate | Parameters | Wandb Metrics | Notes |
| --- | --- | --- | --- | --- |
| hifi-gan-bwe-10-42890e3-vctk-48kHz | 48kHz | 1M | bwe-10-42890e3 | Same as bwe-05, but uses bandlimited interpolation for upsampling, for reduced noise and aliasing. Uses the same parameters as resampy's kaiser_best mode. |
| hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz | 48kHz | 1M | bwe-11-d5f542d | Same as bwe-10, but trained only on 8kHz sources, for specialized upsampling. |
| hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz | 48kHz | 1M | bwe-12-b086d8b | Same as bwe-10, but trained only on 16kHz sources, for specialized upsampling. |
| hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz | 48kHz | 1M | bwe-13-59f00ca | Same as bwe-10, but trained only on 24kHz sources, for specialized upsampling. |
| hifi-gan-bwe-05-cd9f4ca-vctk-48kHz | 48kHz | 1M | bwe-05-cd9f4ca | Trained for 200K iterations on the VCTK speech dataset with noise augmentation from the DNS Challenge dataset. |
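
One way to choose among the models above is to map the source sample rate to the matching specialized model and fall back to the general bwe-10 model otherwise. This is only a sketch of that selection policy: the model names come from the table, but the helper `pick_model` is hypothetical, and whether a specialized model works better for your audio is something to check by listening.

```python
GENERAL_MODEL = "hifi-gan-bwe-10-42890e3-vctk-48kHz"

# Models trained on a single source rate, keyed by that rate in Hz.
SPECIALIZED_MODELS = {
    8_000: "hifi-gan-bwe-11-d5f542d-vctk-8kHz-48kHz",
    16_000: "hifi-gan-bwe-12-b086d8b-vctk-16kHz-48kHz",
    24_000: "hifi-gan-bwe-13-59f00ca-vctk-24kHz-48kHz",
}


def pick_model(source_rate_hz):
    """Return a pretrained model name for the given source sample rate."""
    return SPECIALIZED_MODELS.get(source_rate_hz, GENERAL_MODEL)


print(pick_model(16_000))
print(pick_model(22_050))  # no specialized model, so the general one is used
```

The returned name can then be passed to `BandwidthExtender.from_pretrained`.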

Training

If you want to train your own model, you can use any of the methods above to install/run the library or fork the repo and run the script commands locally. The following commands are supported:

| Name | Description |
| --- | --- |
| hifi-train | Start a new training run; pass in a name for the run. |
| hifi-clone | Clone an existing training run at a given checkpoint or at the latest one. |
| hifi-export | Optimize a model for inference and export it to a PyTorch model file (.pt). |
| hifi-synth | Run model inference using a trained model on a source audio file. |

For example, you might start a new training run called bwe-01 with the following command:

hifi-train 01

To train a model, you will first need to download the VCTK and DNS Challenge datasets. By default, these datasets are assumed to be in the ./data/vctk and ./data/dns directories. See train.py for how to specify your own training data directories. If you want to use a custom training dataset, you can implement a dataset wrapper in datasets.py.
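
Before starting a long training run, it can help to confirm that the default dataset directories mentioned above are in place. The check below is a small sketch and not part of the library; the helper name `missing_dataset_dirs` is hypothetical, and it only tests that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Default locations named above, relative to the working directory.
DEFAULT_DATASET_DIRS = ("data/vctk", "data/dns")


def missing_dataset_dirs(root="."):
    """Return the default dataset directories that do not exist under root."""
    return [d for d in DEFAULT_DATASET_DIRS if not (Path(root) / d).is_dir()]


missing = missing_dataset_dirs()
if missing:
    print("missing dataset directories:", ", ".join(missing))
else:
    print("all default dataset directories found")
```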

The training scripts use wandb.ai for experiment tracking and visualization. Wandb metrics can be disabled by passing --no_wandb to the training script. All of my own experiment results are publicly available at wandb.ai/brentspell/hifi-gan-bwe.

Each training run is identified by a name and a git hash (ex: bwe-01-8abbca9). The git hash is used for simple experiment tracking, reproducibility, and model provenance. Using git to manage experiments also makes it easy to change model hyperparameters by simply changing the code, making a commit, and starting the training run. This is why there is no hyperparameter configuration file in the project, since I often end up having to change the code anyway to run interesting experiments.
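
The naming scheme described above can be sketched as follows. This is an illustration of the convention, not the project's actual code: the helper names are hypothetical, the "bwe" prefix is inferred from the examples in this document, and the hash is read with the standard `git rev-parse --short` command.

```python
import subprocess


def current_git_hash(length=7):
    """Return the abbreviated hash of HEAD (requires running inside a git repo)."""
    result = subprocess.run(
        ["git", "rev-parse", f"--short={length}", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_id(name, git_hash, prefix="bwe"):
    """Combine a run name and a git hash into an identifier like bwe-01-8abbca9."""
    return f"{prefix}-{name}-{git_hash}"


print(run_id("01", "8abbca9"))  # prints bwe-01-8abbca9
```

Inside a repository, `run_id("01", current_git_hash())` would tie the run to the commit that produced it.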

Development

Setup

The following script creates a virtual environment using pyenv for the project and installs dependencies.

pyenv install 3.9.10
pyenv virtualenv 3.9.10 hifi-gan-bwe
pip install -r requirements.txt

If you want to run the hifi-* scripts described above in development, you can install the package locally:

pip install -e .

You can then run tests, etc. as follows:

pytest --cov=hifi_gan_bwe
black .
isort --profile=black .
flake8 .
mypy .

These checks are also included in the pre-commit configuration for the project, so you can set them up to run automatically on commit by running:

pre-commit install

Acknowledgements

The original research on the HiFi-GAN+ model is not my own, and all credit goes to the paper's authors. I also referred to kan-bayashi's excellent Parallel WaveGAN implementation, specifically the WaveNet module. If you use this code, please cite the original paper:

@inproceedings{su2021bandwidth,
  title={Bandwidth extension is all you need},
  author={Su, Jiaqi and Wang, Yunyun and Finkelstein, Adam and Jin, Zeyu},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={696--700},
  year={2021},
  organization={IEEE},
  url={https://doi.org/10.1109/ICASSP39728.2021.9413575},
}

License

Copyright © 2022 Brent M. Spell

Licensed under the MIT License (the "License"). You may not use this package except in compliance with the License. You may obtain a copy of the License at

https://opensource.org/licenses/MIT

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
