Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Overview

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Figure: Example visualization of the method and baseline as a spectogram

This is the implementation of our Project for the course "Deep Learning: Architectures and Methods" by Prof. Christian Kersting from the Artificial Intelligence and Machine Learning Lab at the Technical University of Darmstadt in the summer semester 2021.

In the field of audio signal processing, Super-Resolution is one of the most relevant topics. The motivation is to reconstruct high- quality audio from low-quality signals. From a practical perspective, the technique has applications in telephony or generally in applications in which audio is transmitted and has to be compressed accordingly. Other applications are the processing of ancient recordings, for example old sound recordings of music, speech or videos. First approaches of the combination of machine learning and audio signal processing lead to promising results and outperform standard techniques. Accordingly the scope of the project was to reimplement the paper Temporal FiLM: Capturing Long-Range SequenceDependencies with Feature-Wise Modulation by Birnbaum et al. in PyTorch, reproduce the results and extend them further to the music domain.

This repository contains everything needed to prepare the data sets, train the model and create final evaluation and visualization of the results. We also provide the weights of the models to reproduce our reported results.

Installation

This project was originally developed with Python 3.8, PyTorch 1.7, and CUDA 11.0. The training requires at least one NVIDIA GeForce GTX 980 (4GB memory).

  • Create conda environment:
conda create --name audiosr
source activate audiosr
conda install PYTORCH torchvision cudatoolkit=11.0 -c pytorch
  • Install the dependencies:
pip install -r requirements.txt

Dataset preparation

To reproduce the results shown below tha datasets have to be prepared. This repo includes scripts to prepare the following dataset:

VCTK preparation

  • run prep_dataset.py from ./datasets to create a h5 container of a specified input.
  • to reproduce results prepare the following h5 files:
python prep_dataset.py \
  --file-list vctk/speaker1/speaker1-train-files.txt \
  --in-dir ./VCTK-Corpus/wav48/p225/ \
  --out vctk-speaker1-train.4.16000.8192.4096.h5 \
  --scale 4 \
  --sr 16000 \
  --dimension 8192 \
  --stride 4096 \
  --interpolate \
  --low-pass
python prep_dataset.py \
  --file-list vctk/speaker1/speaker1-val-files.txt \
  --in-dir ./VCTK-Corpus/wav48/p225/ \
  --out vctk-speaker1-val.4.16000.8192.4096.h5 \
  --scale 4 \
  --sr 16000 \
  --dimension 8192 \
  --stride 4096 \
  --interpolate \
  --low-pass

GTZAN preparation

  • run prep_dataset.py from ./datasets to create a h5 container of a specified input.
  • to reproduce results prepare the following h5 files:
python prep_dataset.py \
  --file-list gtzan/blues_wav_list_train.txt \
  --in-dir gtzan/data/genres/blues/ \
  --out blues-train.4.22000.8192.16384.h5 \
  --scale 4 \
  --sr 22000 \
  --dimension 8192 \
  --stride 16384 \
  --interpolate \
  --low-pass
python prep_dataset.py \
  --file-list gtzan/blues_wav_list_val.txt \
  --in-dir gtzan/data/genres/blues/ \
  --out blues-val.4.22000.8192.16384.h5 \
  --scale 4 \
  --sr 22000 \
  --dimension 8192 \
  --stride 16384 \
  --interpolate \
  --low-pass

Piano dataset preparation

python prep_piano.py \
  --file-list data/music_train.npy \
  --out piano-train.4.16000.8192.131072.h5 \
  --scale 4 \
  --sr 16000 \
  --dimension 8192 \
  --stride 131072 \
  --interpolate \
  --low-pass
python prep_piano.py \
  --file-list data/music_valid.npy \
  --out piano-val.4.16000.8192.131072.h5 \
  --scale 4 \
  --sr 16000 \
  --dimension 8192 \
  --stride 131072 \
  --interpolate \
  --low-pass

Notes:

  • the --in-dir argument has to be adapted to the respective dataset location
  • The dimension parameter and sampling rate define the absolute length of a patch (dim/sr = length patch)

Model

Generally, there are three main models in this implementation.

Baseline

On the one hand the b-spline interpolation which serves as the baseline and can be found in the data loader in prep_dataset.py.

Model

On the other hand two neural networks whose implementation can be found in the /models/ folder. In a first step a model was implemented which uses a batchnorm layer instead of the later used TFILM layer. This is implemented in audiounet.py. The final model, which is also used in the paper, can be found in tfilmunet.py.

Train Model

To run the trainings use the following commands and change the dataset root the corresponding domain.

python train.py \
  --dataset-root hereroottodataset! \
  --epochs 50 \
  --lr 3*10e-4 \
  --batch-size 16 

Evaluation

Save examples from inference

It is possible to evaluate any given wav-file with the inference.py script by invoking the --save-example flag and saving the results as wav-files and spectrogram plots. The script performs the following steps:

  • prepares all files in a provided list (--wave-file-list) and creates a low-res version and the baseline reconstruction
  • runs inference on the prepared files to create a super-resolution output
  • saves all results to the "examples" folder with the respective file names
  • saves spectrogram plots of all versions as pdf-files

Notes:

It is important to adapt the sampling parameter (--sr) which is set to 16000 by default. The sampling rate has to be the one of the original wav file. The scale (--scale) defines the down sampling factor which is set to 4 by default. Depending on which trained model is used for the inference the parameters --checkpoints-root and --checkpoint have to be specified accordingly.

To reproduce an example from our plot run the following command from the repo root directory (modify --checkpoints-root if necessary):

python inference.py \
  --save-example \
  --wave-file-list assets/save_wav_list.txt \
  --scale 4 \
  --sr 16000 \
  --checkpoint pretrained/vctk_speaker1_pretrained.pth

Results

Training Dataset Ratio BASELINE SNR (dB) BASELINE LSD (dB) METHOD SNR (dB) METHOD LSD (dB) Checkpoint
VTCK SingleSpeaker r = 4 15.6 5.4 16.6 3.2 Checkpoint
Piano r = 4 19.7 2.9 20.4 2.2 Checkpoint
GTZAN (Genre: Blues) r = 4 13.3 7.8 13.8 3.8 Checkpoint

Qualitative Examples

Here we provide a qualitative example per Dataset. These can be generated using inference.py

VTCK SingleSpeaker Piano GTZAN (Genre: Blues)
Low Resolution Low Resolution Low Resolution
Baseline Baseline Baseline
Method Method Method
High Resolution High Resolution High Resolution
Owner
Oliver Hahn
Master Thesis @VIsual Inference Lab | Grad Student @Technical University of Darmstadt
Oliver Hahn
Code base of object detection

rmdet code base of object detection. 环境安装: 1. 安装conda python环境 - `conda create -n xxx python=3.7/3.8` - `conda activate xxx` 2. 运行脚本,自动安装pytorch1

3 Mar 08, 2022
Official TensorFlow code for the forthcoming paper

~ Efficient-CapsNet ~ Are you tired of over inflated and overused convolutional neural networks? You're right! It's time for CAPSULES :)

Vittorio Mazzia 203 Jan 08, 2023
Revisiting Global Statistics Aggregation for Improving Image Restoration

Revisiting Global Statistics Aggregation for Improving Image Restoration Xiaojie Chu, Liangyu Chen, Chengpeng Chen, Xin Lu Paper: https://arxiv.org/pd

MEGVII Research 128 Dec 24, 2022
DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs Abstract: Image-to-image translation has recently achieved re

yaxingwang 23 Apr 14, 2022
Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

Translated in 🇰🇷 Korean/ Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on

Ludwig 8.7k Jan 05, 2023
Light-Head R-CNN

Light-head R-CNN Introduction We release code for Light-Head R-CNN. This is my best practice for my research. This repo is organized as follows: light

jemmy li 835 Dec 06, 2022
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX

SYMPAIS: Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis Overview | Installation | Documentation | Examples | Notebo

Yicheng Luo 4 Sep 13, 2022
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022
use tensorflow 2.0 to tell a dog and cat from a specified picture

dog_or_cat use tensorflow 2.0 to tell a dog and cat from a specified picture This is one of the classic experiments for the introduction of deep learn

你这个代码我看不懂 1 Oct 22, 2021
A Python library for Deep Graph Networks

PyDGN Wiki Description This is a Python library to easily experiment with Deep Graph Networks (DGNs). It provides automatic management of data splitti

Federico Errica 194 Dec 22, 2022
Python3 / PyTorch implementation of the following paper: Fine-grained Semantics-aware Representation Enhancement for Self-supervisedMonocular Depth Estimation. ICCV 2021 (oral)

FSRE-Depth This is a Python3 / PyTorch implementation of FSRE-Depth, as described in the following paper: Fine-grained Semantics-aware Representation

77 Dec 28, 2022
Source Code of NeurIPS21 paper: Recognizing Vector Graphics without Rasterization

YOLaT-VectorGraphicsRecognition This repository is the official PyTorch implementation of our NeurIPS-2021 paper: Recognizing Vector Graphics without

Microsoft 49 Dec 20, 2022
Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

🍐 quince Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding 🍐 Installation $ git clone

Andrew Jesson 19 Jun 23, 2022
🕵 Artificial Intelligence for social control of public administration

Non-tech crash course into Operação Serenata de Amor Tech crash course into Operação Serenata de Amor Contributing with code and tech skills Supportin

Open Knowledge Brasil - Rede pelo Conhecimento Livre 4.4k Dec 31, 2022
This repo is for segmentation of T2 hyp regions in gliomas.

T2-Hyp-Segmentor This repo is for segmentation of T2 hyp regions in gliomas. By downloading the model from here you can use it to segment your T2w ima

1 Jan 18, 2022
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size

NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size Xuanyi Dong, Lu Liu, Katarzyna Musial, Bogdan Gabrys in IEEE Transactions o

D-X-Y 137 Dec 20, 2022
A PyTorch implementation of the Relational Graph Convolutional Network (RGCN).

Torch-RGCN Torch-RGCN is a PyTorch implementation of the RGCN, originally proposed by Schlichtkrull et al. in Modeling Relational Data with Graph Conv

Thiviyan Singam 66 Nov 30, 2022
A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

Convolutional Neural Network (CNN). This repository contains a source code of a deep learning network built with TensorFlow and Keras to classify gend

Pawel Dziemiach 1 Dec 19, 2021
A deep learning CNN model to identify and classify and check if a person is wearing a mask or not.

Face Mask Detection The Model is designed to check if any human is wearing a mask or not. Dataset Description The Dataset contains a total of 11,792 i

1 Mar 01, 2022
PyArmadillo: an alternative approach to linear algebra in Python

PyArmadillo is a linear algebra library for the Python language, with an emphasis on ease of use.

Terry Zhuo 58 Oct 11, 2022