Python code for Lite Audio-Visual Speech Enhancement (LAVSE).

Last update: Dec 01, 2022

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

Some preprocessed sample data, including enhanced results, is also included in this repository.

The TMSV (Taiwan Mandarin speech with video) dataset used in LAVSE is released here.

Please cite the following paper if you find the code useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install the Python dependencies.

pip install -r requirements.txt
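
After installing, it can help to confirm that the interpreter and GPU setup match the prerequisites above. The following is a minimal sketch and not part of the repository: the function name check_environment is hypothetical, and it assumes PyTorch is among the packages installed from requirements.txt (if it is not importable, the PyTorch fields are simply left as None).

```python
import sys


def check_environment(expected_python=(3, 6)):
    """Report the Python version and, if PyTorch is importable, CUDA visibility.

    Hypothetical helper for illustration; it is not shipped with LAVSE.
    """
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == tuple(expected_python),
        "torch": None,
        "cuda_available": None,
    }
    try:
        import torch  # assumed to be installed via requirements.txt
    except ImportError:
        # PyTorch is missing; leave the PyTorch-related fields as None.
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

A python_matches value of False only means the interpreter differs from the listed Python 3.6; whether other versions work is not stated in this README.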

Usage

Run the command below; the average PESQ and STOI results will be printed to your terminal.

Start visdom first (for example in a screen or tmux session), because it records the training loss while the script runs.

bash run.sh

See run.sh for details on the individual commands.
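
Since visdom has to be running before the script starts, a quick reachability check can save a failed run. The sketch below is an illustration only: visdom_is_up is a hypothetical helper, and it assumes the server listens on localhost at visdom's default port 8097, which may differ from what run.sh is configured to use.

```python
import socket


def visdom_is_up(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections at host:port.

    Hypothetical helper; 8097 is visdom's default port, and whether run.sh
    uses that port is an assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if visdom_is_up():
        print("visdom appears to be running; you can now run: bash run.sh")
    else:
        print("visdom is not reachable; start it first, e.g. python -m visdom.server")
```

The check only confirms that a process is listening on the port, not that the process is actually visdom.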

License

The LAVSE work is released under the MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

Owner

Shang-Yi Chuang
