Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Last update: Nov 28, 2022

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer lets the separation model efficiently capture both local and global context information, which is helpful for speech separation. Experimental results on the LibriCSS dataset show that the Conformer separation model achieves state-of-the-art results in both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (accepted at ICASSP 2021).

Environment

python 3.6.9, torch 1.7.1
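To confirm that a local setup matches the versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the helper names `version_tuple` and `check_environment` are hypothetical, and treating any other version as a mismatch is an assumption, since other versions may well work.

```python
import re
import sys


def version_tuple(version):
    """Turn a string such as '1.7.1+cu110' into (1, 7, 1).

    Any local build suffix after '+' is ignored, and only the leading
    digits of each of the first three components are used.
    """
    core = version.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_environment(expected_python=(3, 6, 9), expected_torch=(1, 7, 1)):
    """Return a list of human-readable mismatch messages (empty if all match)."""
    messages = []
    found_python = tuple(sys.version_info[:3])
    if found_python != expected_python:
        messages.append(
            "python is {}, README lists {}".format(found_python, expected_python)
        )
    try:
        import torch  # imported lazily so the check also runs without torch
    except ImportError:
        messages.append("torch is not installed")
    else:
        found_torch = version_tuple(torch.__version__)
        if found_torch != expected_torch:
            messages.append(
                "torch is {}, README lists {}".format(found_torch, expected_torch)
            )
    return messages


if __name__ == "__main__":
    for message in check_environment():
        print("warning:", message)
```

A mismatch reported here is only a hint; it does not by itself mean the code will fail.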

Get Started

1. Download the overlapped speech of the LibriCSS dataset.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip

2. Download the Conformer separation models.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip

3. Run the separation.

3.1 Single-channel separation

export MODEL_NAME=1ch_conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_1ch.scp \
    --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2

The separated speech is written to the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'.
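Before starting a long separation run, it can be useful to verify that every mixture listed in the file passed to `--mix-scp` actually exists on disk. The sketch below assumes the `.scp` file uses the common Kaldi-style layout of one `utterance_id path` pair per line; this README does not confirm that layout, so treat it as an assumption. The helpers `read_scp` and `missing_files` are hypothetical names, not functions from this repository.

```python
import os


def read_scp(scp_path):
    """Parse 'key path' lines into a dict.

    Assumes a Kaldi-style layout: the first whitespace-separated token is
    the utterance key and the rest of the line is the file path. Blank
    lines and lines starting with '#' are skipped.
    """
    entries = {}
    with open(scp_path, "r", encoding="utf-8") as handle:
        for line_number, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    f"{scp_path}:{line_number}: expected 'key path', got {line!r}"
                )
            key, path = parts
            if key in entries:
                raise ValueError(f"{scp_path}:{line_number}: duplicate key {key!r}")
            entries[key] = path
    return entries


def missing_files(entries):
    """Return the keys whose paths do not point to an existing file."""
    return [key for key, path in entries.items() if not os.path.isfile(path)]


if __name__ == "__main__":
    mixtures = read_scp("utils/overlapped_speech_1ch.scp")
    absent = missing_files(mixtures)
    print(f"{len(mixtures)} entries, {len(absent)} missing")
```

Relative paths are resolved against the current working directory, so the check should be run from the same directory as `separate.py`.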

3.2 Seven-channel separation

export MODEL_NAME=conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_7ch.scp \
    --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2 \
    --mvdr True

The separated speech is written to the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'.

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}

Owner

Sanyuan Chen (陈三元)
