neural network based speaker embedder

Last update: Dec 29, 2022

Overview

Content

What is deepaudio-speaker?

Deepaudio-speaker is a framework for training neural network based speaker embedders. It supports online audio augmentation thanks to torch-audiomentation. It inlcudes or will include popular neural network architectures and losses used for speaker embedder.

To make it easy to use various functions such as mixed-precision, multi-node training, and TPU training etc, I introduced PyTorch-Lighting and Hydra in this framework (just like what pyannote-audio and openspeech do).

Deepaudio-tts is coming soon.

Installation

conda create -n deepaudio python=3.8.5
conda activate deepaudio
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
git clone https://github.com/deepaudio/deepaudio-speaker.git
cd deepaudio-speaker
pip install -e .

Get Started

Supported Datasets

####Voxceleb2

Download VoxCeleb dataset and follow this script to obtain this kind of directory structure:

/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a

Training examples

Example1: Train the ecapa-tdnn model with fbank features on GPU.

$ deepaudio-speaker-train  \
    dataset=voxceleb2 \
    dataset.dataset_path=/your/path/to/voxceleb2/dev/wav/ \
    model=ecapa \
    model.channels=1024 \
    feature=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=aamsoftmax

Example2: Extract speaker embedding with trained model.

Todo

Model Architecture

ECAPA-TDNN This is an unofficial implementation from @lawlict. Please find more details in this link.

ECAPA-TDNN This is implemented by @joonson. Please find more details in this link.

ResNetSE34L This is borrowed from voxceleb trainer.

ResNetSE34V2 This is borrowed from voxceleb trainer.

resnet101 This is proposed by BUT for speaker diarization. Please note that the feature used in this framework is different from VB-HMM

How to contribute to deepaudio-speaker

It is a personal project. So I don't have enough gpu resources to do a lot of experiments. I appreciate any kind of feedback or contributions. Please feel free to make a pull requsest for some small issues like bug fixes, experiment results. If you have any questions, please open an issue.

Acknowledge

I borrow a lot of codes from openspeech and pyannote-audio

neural network based speaker embedder

Related tags

Overview

Content

What is deepaudio-speaker?

Installation

Get Started

Supported Datasets

Training examples

Model Architecture

How to contribute to deepaudio-speaker

Acknowledge

Owner

华为商城抢购手机的Python脚本 Python script of Huawei Store snapping up mobile phones

chaii - hindi & tamil question answering

Pipelines de datos, 2021.

customer care chatbot made with Rasa Open Source.

An open collection of annotated voices in Japanese language

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

PyJPBoatRace: Python-based Japanese boatrace tools 🚤

NLPIR tutorial: pretrain for IR. pre-train on raw textual corpus, fine-tune on MS MARCO Document Ranking

NLP topic mdel LDA - Gathered from New York Times website

Code for the Python code smells video on the ArjanCodes channel.

A CSRankings-like index for speech researchers

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

Simple translation demo showcasing our headliner package.

Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA

Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

Fuzzy String Matching in Python

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo