Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Related tags

Overview

Introduction

This repository is about paper SpeakerGAN , and is unofficially implemented by Mingming Huang ([email protected]), Tiezheng Wang ([email protected]) and thanks for advice from TongFeng.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network， by Liyang Chen , Yifeng Liu , Wendong Xiao , Yingxue Wang ,Haiyong Xie.

Usage

For train / test / generate:

python speakergan.py

You may need to change the path of wav vad preprocessed files.

Our results

acc: 94.27% with random sampled testset. 

acc: 93.21% with fixed start sampled testset.

using model file: model/49_D.pkl

acc: 98.44% on training classification accuracy with real samples.

There is about 4% gap on testset lower compared to paper result. We can't find out the reason. We want your help !

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000hz, 25ms frame, 10ms overlap. shape:(160,64)
dataset: librispeech-100 train-clean-100 POI:251
data preprocess: vad、mean and variance normalization, shuffled.
60% train. 40% test.

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResnetBlocks, template average pooling, FC, softmax, crossentropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU

================ training ==================

lr: 0-9, 0.0005 | 9-49, 0.0002
L(d): λ1 λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv Loss: Label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3

======== not sure or differences with paper ========

weights,bias initialize function, use: xavier_uniform and zeros
pytorch huber_loss： + 0.5 to be same with paper. but no implement here.
for shorter wav, paper: padded with zero. we: padded with feature again.
gated cnn architecture.
we use webrtcvad mode(3) for vad preprocess.

Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Related tags

Overview

Introduction

SpeakerGAN paper

Usage

Our results

Details of paper

Owner

Trading Gym is an open source project for the development of reinforcement learning algorithms in the context of trading.

A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

This is the source code for: Context-aware Entity Typing in Knowledge Graphs.

[AAAI 2022] Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

Official implementation of the Neurips 2021 paper Searching Parameterized AP Loss for Object Detection.

《A-CNN: Annularly Convolutional Neural Networks on Point Clouds》(2019)

Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

This is a re-implementation of TransGAN: Two Pure Transformers Can Make One Strong GAN (CVPR 2021) in PyTorch.

RetinaFace: Deep Face Detection Library in TensorFlow for Python

这是一个deeplabv3-plus-pytorch的源码，可以用于训练自己的模型。

Official page of Struct-MDC (RA-L'22 with IROS'22 option); Depth completion from Visual-SLAM using point & line features

Machine learning notebooks in different subjects optimized to run in google collaboratory

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

For IBM Quantum Challenge Africa 2021, 9 September (07:00 UTC) - 20 September (23:00 UTC).

CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

The repo for reproducing Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

Emulation and Feedback Fuzzing of Firmware with Memory Sanitization

A modern pure-Python library for reading PDF files