Auxiliary Raw Net (ARawNet) is an ASVspoof detection model that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity.

Last update: Jul 08, 2022

Overview

This repository is an implementation of the Auxiliary Raw Net (ARawNet), an ASVspoof detection system that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity. The paper can be found here.

Model performance is evaluated on the ASVspoof 2019 dataset.

Setup

Environment

speechbrain==0.5.7
pandas
torch==1.9.1
torchaudio==0.9.1
nnAudio==0.2.6
ptflops==0.6.6

Create a conda environment with conda env create -f environment.yml.
Activate the conda environment with conda activate <env-name>, using the name defined in environment.yml.
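
One way to confirm that the active environment matches the pins listed above is to compare them against the installed package metadata. The following is a minimal sketch, not part of the repository: `PINNED` copies the versions from the list above, and `installed_version` and `check_pins` are hypothetical helper names.

```python
from importlib import metadata

# Version pins copied from the Environment list above (pandas is unpinned there).
PINNED = {
    "speechbrain": "0.5.7",
    "torch": "1.9.1",
    "torchaudio": "0.9.1",
    "nnAudio": "0.2.6",
    "ptflops": "0.6.6",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each mismatching package to (expected, found). Hypothetical helper."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Ignore local build suffixes such as "+cu111" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is present at the listed version.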

Data preprocessing

.
├── data                       
│   │
│   ├── PA                  
│   │   └── ...
│   └── LA           
│       ├── ASVspoof2019_LA_asv_protocols
│       ├── ASVspoof2019_LA_asv_scores
│       ├── ASVspoof2019_LA_cm_protocols
│       ├── ASVspoof2019_LA_train
│       ├── ASVspoof2019_LA_dev
│       
│
└── ARawNet

Download the dataset. Our experiments use the Logical Access (LA) scenario of the ASVspoof 2019 dataset. The dataset can be downloaded here.
Unzip the data and save it to a folder named data in the same directory as ARawNet, as shown in the tree above.
Run python preprocess.py. Alternatively, use our processed data directly under "/processed_data".
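
Before running the preprocessing script, it can help to verify that the unzipped data matches the layout in the tree. The sketch below only checks for the LA sub-folders that the tree shows; `missing_la_dirs` is a hypothetical helper, and the assumption that it runs from the folder containing both data and ARawNet is ours.

```python
from pathlib import Path

# Sub-folders of data/LA shown in the directory tree above.
EXPECTED_LA_DIRS = [
    "ASVspoof2019_LA_asv_protocols",
    "ASVspoof2019_LA_asv_scores",
    "ASVspoof2019_LA_cm_protocols",
    "ASVspoof2019_LA_train",
    "ASVspoof2019_LA_dev",
]


def missing_la_dirs(root):
    """Return the expected data/LA sub-folders that are absent under root."""
    la_dir = Path(root) / "data" / "LA"
    return [name for name in EXPECTED_LA_DIRS if not (la_dir / name).is_dir()]


if __name__ == "__main__":
    # Assumes the current directory is the one that contains data/ and ARawNet/.
    missing = missing_la_dirs(".")
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```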

Train

python train_raw_net.py yaml/RawSNet.yaml --data_parallel_backend --data_parallel_count=2

Evaluate

python eval.py

Check Model Size and Multiply-Accumulate Operations (MACs)

python check_model_size.py yaml/RawSNet.yaml
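
For intuition about the two quantities reported by this step: parameters are the trainable weights, and one MAC is one multiply followed by one add. The sketch below counts both for a single 1-D convolution using the standard formulas. It is illustrative arithmetic with made-up layer sizes, not the logic of check_model_size.py, and all function names are hypothetical.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters: one kernel per (in, out) channel pair, plus biases."""
    return in_channels * out_channels * kernel_size + (out_channels if bias else 0)


def conv1d_macs(in_channels, out_channels, kernel_size, out_length):
    """Each output sample of each output channel needs in_channels * kernel_size MACs."""
    return out_channels * out_length * in_channels * kernel_size


# Illustrative layer sizes (assumptions, not taken from the ARawNet configs):
# one second of 16 kHz audio, 32 filters of width 5, "same" padding.
out_len = conv1d_output_length(length=16000, kernel_size=5, stride=1, padding=2)
print(out_len)                                                  # 16000
print(conv1d_params(in_channels=1, out_channels=32, kernel_size=5))  # 192
print(conv1d_macs(1, 32, 5, out_len))                           # 2560000
```

Summing these counts over every layer gives the model-level totals, which is why a raw-waveform encoder operating on long inputs can dominate the MAC budget even with few parameters.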

Model Performance

Accuracy metric

$$\text{min t-DCF} = \min_{s} \left\{ \beta\, P_{\text{miss}}^{\text{cm}}(s) + P_{\text{fa}}^{\text{cm}}(s) \right\}$$

Explanations can be found here: t-DCF
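
To make the metric concrete, the sketch below sweeps a decision threshold s over countermeasure scores, computes the miss rate on bona fide trials and the false-alarm rate on spoofed trials, and returns the smallest value of β·P_miss(s) + P_fa(s). It is a simplified illustration rather than the official scoring tool: it assumes higher scores mean "bona fide" and takes β as a given constant (in the ASVspoof setup it depends on the cost parameters and the ASV system's error rates). The function names are hypothetical.

```python
def error_rates(bonafide_scores, spoof_scores):
    """Yield (P_miss, P_fa) for every candidate threshold s.

    Assumes a trial is accepted as bona fide when its score is >= s.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects every trial
    for s in thresholds:
        p_miss = sum(x < s for x in bonafide_scores) / len(bonafide_scores)
        p_fa = sum(x >= s for x in spoof_scores) / len(spoof_scores)
        yield p_miss, p_fa


def min_tdcf(bonafide_scores, spoof_scores, beta):
    """Minimum over thresholds of beta * P_miss(s) + P_fa(s)."""
    return min(beta * p_miss + p_fa
               for p_miss, p_fa in error_rates(bonafide_scores, spoof_scores))


def eer(bonafide_scores, spoof_scores):
    """Equal error rate as a fraction, taken where P_miss and P_fa are closest."""
    p_miss, p_fa = min(error_rates(bonafide_scores, spoof_scores),
                       key=lambda rates: abs(rates[0] - rates[1]))
    return (p_miss + p_fa) / 2


if __name__ == "__main__":
    bonafide = [2.0, 3.0, 4.0]   # toy scores, not real system output
    spoof = [0.0, 1.0, 2.5]
    print(min_tdcf(bonafide, spoof, beta=2.0))
    print(eer(bonafide, spoof))
```

Note that `eer` returns a fraction, so it would need to be multiplied by 100 to compare with percentage figures.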

Experiment Results

Model	Front-end	Main Encoder	Auxiliary Encoder (E_A)	EER (%)	min t-DCF
Res2Net	Spec	Res2Net	-	8.783	0.2237
Res2Net	LFCC	Res2Net	-	2.869	0.0786
Res2Net	CQT	Res2Net	-	2.502	0.0743
Rawnet2	Raw waveforms	Rawnet2	-	5.13	0.1175
ARawNet	Mel-Spectrogram	XVector	✅	1.32	0.03894
ARawNet	Mel-Spectrogram	XVector	-	2.39320	0.06875
ARawNet	Mel-Spectrogram	ECAPA-TDNN	✅	1.39	0.04316
ARawNet	Mel-Spectrogram	ECAPA-TDNN	-	2.11	0.06425
ARawNet	CQT	XVector	✅	1.74	0.05194
ARawNet	CQT	XVector	-	3.39875	0.09510
ARawNet	CQT	ECAPA-TDNN	✅	1.11	0.03645
ARawNet	CQT	ECAPA-TDNN	-	1.72667	0.05077

Main Encoder	Auxiliary Encoder	Parameters	MACs
Rawnet2	-	25.43 M	7.61 GMac
Res2Net	-	0.92 M	1.11 GMac
XVector	✅	5.81 M	2.71 GMac
XVector	-	4.66 M	1.88 GMac
ECAPA-TDNN	✅	7.18 M	3.19 GMac
ECAPA-TDNN	-	6.03 M	2.36 GMac
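
The cost of the auxiliary encoder can be read off this table by subtracting each "without" row from its "with" row. The short sketch below does that arithmetic on the figures copied from the table; `COST` and `auxiliary_overhead` are hypothetical names used only for illustration.

```python
# (parameters in millions, MACs in GMac) copied from the table above,
# keyed by (main encoder, auxiliary encoder enabled).
COST = {
    ("XVector", True): (5.81, 2.71),
    ("XVector", False): (4.66, 1.88),
    ("ECAPA-TDNN", True): (7.18, 3.19),
    ("ECAPA-TDNN", False): (6.03, 2.36),
}


def auxiliary_overhead(encoder):
    """Extra (parameters in M, MACs in GMac) incurred by adding the auxiliary encoder."""
    with_aux, without_aux = COST[(encoder, True)], COST[(encoder, False)]
    return tuple(round(a - b, 2) for a, b in zip(with_aux, without_aux))


for encoder in ("XVector", "ECAPA-TDNN"):
    print(encoder, auxiliary_overhead(encoder))
# XVector (1.15, 0.83)
# ECAPA-TDNN (1.15, 0.83)
```

By these figures, the auxiliary encoder adds about 1.15 M parameters and 0.83 GMac regardless of the main encoder, and both combined models remain well below Rawnet2's 25.43 M parameters and 7.61 GMac.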

Cite Our Paper

If you use this repository, please consider citing:

@inproceedings{Teng2021ComplementingHF,
  title={Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model},
  author={Zhongwei Teng and Quchen Fu and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}

@inproceedings{Fu2021FastAudioAL,
  title={FastAudio: A Learnable Audio Front-End for Spoof Speech Detection},
  author={Quchen Fu and Zhongwei Teng and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}
