PyTorch implementation of Densely Connected Time Delay Neural Network

Overview

Densely Connected Time Delay Neural Network

PyTorch implementation of Densely Connected Time Delay Neural Network (D-TDNN) in our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

What's New ⚠️

  • [2021-02-14] We add an impl option in TimeDelay, now you can choose:

    • 'conv': implement TDNN by F.conv1d.
    • 'linear': implement TDNN by F.unfold and F.linear.

    Check this commit for more information. Note the pre-trained models of 'conv' have not been uploaded yet.

  • [2021-02-04] TDNN (default implementation) in this repo is slower than nn.Conv1d, but we adopted it because:

    • TDNN in this repo was also used to create F-TDNN models that are not perfectly supported by nn.Conv1d (asymmetric paddings).
    • nn.Conv1d(dilation>1, bias=True) is slow in training.

    However, we do not use F-TDNN here, and we always set bias=False in D-TDNN. So, we are considering uploading a new version of TDNN soon (2021-02-14 updated).

  • [2021-02-01] Our new paper is accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    CAM outperforms statistics-and-selection (SS) in terms of speed and accuracy.

Pretrained Models

We provide the pretrained models which can be used in many tasks such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use Kaldi toolkit:

  • Download VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and change the $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or checkout the kaldifeat branch if you do not want to install Kaldi.

Test

  • Download the pretrained D-TDNN model and run:
python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda

Evaluation

VoxCeleb1-O

Model Emb. Params (M) Loss Backend EER (%) DCF_0.01 DCF_0.001
TDNN 512 4.2 Softmax PLDA 2.34 0.28 0.38
E-TDNN 512 6.1 Softmax PLDA 2.08 0.26 0.41
F-TDNN 512 12.4 Softmax PLDA 1.89 0.21 0.29
D-TDNN 512 2.8 Softmax Cosine 1.81 0.20 0.28
D-TDNN-SS (0) 512 3.0 Softmax Cosine 1.55 0.20 0.30
D-TDNN-SS 512 3.5 Softmax Cosine 1.41 0.19 0.24
D-TDNN-SS 128 3.1 AAM-Softmax Cosine 1.22 0.13 0.20

Citation

If you find D-TDNN helps your research, please cite

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper ⚠️

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.

Comments
  • size mismatch while loading pre-trained weights

    size mismatch while loading pre-trained weights

    RuntimeError: Error(s) in loading state_dict for DTDNN: Missing key(s) in state_dict: "xvector.tdnn.linear.bias", "xvector.dense.linear.bias". size mismatch for xvector.tdnn.linear.weight: copying a param with shape torch.Size([128, 30, 5]) from checkpoint, the shape in current model is torch.Size([128, 150]). size mismatch for xvector.block1.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for xvector.block1.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 192, 1]) from checkpoint, the shape in current model is torch.Size([128, 192]). size mismatch for xvector.block1.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block1.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block1.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block1.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block1.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit1.linear.weight: copying a param with shape torch.Size([256, 512, 1]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for xvector.block2.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block2.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block2.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block2.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block2.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 512, 1]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for xvector.block2.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 576, 1]) from checkpoint, the shape in current model is torch.Size([128, 576]). size mismatch for xvector.block2.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd7.linear1.weight: copying a param with shape torch.Size([128, 640, 1]) from checkpoint, the shape in current model is torch.Size([128, 640]). size mismatch for xvector.block2.tdnnd7.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd8.linear1.weight: copying a param with shape torch.Size([128, 704, 1]) from checkpoint, the shape in current model is torch.Size([128, 704]). size mismatch for xvector.block2.tdnnd8.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd9.linear1.weight: copying a param with shape torch.Size([128, 768, 1]) from checkpoint, the shape in current model is torch.Size([128, 768]). size mismatch for xvector.block2.tdnnd9.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd10.linear1.weight: copying a param with shape torch.Size([128, 832, 1]) from checkpoint, the shape in current model is torch.Size([128, 832]). size mismatch for xvector.block2.tdnnd10.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd11.linear1.weight: copying a param with shape torch.Size([128, 896, 1]) from checkpoint, the shape in current model is torch.Size([128, 896]). size mismatch for xvector.block2.tdnnd11.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd12.linear1.weight: copying a param with shape torch.Size([128, 960, 1]) from checkpoint, the shape in current model is torch.Size([128, 960]). size mismatch for xvector.block2.tdnnd12.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit2.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for xvector.dense.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).

    opened by zabir-nabil 3
  • 实验细节的疑问

    实验细节的疑问

    您好: 我想教下您的论文中,实验的实现细节: 1.实验数据:我看很多其他论文都是使用voxceleb2 dev 5994说话人作为训练集(或者voxceleb dev+voxceleb2 dev,1211+5994说话人),您有只在这部分说话人上的实验结果吗?方便透露下嘛?

    2.PLDA和Cosine Similarity:您这里实验比较这两个的EER在TDNN中是提取的是倒数第二层(分类器前一层)还是第三层(xvector)的输出啊?因为我在论文中又看到,这两个不同层embedding对不同方法性能有差异,倒数第二层的cosine方法可能会更好一些。

    Thanks!🙏

    opened by Wenhao-Yang 1
  • questions about model training

    questions about model training

    hello, yuyq96, Thank you so much for the great work you've shared. I learned that D-TDNNSS mini-batch setting 128 from D-TDNN paper. But this model is too large to train on single gpu. Could you tell me how you train it? Using nn.Parallel or DDP? Looking forward to you reply

    opened by forwiat 2
  • the difference between kaldifeat-kaldi and kaldifeat-python?

    the difference between kaldifeat-kaldi and kaldifeat-python?

    May I ask you the numerical difference between kaldifeat by kaldi implementation and kaldifeat by your python implementation? I have compared the two computed features, and I find it has some difference. I wonder that the experiment results showed in D-TDNN master and D-TDNN-kaldifeat branch is absolutely the same.

    Thanks~

    opened by mezhou 4
  • 针对论文的一些疑问

    针对论文的一些疑问

    您好,我觉得您的工作-DTDNN,在参数比较少的情况下获得了较ETDNN,FTDNN更好的结果,我认为这非常有意义。但是我对论文的实验存在两处疑惑: 1、论文中Table5中,基于softmax训练的D-TDNN模型Cosine的结果好于PLDA,在上面的TDNN,ETDNN,FTDNN的结果不一致(均是PLDA好于Cosine),请问这是什么原因导致的? 2、对于null branch,能稍微解释一下吗?

    opened by xuanjihe 10
Releases(trials)
Owner
Ya-Qi Yu
Machine Learning
Ya-Qi Yu
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth

Instance segmentation by jointly optimizing spatial embeddings and clustering bandwidth This codebase implements the loss function described in: Insta

209 Dec 07, 2022
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Haolin Liu 34 Nov 07, 2022
RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids

RMTD: Robust Moving Target Defence Against False Data Injection Attacks in Power Grids Real-time detection performance. This repo contains the code an

0 Nov 10, 2021
Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

R2RNet Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network." Jiang Hai, Zhu Xuan, Ren Yang, Yutong Hao, Fengzhu

77 Dec 24, 2022
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

AutoViz and Auto_ViML 397 Dec 30, 2022
Bayesian Generative Adversarial Networks in Tensorflow

Bayesian Generative Adversarial Networks in Tensorflow This repository contains the Tensorflow implementation of the Bayesian GAN by Yunus Saatchi and

Andrew Gordon Wilson 1k Nov 29, 2022
《Deep Single Portrait Image Relighting》(ICCV 2019)

Ratio Image Based Rendering for Deep Single-Image Portrait Relighting [Project Page] This is part of the Deep Portrait Relighting project. If you find

62 Dec 21, 2022
Code for our paper A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization,

FSRA This repository contains the dataset link and the code for our paper A Transformer-Based Feature Segmentation and Region Alignment Method For UAV

Dmmm 32 Dec 18, 2022
OneShot Learning-based hotword detection.

EfficientWord-Net Hotword detection based on one-shot learning Home assistants require special phrases called hotwords to get activated (eg:"ok google

ANT-BRaiN 102 Dec 25, 2022
A tiny, pedagogical neural network library with a pytorch-like API.

candl A tiny, pedagogical implementation of a neural network library with a pytorch-like API. The primary use of this library is for education. Use th

Sri Pranav 3 May 23, 2022
Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

PlantDoc: A Dataset for Visual Plant Disease Detection This repository contains the Cropped-PlantDoc dataset used for benchmarking classification mode

Pratik Kayal 109 Dec 29, 2022
Efficient Two-Step Networks for Temporal Action Segmentation (Neurocomputing 2021)

Efficient Two-Step Networks for Temporal Action Segmentation This repository provides a PyTorch implementation of the paper Efficient Two-Step Network

8 Apr 16, 2022
Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using a using an Optris PI Lightweight ther

Chong Yu Quan 6 Dec 31, 2022
Repo for the Tutorials of Day1-Day3 of the Nordic Probabilistic AI School 2021 (https://probabilistic.ai/)

ProbAI 2021 - Probabilistic Programming and Variational Inference Tutorial with Pryo Day 1 (June 14) Slides Notebook: students_PPLs_Intro Notebook: so

PGM-Lab 46 Nov 01, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
Non-Attentive-Tacotron - This is Pytorch Implementation of Google's Non-attentive Tacotron.

Non-attentive Tacotron - PyTorch Implementation This is Pytorch Implementation of Google's Non-attentive Tacotron, text-to-speech system. There is som

Jounghee Kim 46 Dec 19, 2022
Experiments with differentiable stacks and queues in PyTorch

Please use stacknn-core instead! StackNN This project implements differentiable stacks and queues in PyTorch. The data structures are implemented in s

Will Merrill 141 Oct 06, 2022
Code and data (Incidents Dataset) for ECCV 2020 Paper "Detecting natural disasters, damage, and incidents in the wild".

Incidents Dataset See the following pages for more details: Project page: IncidentsDataset.csail.mit.edu. ECCV 2020 Paper "Detecting natural disasters

Ethan Weber 67 Dec 27, 2022
MADT: Offline Pre-trained Multi-Agent Decision Transformer

MADT: Offline Pre-trained Multi-Agent Decision Transformer A link to our paper can be found on Arxiv. Overview Official codebase for Offline Pre-train

Linghui Meng 51 Dec 21, 2022