PyTorch implementation of Densely Connected Time Delay Neural Network

Overview

Densely Connected Time Delay Neural Network

This repository provides the PyTorch implementation of the Densely Connected Time Delay Neural Network (D-TDNN) proposed in our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

What's New ⚠️

  • [2021-02-14] We added an impl option to TimeDelay; you can now choose:

    • 'conv': implement TDNN by F.conv1d.
    • 'linear': implement TDNN by F.unfold and F.linear.

    Check this commit for more information. Note that pre-trained models for 'conv' have not been uploaded yet.

  • [2021-02-04] The default TDNN implementation in this repo is slower than nn.Conv1d, but we adopted it because:

    • The TDNN in this repo was also used to build F-TDNN models, which need asymmetric padding that nn.Conv1d does not fully support.
    • nn.Conv1d(dilation>1, bias=True) is slow in training.

    However, we do not use F-TDNN here, and we always set bias=False in D-TDNN, so we were considering uploading a new version of TDNN soon (updated on 2021-02-14).

  • [2021-02-01] Our new paper has been accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    CAM outperforms statistics-and-selection (SS) in terms of speed and accuracy.
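Both impl options compute the same bias-free TDNN layer; only the kernel doing the work differs. The following is a minimal sketch of our own (tdnn_conv and tdnn_linear are hypothetical names, not the repository's TimeDelay API) showing the two formulations side by side. Padding is applied explicitly with F.pad as a (left, right) pair, which is one way to get the asymmetric padding F-TDNN needs. The linear version assumes the weight is flattened channel-major, the order in which F.unfold emits its columns. The shapes in the size-mismatch log under Comments ([128, 30, 5] in the checkpoint versus [128, 150] in the model) are consistent with this, but we have not verified the repository's exact layout.

```python
import torch
import torch.nn.functional as F


def tdnn_conv(x, weight, dilation=1, padding=(0, 0)):
    """Hypothetical bias-free TDNN layer via F.conv1d.

    x: (batch, in_channels, frames); weight: (out_channels, in_channels, kernel).
    padding is a (left, right) pair and may be asymmetric, which the single
    `padding` argument of nn.Conv1d cannot express, so we pad with F.pad first.
    """
    x = F.pad(x, padding)
    return F.conv1d(x, weight, dilation=dilation)


def tdnn_linear(x, weight, dilation=1, padding=(0, 0)):
    """The same layer via F.unfold + F.linear (assumed channel-major flattening)."""
    out_channels, in_channels, kernel = weight.shape
    x = F.pad(x, padding)
    # View the sequence as a 1-pixel-high image so F.unfold gathers the temporal
    # context: (batch, in_channels * kernel, out_frames), channel index slowest.
    cols = F.unfold(x.unsqueeze(2), kernel_size=(1, kernel), dilation=(1, dilation))
    # Flatten the kernel in the same channel-major order: index c * kernel + j.
    flat_weight = weight.reshape(out_channels, in_channels * kernel)
    # One matmul per output frame, then back to (batch, out_channels, frames).
    return F.linear(cols.transpose(1, 2), flat_weight).transpose(1, 2)
```

Design note: F.unfold materializes a (batch, in_channels × kernel, frames) buffer, so the linear path trades extra memory for a single dense matmul, while F.conv1d avoids that copy.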

Pretrained Models

We provide pretrained models that can be used in many tasks, such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use the Kaldi toolkit:

  • Download the VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and set $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or check out the kaldifeat branch if you do not want to install Kaldi.
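Kaldi's VoxCeleb x-vector recipe builds its *_no_sil feature directories by running apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 and then select-voiced-frames. Assuming prepare_voxceleb1_test.sh follows the same recipe (we have not checked), the hypothetical NumPy helpers below (sliding_cmn, remove_silence) sketch what happens to the 30-dim MFCCs before they land in test_no_sil. They are a simplified re-implementation for illustration, not a replacement for the Kaldi binaries or the kaldifeat branch.

```python
import numpy as np


def sliding_cmn(feats, cmn_window=300):
    """Centered sliding-window mean normalization (means only, no variances).

    Our reading of apply-cmvn-sliding --norm-vars=false --center=true: each
    frame has the mean of a cmn_window-frame window subtracted. Near the edges
    the window is shifted (not shrunk) to stay inside the utterance, and
    utterances shorter than the window use all of their frames.
    """
    feats = np.asarray(feats, dtype=np.float64)
    num_frames, dim = feats.shape
    # Prefix sums give any window sum in O(1): sum(feats[s:e]) = csum[e] - csum[s].
    csum = np.vstack([np.zeros((1, dim)), np.cumsum(feats, axis=0)])
    out = np.empty_like(feats)
    for t in range(num_frames):
        start = t - cmn_window // 2
        end = start + cmn_window
        if start < 0:  # window starts before frame 0: shift it right
            end -= start
            start = 0
        if end > num_frames:  # window runs past the end: shift it left, clip at 0
            start = max(start - (end - num_frames), 0)
            end = num_frames
        out[t] = feats[t] - (csum[end] - csum[start]) / (end - start)
    return out


def remove_silence(feats, vad, cmn_window=300):
    """Apply CMN over the whole utterance first, then keep only speech frames."""
    return sliding_cmn(feats, cmn_window)[np.asarray(vad, dtype=bool)]
```

Normalizing before dropping silence matters: the mean is estimated over all frames, as in the Kaldi pipeline order described above.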

Test

  • Download the pretrained D-TDNN model and run:

    python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda

Evaluation

VoxCeleb1-O

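| Model         | Emb. Dim | Params (M) | Loss        | Backend | EER (%) | DCF_0.01 | DCF_0.001 |
|---------------|----------|------------|-------------|---------|---------|----------|-----------|
| TDNN          | 512      | 4.2        | Softmax     | PLDA    | 2.34    | 0.28     | 0.38      |
| E-TDNN        | 512      | 6.1        | Softmax     | PLDA    | 2.08    | 0.26     | 0.41      |
| F-TDNN        | 512      | 12.4       | Softmax     | PLDA    | 1.89    | 0.21     | 0.29      |
| D-TDNN        | 512      | 2.8        | Softmax     | Cosine  | 1.81    | 0.20     | 0.28      |
| D-TDNN-SS (0) | 512      | 3.0        | Softmax     | Cosine  | 1.55    | 0.20     | 0.30      |
| D-TDNN-SS     | 512      | 3.5        | Softmax     | Cosine  | 1.41    | 0.19     | 0.24      |
| D-TDNN-SS     | 128      | 3.1        | AAM-Softmax | Cosine  | 1.22    | 0.13     | 0.20      |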
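To make the Backend, EER and DCF columns concrete, here is a minimal sketch of cosine scoring and of the two metrics as they are commonly computed; the minDCF normalization follows the NIST SRE convention also used by Kaldi's compute_min_dcf.py. We assume DCF_0.01 and DCF_0.001 denote minDCF with P_target = 0.01 and 0.001 and C_miss = C_fa = 1. The table does not state this, and the repository's evaluate.py may compute them differently; all function names here are hypothetical.

```python
import numpy as np


def cosine_scores(enroll, test):
    """Cosine similarity between paired rows of two (num_trials, dim) embedding arrays."""
    enroll = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    test = test / np.linalg.norm(test, axis=1, keepdims=True)
    return np.sum(enroll * test, axis=1)


def error_curves(scores, labels):
    """Miss (FNR) and false-alarm (FPR) rates at every threshold between sorted scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Entry i corresponds to rejecting the i lowest-scoring trials.
    Tied scores are broken arbitrarily by the sort, which is fine for a sketch.
    """
    labels = np.asarray(labels)[np.argsort(scores)]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    fnr = np.concatenate([[0], np.cumsum(labels)]) / n_target
    fpr = 1 - np.concatenate([[0], np.cumsum(1 - labels)]) / n_nontarget
    return fnr, fpr


def eer(fnr, fpr):
    """Equal error rate, approximated at the threshold where FNR and FPR are closest."""
    i = np.argmin(np.abs(fnr - fpr))
    return (fnr[i] + fpr[i]) / 2


def min_dcf(fnr, fpr, p_target, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost, normalized by the cost of the best trivial system."""
    dcf = c_miss * p_target * fnr + c_fa * (1 - p_target) * fpr
    return dcf.min() / min(c_miss * p_target, c_fa * (1 - p_target))
```

Usage: compute fnr, fpr = error_curves(cosine_scores(e, t), labels) once, then report eer(fnr, fpr) * 100 as EER (%), and min_dcf(fnr, fpr, 0.01) and min_dcf(fnr, fpr, 0.001) for the two DCF columns.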

Citation

If you find D-TDNN helpful in your research, please cite:

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper ⚠️

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.

Comments
  • size mismatch while loading pre-trained weights


    RuntimeError: Error(s) in loading state_dict for DTDNN:
    Missing key(s) in state_dict: "xvector.tdnn.linear.bias", "xvector.dense.linear.bias".
    size mismatch for xvector.tdnn.linear.weight: copying a param with shape torch.Size([128, 30, 5]) from checkpoint, the shape in current model is torch.Size([128, 150]).
    size mismatch for xvector.block1.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]).
    size mismatch for xvector.block1.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block1.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 192, 1]) from checkpoint, the shape in current model is torch.Size([128, 192]).
    size mismatch for xvector.block1.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block1.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]).
    size mismatch for xvector.block1.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block1.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]).
    size mismatch for xvector.block1.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block1.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]).
    size mismatch for xvector.block1.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block1.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]).
    size mismatch for xvector.block1.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.transit1.linear.weight: copying a param with shape torch.Size([256, 512, 1]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for xvector.block2.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]).
    size mismatch for xvector.block2.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]).
    size mismatch for xvector.block2.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]).
    size mismatch for xvector.block2.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]).
    size mismatch for xvector.block2.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 512, 1]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for xvector.block2.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 576, 1]) from checkpoint, the shape in current model is torch.Size([128, 576]).
    size mismatch for xvector.block2.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd7.linear1.weight: copying a param with shape torch.Size([128, 640, 1]) from checkpoint, the shape in current model is torch.Size([128, 640]).
    size mismatch for xvector.block2.tdnnd7.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd8.linear1.weight: copying a param with shape torch.Size([128, 704, 1]) from checkpoint, the shape in current model is torch.Size([128, 704]).
    size mismatch for xvector.block2.tdnnd8.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd9.linear1.weight: copying a param with shape torch.Size([128, 768, 1]) from checkpoint, the shape in current model is torch.Size([128, 768]).
    size mismatch for xvector.block2.tdnnd9.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd10.linear1.weight: copying a param with shape torch.Size([128, 832, 1]) from checkpoint, the shape in current model is torch.Size([128, 832]).
    size mismatch for xvector.block2.tdnnd10.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd11.linear1.weight: copying a param with shape torch.Size([128, 896, 1]) from checkpoint, the shape in current model is torch.Size([128, 896]).
    size mismatch for xvector.block2.tdnnd11.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.block2.tdnnd12.linear1.weight: copying a param with shape torch.Size([128, 960, 1]) from checkpoint, the shape in current model is torch.Size([128, 960]).
    size mismatch for xvector.block2.tdnnd12.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]).
    size mismatch for xvector.transit2.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
    size mismatch for xvector.dense.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).

    opened by zabir-nabil 3
  • Questions about experimental details


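    Hello, I would like to ask about some implementation details of the experiments in your paper. 1. Training data: many other papers use the 5,994 speakers of the VoxCeleb2 dev set as the training set (or VoxCeleb1 dev + VoxCeleb2 dev, i.e. 1,211 + 5,994 speakers). Do you have results trained only on these speakers, and would you be willing to share them?

    2. PLDA vs. cosine similarity: when you compare the EER of these two backends for TDNN, is the embedding taken from the second-to-last layer (the layer just before the classifier) or from the third-to-last layer (the x-vector)? I have also read in papers that embeddings from these two layers behave differently with different backends, and that cosine scoring may work better on the second-to-last layer.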

    Thanks!🙏

    opened by Wenhao-Yang 1
  • questions about model training


    Hello yuyq96, thank you very much for sharing this great work. I learned from the D-TDNN paper that D-TDNN-SS uses a mini-batch size of 128, but the model is too large to train on a single GPU. Could you tell me how you trained it? Did you use nn.DataParallel or DDP? Looking forward to your reply.

    opened by forwiat 2
  • the difference between kaldifeat-kaldi and kaldifeat-python?


    What is the numerical difference between features computed by the Kaldi implementation and by your Python implementation (kaldifeat)? I compared the two sets of computed features and found some differences. Are the experimental results reported on the D-TDNN master branch and the D-TDNN kaldifeat branch exactly the same?

    Thanks~

    opened by mezhou 4
  • Some questions about the paper


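    Hello, I think your work, D-TDNN, achieves better results than E-TDNN and F-TDNN with fewer parameters, which is very meaningful. However, I have two questions about the experiments in the paper: 1. In Table 5 of the paper, the softmax-trained D-TDNN model performs better with cosine scoring than with PLDA, which is inconsistent with the TDNN, E-TDNN, and F-TDNN results above it (where PLDA outperforms cosine in every case). What causes this? 2. Could you explain the null branch a little?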

    opened by xuanjihe 10
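The first comment above shows every TDNN weight in the checkpoint stored in conv layout, (out, in, kernel), while the model being loaded expects the flattened linear layout, (out, in × kernel): for example [128, 30, 5] versus [128, 150]. This suggests the checkpoint and the model were built with different impl settings, so probably the cleanest fix is to construct the model with the setting the checkpoint was saved with. If that is not possible, a hypothetical helper such as adapt_tdnn_weights (our own sketch, not part of the repository) can reshape the weights in either direction, assuming the channel-major flattening used by F.unfold.

```python
import torch


def adapt_tdnn_weights(checkpoint, model_state):
    """Reshape TDNN weights between conv layout (out, in, k) and linear layout (out, in * k).

    Only tensors whose key exists in the model, whose output dimension and
    element count match, and that differ by exactly this 3-D <-> 2-D flattening
    are reshaped. Everything else passes through unchanged, so
    load_state_dict can still report genuine mismatches.
    """
    adapted = {}
    for key, value in checkpoint.items():
        target = model_state.get(key)
        if (
            target is not None
            and value.shape != target.shape
            and {value.dim(), target.dim()} == {2, 3}
            and value.shape[0] == target.shape[0]
            and value.numel() == target.numel()
        ):
            # Assumed channel-major layout: column c * k + j of the 2-D weight
            # holds element [c, j] of the 3-D weight, so a plain reshape suffices.
            value = value.reshape(target.shape)
        adapted[key] = value
    return adapted
```

Usage, assuming dtdnn.pth contains a plain state dict: model.load_state_dict(adapt_tdnn_weights(torch.load("dtdnn.pth", map_location="cpu"), model.state_dict())). Reshaping does not explain the two missing keys (xvector.tdnn.linear.bias and xvector.dense.linear.bias): they mean the model was built with a bias on those layers while the checkpoint has none. Loading with strict=False would silently leave those biases at their initial values, so it is better to build the model without them.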
Owner: Ya-Qi Yu (Machine Learning)