Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Last update: Apr 30, 2022

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021). The paper is the result of a collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written for TensorFlow 1 (TF1), but it should work with TF2 too. I provide only the code for creating the network, along with the required hyperparameters. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for correlation pooling.

Beyond the experiments reported in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) extract correlation pooling from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
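As a rough illustration of option (a), the NumPy sketch below computes channel-wise correlations over time for a single feature map and concatenates them with mean and standard-deviation statistics. It is not the repository's TensorFlow code: the function names, the `(time, channels)` input layout, and the `eps` constant are all assumptions made for this example.

```python
import numpy as np


def stats_pooling(x, eps=1e-5):
    """Mean and standard deviation of each channel over time. x: (time, channels)."""
    mean = x.mean(axis=0)
    std = np.sqrt(x.var(axis=0) + eps)
    return np.concatenate([mean, std])


def correlation_pooling(x, eps=1e-5):
    """Strictly upper-triangular channel-wise correlations. x: (time, channels)."""
    # Normalize each channel to zero mean and (approximately) unit variance.
    z = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    # Channel-by-channel correlation matrix, averaged over time.
    corr = z.T @ z / x.shape[0]
    # The diagonal is (nearly) constant, so keep only the strictly upper triangle.
    rows, cols = np.triu_indices(x.shape[1], k=1)
    return corr[rows, cols]


def combined_pooling(x):
    """Concatenate statistics pooling and correlation pooling into one vector."""
    return np.concatenate([stats_pooling(x), correlation_pooling(x)])


rng = np.random.default_rng(0)
features = rng.standard_normal((200, 8))  # hypothetical: 200 frames, 8 channels
pooled = combined_pooling(features)
print(pooled.shape)  # 2 * 8 statistics plus 8 * 7 / 2 correlations
```

With C channels, statistics pooling contributes 2C values and correlation pooling contributes C(C-1)/2, so the combined vector grows quadratically with the number of channels.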

The code could be written more efficiently using tensor-only operators. However, to facilitate research, we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
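To make the trade-off above concrete, here is a NumPy sketch (again, not the repository's TF code) that computes per-range correlations in two ways: with a Python list holding one array per frequency range, and with a single batched computation. The `(time, freq, channels)` layout, the choice to treat the bins inside a range as extra observations, and all names are assumptions for this example; the repository may merge bins differently.

```python
import numpy as np


def corr_upper(x, eps=1e-5):
    """Strictly upper-triangular channel correlations. x: (observations, channels)."""
    z = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    corr = z.T @ z / x.shape[0]
    rows, cols = np.triu_indices(x.shape[1], k=1)
    return corr[rows, cols]


def pooling_with_lists(feats, num_ranges):
    """List-based variant: one array per frequency range, processed in a loop."""
    channels = feats.shape[2]
    ranges = np.split(feats, num_ranges, axis=1)  # list of (time, bins, channels)
    pooled = [corr_upper(r.reshape(-1, channels)) for r in ranges]
    return np.concatenate(pooled)


def pooling_tensor_only(feats, num_ranges, eps=1e-5):
    """Tensor-only variant: all ranges handled by one batched computation."""
    time, freq, channels = feats.shape
    bins = freq // num_ranges
    # (time, ranges, bins, channels) -> (ranges, time * bins, channels)
    x = feats.reshape(time, num_ranges, bins, channels)
    x = x.transpose(1, 0, 2, 3).reshape(num_ranges, time * bins, channels)
    z = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
    corr = np.einsum("rnc,rnd->rcd", z, z) / (time * bins)
    rows, cols = np.triu_indices(channels, k=1)
    return corr[:, rows, cols].reshape(-1)


rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 10, 4))  # hypothetical: 50 frames, 10 bins, 4 channels
a = pooling_with_lists(feats, num_ranges=5)
b = pooling_tensor_only(feats, num_ranges=5)
print(a.shape, np.allclose(a, b))
```

The list-based form makes it easy to treat individual ranges differently (for example, ranges of unequal width), which is harder to express once everything is packed into one tensor.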

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you can reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get! Themos

Owner

Themos Stafylakis
