This project accompanies the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in conference scenarios.

Last update: Dec 08, 2022

Overview

M2MeT challenge baseline -- AliMeeting

This project provides the baseline system recipes for the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). The challenge consists of two tracks: Automatic Speech Recognition (ASR) and Speaker Diarization. A detailed description of each track can be found in its corresponding directory. The goal of this project is to simplify the training and evaluation procedures so that participants can easily reproduce the baseline experiments and develop novel methods.

Setup

git clone https://github.com/yufan-aslp/AliMeeting.git

Introduction

Speech Recognition Track: Follow the detailed steps in ./asr.
Speaker Diarization Track: Follow the detailed steps in ./speaker.

General steps

Prepare the training data for the speaker diarization and ASR models, respectively.
Run the speaker diarization experiment to obtain the RTTM file. The RTTM file contains the voice activity detection (VAD) and speaker diarization results, which are used to compute the final Diarization Error Rate (DER).
For the ASR track, train either single-speaker or multi-speaker ASR models. The evaluation metric for ASR systems is Character Error Rate (CER).
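
To make the two outputs mentioned in the steps above more concrete, here is a minimal Python sketch that reads speaker segments from RTTM text and computes a character error rate. It is an illustration only, not the baseline's scoring code: the function names `parse_rttm` and `cer`, the `Segment` type, and the recording name `meeting1` are hypothetical, the RTTM layout assumed is the common ten-field `SPEAKER` record, and the CER here is plain Levenshtein distance divided by reference length, without any text normalization or multi-speaker handling that the official recipes may apply.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    file_id: str      # recording identifier (2nd RTTM field)
    onset: float      # segment start time in seconds (4th field)
    duration: float   # segment length in seconds (5th field)
    speaker: str      # speaker label (8th field)


def parse_rttm(text: str) -> List[Segment]:
    """Collect SPEAKER records from RTTM-formatted text (assumed layout)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record that is not a SPEAKER entry.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        segments.append(
            Segment(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return segments


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; it starts as the cost of inserting j characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach empty hyp
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    example = "SPEAKER meeting1 1 0.50 2.25 <NA> <NA> spk1 <NA> <NA>\n"
    for seg in parse_rttm(example):
        print(seg.file_id, seg.speaker, seg.onset, seg.onset + seg.duration)
    print(cer("abcd", "abxd"))
```

Working at the character level suits Mandarin transcripts, where word boundaries are not marked by spaces. For actual challenge scoring, use the scripts in the track directories rather than this sketch.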

Citation

If you use the challenge dataset or our baseline systems, please consider citing the following:

@article{yu2021m2met,
title={M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge},
author={Yu, Fan and Zhang, Shiliang and Fu, Yihui and Xie, Lei and Zheng, Siqi and Du, Zhihao and Huang, Weilong and Guo, Pengcheng and Yan, Zhijie and Ma, Bin and others},
journal={arXiv preprint arXiv:2110.07393},
year={2021}
}

Our paper is available at https://arxiv.org/abs/2110.07393

Data download instructions will be sent to registered challenge participants by email.

Organizing Committee

Lei Xie, AISHELL Foundation, China
Bin Ma, Principal Engineer at Alibaba, Singapore
DeLiang Wang, Professor, Ohio State University, USA
Zheng-Hua Tan, Professor, Aalborg University, Denmark
Kong Aik Lee, Senior Scientist, Institute for Infocomm Research, A*STAR, Singapore
Zhijie Yan, Director of Speech Lab at Alibaba, China
Yanmin Qian, Associate Professor, Shanghai Jiao Tong University, China
Hui Bu, CEO, AIShell Inc., China

Code license

Apache 2.0
