LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech, and were notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly carried out on ASR, using multiple heterogeneous experimental settings (most of them for English). This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR tasks (high and low resource) but also spoken language understanding, speech translation and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

The scripts for data preparation are available here.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: TBC
