Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together

Last update: Nov 07, 2022

Overview

SpeechMix

Introduction

All examples below use the same input, the first utterance of a small LibriSpeech dummy dataset:

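from datasets import load_dataset
import soundfile as sf
# assumes the model classes are exported at the top level of the speechmix package
from speechmix import SpeechMixED, SpeechMixEED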


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]
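The speech and text inputs differ greatly in length: wav2vec2's convolutional feature encoder turns raw 16 kHz audio into one frame roughly every 20 ms, while the transcript is only a handful of tokens. The sketch below reproduces that frame-count arithmetic, assuming the kernel sizes and strides of the default wav2vec2-base configuration (`conv_kernel` = (10, 3, 3, 3, 3, 2, 2), `conv_stride` = (5, 2, 2, 2, 2, 2, 2)); the helper name is hypothetical and not part of SpeechMix.

```python
# Kernel sizes and strides of wav2vec2-base's 7 conv layers
# (assumed to match the Hugging Face Wav2Vec2Config defaults).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_encoder_frames(num_samples: int) -> int:
    """Hypothetical helper: frames produced by the conv feature encoder."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Each unpadded 1-D convolution yields floor((L - k) / s) + 1 outputs.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    sample_rate = 16_000  # LibriSpeech audio is sampled at 16 kHz
    for seconds in (1, 5, 10):
        print(f"{seconds} s -> {num_encoder_frames(seconds * sample_rate)} frames")
```

One second of audio thus becomes 49 encoder frames, so even a short utterance gives the text model a much longer sequence than the transcript it corresponds to.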

Speech encoder with NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
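In this setup the text decoder never sees audio directly: its cross-attention layers attend over the speech encoder's hidden states. wav2vec2-base produces 768-dimensional states while the bart-large decoder works in 1024 dimensions, so the states have to be mapped to the decoder width first. The following is a minimal pure-Python sketch of that idea (a linear projection followed by single-head scaled dot-product cross-attention) with tiny toy dimensions and random weights; it is not SpeechMix's implementation, and all names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Single-head scaled dot-product attention of decoder queries over memory.

    For brevity there are no learned Q/K/V projections: memory vectors serve as
    both keys and values.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in memory]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of memory vectors gives one context vector per query.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, memory)) for i in range(d)])
    return outputs, weights


random.seed(0)
D_ENC, D_DEC = 6, 4          # toy stand-ins for 768 (wav2vec2-base) and 1024 (bart-large)
N_FRAMES, N_TOKENS = 5, 3    # speech frames vs. decoder positions

speech_states = [[random.gauss(0, 1) for _ in range(D_ENC)] for _ in range(N_FRAMES)]
projection = [[random.gauss(0, 0.1) for _ in range(D_DEC)] for _ in range(D_ENC)]
decoder_queries = [[random.gauss(0, 1) for _ in range(D_DEC)] for _ in range(N_TOKENS)]

memory = matmul(speech_states, projection)   # (N_FRAMES x D_DEC)
context, attn = cross_attention(decoder_queries, memory)
print(len(context), len(context[0]))         # one D_DEC vector per decoder position
```

Each decoder position gets a probability distribution over all speech frames, which is why the decoder can be reused unchanged once the encoder states have the right width.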

Speech encoder with NLP decoder, fine-tuning only the cross-attention, projection, and decoder embedding layers

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
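With `ftl=True` most pretrained weights stay frozen and only the parts that glue the two models together are updated. One common way to express this kind of partial fine-tuning is to filter parameters by name. The sketch below assumes Hugging Face naming, where BART's decoder cross-attention modules are called `encoder_attn` and its embeddings `embed_tokens` / `embed_positions`; the `enc_to_dec_proj` adapter name, the module prefixes, and the helper itself are hypothetical, and SpeechMix's own selection logic may differ.

```python
# Substrings marking parameters that stay trainable in an ftl-style setup.
# "encoder_attn", "embed_tokens", "embed_positions" follow Hugging Face BART naming;
# "enc_to_dec_proj" is a hypothetical name for the speech-to-text projection.
# A bare "projection" pattern is avoided because wav2vec2 has its own
# feature_projection module that should stay frozen here.
FTL_TRAINABLE_PATTERNS = ("encoder_attn", "enc_to_dec_proj", "embed_tokens", "embed_positions")


def is_trainable_ftl(param_name: str) -> bool:
    """Sketch: True if this parameter should be updated when ftl=True."""
    return any(pattern in param_name for pattern in FTL_TRAINABLE_PATTERNS)


# Illustrative parameter names (prefixes are hypothetical).
example_names = [
    "speech_encoder.encoder.layers.0.attention.q_proj.weight",
    "speech_encoder.feature_projection.projection.weight",
    "enc_to_dec_proj.weight",
    "text_decoder.model.decoder.embed_tokens.weight",
    "text_decoder.model.decoder.layers.0.self_attn.q_proj.weight",
    "text_decoder.model.decoder.layers.0.encoder_attn.k_proj.weight",
]
for name in example_names:
    status = "train" if is_trainable_ftl(name) else "freeze"
    print(f"{status:<7}{name}")
```

In PyTorch the flags would be applied by looping over `model.named_parameters()` and setting `param.requires_grad = is_trainable_ftl(name)`. Note that BART ties its decoder token embeddings to a shared embedding matrix, so a real name filter has to account for whichever name the tied weight is registered under.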

Speech encoder with NLP encoder-decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
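Here the speech representation passes through the text model's encoder as well as its decoder. A common way to do this, assumed here rather than taken from the SpeechMix source, is to project the speech frames to the text model's hidden size and feed them to the NLP encoder in place of token embeddings (Hugging Face seq2seq models accept such vectors through the `inputs_embeds` argument). The sketch only traces tensor shapes through that pipeline; the sizes are the published configuration values of wav2vec2-base and bart-large, while the stage names and helper are illustrative. Because BART adds learned position embeddings to its encoder inputs, an unshortened frame sequence is capped at 1024 positions, roughly 20 seconds of audio at 49 frames per second.

```python
from typing import List, NamedTuple, Tuple


class Sizes(NamedTuple):
    speech_hidden: int = 768     # wav2vec2-base hidden size
    text_hidden: int = 1024      # bart-large d_model
    vocab: int = 50265           # bart-large vocabulary size
    max_positions: int = 1024    # bart-large max_position_embeddings


def trace_shapes(batch: int, num_frames: int, target_len: int,
                 sizes: Sizes = Sizes()) -> List[Tuple[str, Tuple[int, int, int]]]:
    """Return (stage, shape) pairs for speech encoder -> NLP encoder -> NLP decoder."""
    if num_frames > sizes.max_positions:
        # Without downsampling, every speech frame occupies one encoder position.
        raise ValueError(f"{num_frames} frames exceed {sizes.max_positions} positions")
    return [
        ("speech encoder output", (batch, num_frames, sizes.speech_hidden)),
        ("projected to text width", (batch, num_frames, sizes.text_hidden)),
        # The NLP encoder keeps the sequence length and width; it re-encodes the frames.
        ("NLP encoder output", (batch, num_frames, sizes.text_hidden)),
        # The decoder cross-attends over all frames but emits one vector per target token.
        ("NLP decoder output", (batch, target_len, sizes.text_hidden)),
        ("LM-head logits", (batch, target_len, sizes.vocab)),
    ]


for stage, shape in trace_shapes(batch=1, num_frames=49, target_len=12):
    print(f"{stage:<25} {shape}")
```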

Speech encoder with NLP encoder-decoder, fine-tuning only layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
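`lna=True` limits training to layer-normalization and attention parameters, a recipe often called LNA fine-tuning. The sketch below selects those parameters by name and reports the trainable share. The name patterns follow Hugging Face conventions for wav2vec2 (`attention`, `layer_norm`) and BART (`self_attn`, `encoder_attn`, `*_layer_norm`, `layernorm_embedding`); the module prefixes, parameter list, and element counts are toy values, not measurements of the real models.

```python
from typing import Dict

# Name fragments for attention and layer-norm parameters (Hugging Face naming assumed).
# "layernorm" is listed separately because BART's embedding norm is "layernorm_embedding".
LNA_PATTERNS = ("attention", "self_attn", "encoder_attn", "layer_norm", "layernorm")


def is_trainable_lna(param_name: str) -> bool:
    """Sketch: keep only layer-norm and attention parameters trainable."""
    return any(p in param_name for p in LNA_PATTERNS)


def trainable_fraction(param_sizes: Dict[str, int]) -> float:
    """Share of parameters (by element count) left trainable under LNA."""
    total = sum(param_sizes.values())
    trainable = sum(n for name, n in param_sizes.items() if is_trainable_lna(name))
    return trainable / total


# Toy element counts per parameter tensor (illustrative only).
toy_params = {
    "speech_encoder.encoder.layers.0.attention.q_proj.weight": 400,
    "speech_encoder.encoder.layers.0.layer_norm.weight": 20,
    "speech_encoder.encoder.layers.0.feed_forward.intermediate_dense.weight": 1600,
    "text_model.encoder.layers.0.self_attn.k_proj.weight": 400,
    "text_model.encoder.layers.0.fc1.weight": 1600,
    "text_model.decoder.layers.0.encoder_attn.v_proj.weight": 400,
    "text_model.decoder.layernorm_embedding.weight": 20,
}
print(f"trainable share: {trainable_fraction(toy_params):.1%}")
```

Feed-forward blocks usually hold most of a transformer layer's weights, so freezing them is what keeps the trainable share small in this recipe.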

Speech encoder with NLP encoder-decoder, fine-tuning only the speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)
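With `fne=True` the pretrained NLP encoder-decoder stays frozen and only the speech encoder is updated, so the speech side has to adapt to the unchanged text model. A minimal sketch, assuming parameters can be told apart by a module prefix (the `speech_encoder.` prefix and the helper name are hypothetical): it partitions parameter names into trainable and frozen groups, which in PyTorch would correspond to passing only the trainable group to the optimizer.

```python
from typing import Iterable, List, Tuple


def split_fne(param_names: Iterable[str],
              speech_prefix: str = "speech_encoder.") -> Tuple[List[str], List[str]]:
    """Partition parameter names: speech encoder trainable, everything else frozen."""
    trainable: List[str] = []
    frozen: List[str] = []
    for name in param_names:
        # Route each name into exactly one group based on its module prefix.
        (trainable if name.startswith(speech_prefix) else frozen).append(name)
    return trainable, frozen


names = [
    "speech_encoder.feature_extractor.conv_layers.0.conv.weight",
    "speech_encoder.encoder.layers.11.attention.out_proj.weight",
    "text_model.model.encoder.layers.0.self_attn.q_proj.weight",
    "text_model.model.decoder.layers.0.encoder_attn.q_proj.weight",
]
train, freeze = split_fne(names)
print("train: ", train)
print("freeze:", freeze)
```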

Installation

pip install

pip install speechmix

Build from source

Clone the repository, cd into the project directory, and install it in editable mode:

pip install -e .


Owner

Eric Lam
