Multistream Convolutional Neural Network (CNN)

A multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks. To achieve this robustness, it processes input speech at diverse temporal resolutions by applying a different dilation rate to the convolutional layers in each of several parallel streams. The dilation rates are selected from multiples of the sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer.
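The stream layout described above can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the Kaldi recipe: the function names (`dilation_rates`, `dilated_conv1d`, `multistream_forward`, `project`), the 3-tap kernel, the scalar features, and the choice of the first few multiples of 3 as dilation rates are all hypothetical. The actual model stacks TDNN-F layers on multi-dimensional features.

```python
from typing import List, Sequence

SUBSAMPLING_RATE = 3  # frames, as stated in the text


def dilation_rates(num_streams: int, subsampling: int = SUBSAMPLING_RATE) -> List[int]:
    """Assumed choice: the first `num_streams` positive multiples of the sub-sampling rate."""
    return [subsampling * (i + 1) for i in range(num_streams)]


def dilated_conv1d(frames: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Valid-mode dilated 1D convolution (cross-correlation) over scalar frames."""
    span = (len(kernel) - 1) * dilation  # context consumed by one output frame
    return [
        sum(w * frames[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(frames) - span)
    ]


def multistream_forward(
    frames: Sequence[float], kernel: Sequence[float], rates: Sequence[int], depth: int
) -> List[List[float]]:
    """Run `depth` stacked dilated layers per stream and concatenate the outputs per frame."""
    if len(kernel) != 3:
        raise ValueError("this sketch assumes a symmetric 3-tap kernel")
    max_rate = max(rates)
    common_len = len(frames) - 2 * depth * max_rate
    if common_len <= 0:
        raise ValueError("input is too short for the widest stream")
    streams = []
    for rate in rates:
        x = list(frames)
        for _ in range(depth):
            x = dilated_conv1d(x, kernel, rate)
        # Narrower streams keep more frames; crop so every stream is centered
        # on the same input frames as the widest one.
        offset = depth * (max_rate - rate)
        streams.append(x[offset:offset + common_len])
    # One concatenated vector per frame, with one entry per stream.
    return [list(vec) for vec in zip(*streams)]


def project(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear projection of each concatenated vector onto a single output."""
    return [sum(w * v for w, v in zip(weights, vec)) for vec in vectors]


if __name__ == "__main__":
    rates = dilation_rates(3)
    frames = [float(t) for t in range(60)]
    embedded = multistream_forward(frames, [0.25, 0.5, 0.25], rates, depth=2)
    print(rates, len(embedded), project(embedded, [1 / 3] * 3)[:2])
```

Each stream sees the same input but with a different temporal context width, which is what gives the concatenated vector its mix of resolutions.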

References

Multistream CNN for Robust Acoustic Modeling [paper]

@inproceedings{han2021multistream-cnn,
  title={Multistream CNN for Robust Acoustic Modeling},
  author={Kyu J. Han and Jing Pan and Venkata Krishna Naveen Tadala and Tao Ma and Dan Povey},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021}
}

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition [paper]

@inproceedings{pan2020asapp-asr,
  title={ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition},
  author={Jing Pan and Joshua Shapiro and Jeremy Wohlwend and Kyu J. Han and Tao Lei and Tao Ma},
  booktitle={Interspeech},
  year={2020}
}

Installation

Please follow the original Kaldi build sequence, as below.

>> cd tools; make; cd ../src; ./configure; make clean; make -j clean depend; make -j all

Recipes and Results

LibriSpeech

>> egs/librispeech/s5/local/chain/run_multistream_cnn_1a.sh

Word error rate (WER, %):

model	dev-clean	dev-other	test-clean	test-other
tdnn_1d	3.29	8.71	3.80	8.76
multistream_cnn_1a	3.20	7.68	3.54	7.87
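To make the LibriSpeech comparison easier to read, the relative improvement over the tdnn_1d baseline can be computed as below. This is a small sketch that assumes the figures are word error rates in percent, as is conventional for Kaldi recipes; the helper name `relative_wer_reduction` and the dictionary layout are illustrative and not part of the repository.

```python
def relative_wer_reduction(baseline: float, system: float) -> float:
    """Relative reduction in percent; positive means the system beats the baseline."""
    return 100.0 * (baseline - system) / baseline


# (tdnn_1d, multistream_cnn_1a) pairs copied from the LibriSpeech table.
LIBRISPEECH_WER = {
    "dev-clean": (3.29, 3.20),
    "dev-other": (8.71, 7.68),
    "test-clean": (3.80, 3.54),
    "test-other": (8.76, 7.87),
}

if __name__ == "__main__":
    for test_set, (baseline, system) in LIBRISPEECH_WER.items():
        print(f"{test_set}: {relative_wer_reduction(baseline, system):.1f}% relative")
```

The same function applies to any other baseline/system pair, with a negative value indicating a regression.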

Fisher-SWBD

>> egs/fisher_swbd/s5/local/chain/run_multistream_cnn_1a.sh

Word error rate (WER, %):

model	eval2000	swbd	callhm
tdnn_7d	12.6	8.8	16.3
multistream_cnn_1a	12.6	9.2	15.7

Owner

ASAPP Research
