Speech Rankings

This project mimics CSRankings to generate an ordered list of researchers in speech/spoken language processing, along with their possible research topics, based on recent publications in important venues of the field. It is meant to help students looking for PhD positions find suitable advisors.

How to use

The pre-generated report is available here. To build it yourself:

Run prepare_data.py to build publications.json and authors.json, or simply use the provided data, which covers publications from 2011 to 2021.
Run export.py to generate the report.

How does it work

We scrape author metadata and publication data from DBLP for the following three types of venues:

Speech venues: Interspeech, Speech Communication, SLT, SSW, ASRU, IWSLT
Mixed venues: ICASSP, TASLP
General venues: NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, KDD, AAAI, IJCAI
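
The three venue types above could be represented as a small lookup table. This is only a sketch: the names `VENUES`, `VENUE_TYPE`, and `venue_type` are hypothetical and are not taken from the project's scripts.

```python
# Hypothetical grouping of venues by type, following the list above.
# Illustration only; the project's scripts may organize this differently.
VENUES = {
    "speech": ["Interspeech", "Speech Communication", "SLT", "SSW", "ASRU", "IWSLT"],
    "mixed": ["ICASSP", "TASLP"],
    "general": ["NeurIPS", "ICML", "ICLR", "ACL", "EMNLP", "NAACL", "KDD", "AAAI", "IJCAI"],
}

# Invert the mapping so a venue name resolves directly to its type.
VENUE_TYPE = {name: kind for kind, names in VENUES.items() for name in names}


def venue_type(venue):
    """Return 'speech', 'mixed', or 'general', or None for an unlisted venue."""
    return VENUE_TYPE.get(venue)


print(venue_type("ICASSP"))
print(venue_type("Interspeech"))
```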

All publications in Speech venues are included. For Interspeech in particular, the section/field of each paper is collected from the ISCA Archive to show the possible research topics of each researcher. Keywords from IEEE Xplore serve the same purpose for papers published in IEEE venues. In Mixed venues, a set of rules based on keywords and titles filters out non-speech papers. In General venues, titles are used to identify speech papers. Researchers are sorted by their total number of publications.
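
The filtering and sorting described above might look roughly like the following. This is a minimal sketch under stated assumptions: the term list `SPEECH_TERMS`, the record fields (`title`, `venue_type`, `keywords`, `authors`), and all function names are hypothetical, and the project's actual rules are likely more detailed.

```python
from collections import Counter

# Hypothetical term list; the project's real filtering rules are not shown here.
SPEECH_TERMS = ("speech", "speaker", "spoken", "voice", "acoustic")


def looks_like_speech(text):
    """Case-insensitive check for any speech-related term in the text."""
    lowered = text.lower()
    return any(term in lowered for term in SPEECH_TERMS)


def keep_paper(paper):
    """Decide whether a paper counts, depending on its venue type."""
    kind = paper["venue_type"]
    if kind == "speech":
        return True  # every paper in a Speech venue is included
    if kind == "mixed":
        # Mixed venues: check both the title and the keywords
        keywords = paper.get("keywords", [])
        return looks_like_speech(paper["title"]) or any(
            looks_like_speech(k) for k in keywords
        )
    if kind == "general":
        return looks_like_speech(paper["title"])  # General venues: title only
    return False


def rank_authors(papers):
    """Count kept papers per author, sorted by count (descending), then name."""
    counts = Counter()
    for paper in papers:
        if keep_paper(paper):
            counts.update(set(paper["authors"]))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up records for illustration only.
papers = [
    {"title": "End-to-end speech recognition", "venue_type": "speech", "authors": ["A", "B"]},
    {"title": "Image denoising with wavelets", "venue_type": "mixed",
     "keywords": ["image processing"], "authors": ["B"]},
    {"title": "Robust speaker verification", "venue_type": "general", "authors": ["A"]},
    {"title": "Graph neural networks", "venue_type": "general", "authors": ["C"]},
    {"title": "Source separation", "venue_type": "mixed",
     "keywords": ["Speech enhancement"], "authors": ["C"]},
]
print(rank_authors(papers))
```

Simple substring matching like this will both miss speech papers and admit unrelated ones, which is consistent with the caveat that the collected data contain errors.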

The collected data contain errors, and the project is neither intended to index speech-related papers nor to compare researchers in the field.

A CSRankings-like index for speech researchers

Owner

Mutian He
