The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

The aim of this task is to predict someone's English proficiency based on a text input.

Using the The NICT JLE Corpus available here : https://alaginrc.nict.go.jp/nict_jle/index_E.html

The source of the corpus data is the transcripts of the audio-recorded speech samples of 1,281 participants (1.2 million words, 300 hours in total) of English oral proficiency interview test. Each participant got a SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency) based on this test.

The goal is to build a machine learning algorithm for predicting the SST score of each participant based on their transcript.

Steps:

1 - Pre-process the dataset: extract the participant transcript (all tags). Inside participant transcript, you can remove all other tags and extract only English words.

2 - Process the dataset: extract features with the Bag of Word (BoW) technique

3 - Train a classifier to predict the SST score

4 - Compute the accuracy of your system (the number of participant classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example you can try to use GloVe instead of BoW).

The aim of this task is to predict someone's English proficiency based on a text input.

Related tags

Overview

English_proficiency_prediction_NLP

Owner

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and GPT-NEO (2.7 B) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed

Based on 125GB of data leaked from Twitch, you can see their monthly revenues from 2019-2021

BeautyNet is an AI powered model which can tell you whether you're beautiful or not.

PyTorch impelementations of BERT-based Spelling Error Correction Models.

Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code or write code yourself

This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training.

Global Rhythm Style Transfer Without Text Transcriptions

The PyTorch based implementation of continuous integrate-and-fire (CIF) module.

CCF BDCI 2020 房产行业聊天问答匹配赛道 A榜47/2985

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

Transformers and related deep network architectures are summarized and implemented here.

🎐 a python library for doing approximate and phonetic matching of strings.

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Retraining OpenAI's GPT-2 on Discord Chats