Modified GPT that uses average pooling to reduce the memory requirements of softmax attention.

Last update: Dec 03, 2021

Overview

NLP-GPT-Upsampling

This repository contains an implementation of OpenAI's GPT model. In particular, taking inspiration from the Nystromformer, this implementation approximates the full softmax attention matrix, so that longer sequences can be modeled in NLP language modeling tasks, by applying a simple strided average pooling to the input text sequence to reduce its length. The reduced-length attention output is then upsampled back to the original sequence length using bilinear interpolation.
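A minimal NumPy sketch of this idea for a single attention head follows. It is not the repository's TensorFlow/Keras code: the function names, the requirement that the sequence length be divisible by the pooling stride, and the use of linear interpolation along the sequence axis (what a bilinear resize reduces to when the feature axis keeps its size) are assumptions made for illustration. The causal mask a GPT decoder needs is deliberately left out to keep the example short.

```python
import numpy as np


def strided_avg_pool(x: np.ndarray, stride: int) -> np.ndarray:
    """Average non-overlapping windows of `stride` positions: (L, d) -> (L // stride, d)."""
    seq_len, d_model = x.shape
    if seq_len % stride != 0:
        raise ValueError("sequence length must be divisible by the stride")
    return x.reshape(seq_len // stride, stride, d_model).mean(axis=1)


def linear_upsample(x: np.ndarray, out_len: int) -> np.ndarray:
    """Linearly interpolate (L_in, d) -> (out_len, d) along the sequence axis.

    Uses the half-pixel-centre convention common in image resizing; positions
    that fall outside the input are clamped to the first or last row.
    """
    in_len = x.shape[0]
    pos = (np.arange(out_len) + 0.5) * in_len / out_len - 0.5
    pos = np.clip(pos, 0, in_len - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, in_len - 1)
    frac = (pos - lo)[:, None]  # interpolation weight given to the upper neighbour
    return (1.0 - frac) * x[lo] + frac * x[hi]


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pooled_attention(x, w_q, w_k, w_v, stride):
    """Single-head attention on a pooled sequence, upsampled back to full length.

    No causal mask is applied (a simplification for illustration only).
    """
    seq_len = x.shape[0]
    x_red = strided_avg_pool(x, stride)              # (L/s, d_model)
    q, k, v = x_red @ w_q, x_red @ w_k, x_red @ w_v  # (L/s, d_head) each
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (L/s, L/s) instead of (L, L)
    out_red = softmax(scores) @ v                    # (L/s, d_head)
    return linear_upsample(out_red, seq_len)         # (L, d_head)
```

Only the (L/s) x (L/s) score matrix is ever materialised, which is where the memory saving comes from; the price is that every token within a pooled window shares one averaged representation.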

Because of this simplicity, the model's performance will not be comparable to that of the original GPT model, which uses the full attention matrix. The tradeoff is that the naive strided averaging can model longer sequences than the original GPT implementation.
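The memory argument can be made concrete with a little arithmetic. The attention score matrix has L^2 entries per head and per layer, so pooling with stride s shrinks it to (L/s)^2, an s^2 reduction; equivalently, for a fixed score-matrix budget the usable sequence length grows by a factor of s. The helper names below are hypothetical, and the count covers only the score matrix: in the design described above, where the attention output is upsampled back to full length, embeddings and feed-forward activations still grow linearly with L.

```python
import math


def score_matrix_entries(seq_len: int, stride: int = 1) -> int:
    """Entries in one head's attention score matrix; stride=1 is full attention."""
    if seq_len % stride != 0:
        raise ValueError("seq_len must be divisible by stride")
    reduced_len = seq_len // stride
    return reduced_len * reduced_len


def max_seq_len(budget_entries: int, stride: int = 1) -> int:
    """Longest sequence whose pooled score matrix fits within budget_entries."""
    # The pooled length can be at most isqrt(budget); each pooled step covers `stride` tokens.
    return math.isqrt(budget_entries) * stride


full = score_matrix_entries(1024)               # 1,048,576 entries
pooled = score_matrix_entries(1024, stride=4)   # 65,536 entries
print(full // pooled)                           # 16
print(max_seq_len(full, stride=4))              # 4096
```

With the same score-matrix budget that full attention needs at 1,024 tokens, a stride of 4 would fit 4,096 tokens.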

Fig. 1: GPT Model Architecture (obtained from the GPT paper)

Data

This repository includes code to process the Movie Dialogue dataset (the data preparation closely follows this script) and the Reddit Jokes dataset.

To prepare the data prior to training the model(s), run

python process_movie_dialogue_subword.py

for the Movie Dialogue dataset, or

python process_reddit_jokes_subword_v1.py

for the Reddit Jokes dataset.
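The processing scripts convert raw text into sub-word token ids. How those ids are packed into training examples is not described here, so the sketch below shows one common way to do it for a GPT-style language model, offered only as an assumption: each example is shifted by one position to form next-token targets, and the sequence length is kept a multiple of the pooling stride so that the strided average pooling divides it evenly. `make_lm_example` and `PAD_ID` are hypothetical names, not functions from the repository.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding id; the processing scripts may use a different value


def make_lm_example(token_ids: List[int], seq_len: int, stride: int,
                    pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Build one (inputs, targets) pair for next-token prediction.

    seq_len must be a multiple of the pooling stride so that strided average
    pooling splits the sequence into whole windows.
    """
    if seq_len % stride != 0:
        raise ValueError("seq_len must be a multiple of the pooling stride")
    # Take seq_len + 1 tokens so that targets are the inputs shifted left by one.
    window = list(token_ids[: seq_len + 1])
    window += [pad_id] * (seq_len + 1 - len(window))
    return window[:-1], window[1:]


inputs, targets = make_lm_example([5, 6, 7], seq_len=4, stride=2)
print(inputs)   # [5, 6, 7, 0]
print(targets)  # [6, 7, 0, 0]
```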

Training and Model Inference

Having processed the data into sub-word tokens, run

python train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
python infer_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py

python train_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
python infer_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py

to train a model on the corresponding dataset and to perform inference with the trained model.
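The decoding strategy used by the inference scripts is not described in this README. As an illustration only, the sketch below shows greedy autoregressive decoding, one common way to generate text from a GPT-style model; the repository may use sampling or another strategy instead. `next_token_logits` is a hypothetical stand-in for the trained model, and for the pooled model it is assumed to take care of padding the context to a multiple of the stride.

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token_logits: Callable[[List[int]], Sequence[float]],
                  prompt_ids: List[int], max_new_tokens: int,
                  end_id: int, max_len: int) -> List[int]:
    """Generate tokens one at a time, always picking the highest-scoring id.

    next_token_logits (hypothetical) maps a context of token ids to one score
    per vocabulary entry for the next token.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-max_len:]  # keep only the most recent max_len tokens
        logits = next_token_logits(context)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        ids.append(next_id)
        if next_id == end_id:  # stop once the end-of-sequence token is produced
            break
    return ids[len(prompt_ids):]


def toy_model(context: List[int]) -> List[float]:
    """Toy 'model' over a 5-token vocabulary that always predicts last id + 1."""
    scores = [0.0] * 5
    scores[(context[-1] + 1) % 5] = 1.0
    return scores


print(greedy_decode(toy_model, [0], max_new_tokens=10, end_id=3, max_len=8))  # [1, 2, 3]
```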

Owner: WD
