"Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

Last update: Nov 16, 2022

Overview

transformers-arithmetic

This repository contains the code to reproduce the experiments from the paper:

Nogueira, Jiang, Lin, "Investigating the Limitations of Transformers with Simple Arithmetic Tasks", 2021

First, install the required packages:

pip install -r requirements.txt

The command below trains and evaluates a T5-base model on the task of adding numbers with up to 15 digits:

python main.py \
    --output_dir=. \
    --model_name_or_path=t5-base \
    --operation=addition \
    --orthography=10ebased \
    --balance_train \
    --balance_val \
    --train_size=100000 \
    --val_size=10000 \
    --test_size=10000 \
    --min_digits_train=2 \
    --max_digits_train=15 \
    --min_digits_test=2 \
    --max_digits_test=15 \
    --base_number=10 \
    --seed=1 \
    --train_batch_size=4 \
    --accumulate_grad_batches=32 \
    --val_batch_size=32 \
    --max_seq_length=512 \
    --num_workers=4 \
    --gpus=1 \
    --optimizer=AdamW \
    --lr=3e-4 \
    --weight_decay=5e-5 \
    --scheduler=StepLR \
    --t_0=2 \
    --t_mult=2 \
    --gamma=1.0 \
    --step_size=1000 \
    --max_epochs=20 \
    --check_val_every_n_epoch=2 \
    --amp_level=O0 \
    --precision=32 \
    --gradient_clip_val=1.0
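
The `--orthography=10ebased` flag selects how numbers are written out as text before being fed to the model. As a rough illustration, assuming the representation follows the paper's description of the 10e-based format (each digit followed by a marker for its power of ten), the conversion might look like the sketch below. The function name `to_10e_based` is hypothetical and is not taken from `main.py`; the repository's own implementation may handle signs and other options differently.

```python
def to_10e_based(number: int) -> str:
    """Write a non-negative integer as digits paired with power-of-ten markers.

    Hypothetical helper for illustration only; not the repository's code.
    """
    digits = str(number)
    n = len(digits)
    # The leftmost digit gets the highest exponent, the rightmost gets 10e0.
    return " ".join(f"{d} 10e{n - 1 - i}" for i, d in enumerate(digits))


print(to_10e_based(832))  # 8 10e2 3 10e1 2 10e0
```

Making the position of every digit explicit in this way means the model does not have to infer place value from token order alone.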

Training should take about 10 hours on a single V100 GPU.
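
To get a feel for the size of this run, the arithmetic below works out the effective batch size and the approximate number of optimizer steps implied by the flags in the command. It is a back-of-the-envelope sketch that assumes a single GPU and that a final partial gradient-accumulation window still triggers an optimizer step; the exact step count reported by the training framework may differ slightly.

```python
import math

# Values copied from the command-line flags above.
train_size = 100_000
train_batch_size = 4
accumulate_grad_batches = 32
max_epochs = 20

# Gradients from 32 mini-batches of 4 examples are accumulated per update.
effective_batch_size = train_batch_size * accumulate_grad_batches

# Mini-batches per epoch, then optimizer updates per epoch (assumption:
# a trailing partial accumulation window still counts as one update).
batches_per_epoch = math.ceil(train_size / train_batch_size)
steps_per_epoch = math.ceil(batches_per_epoch / accumulate_grad_batches)

print(effective_batch_size)          # 128
print(steps_per_epoch)               # 782
print(steps_per_epoch * max_epochs)  # 15640
```

Note that with `--gamma=1.0`, a StepLR schedule multiplies the learning rate by 1.0 at each decay point, so the learning rate stays at 3e-4 throughout.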

The exact match score on the test set should be 1.0:

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_exact_match': 1.0000}
--------------------------------------------------------------------------------
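
For readers unfamiliar with the metric, exact match is the fraction of test examples whose predicted string equals the target string. The sketch below shows one plausible way to compute it; the function name `exact_match` and the whitespace stripping are assumptions for illustration, and the repository may normalize strings differently.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Return the fraction of predictions identical to their targets.

    Illustrative only; leading/trailing whitespace is ignored by assumption.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


print(exact_match(["12", "40"], ["12", "41"]))  # 0.5
```

A score of 1.0 therefore means every sum in the test set was produced exactly, with no digit wrong or missing.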

Owner

Castorini
