NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Last update: Dec 08, 2022

Overview

NLP From Scratch Without Large-Scale Pretraining

This repository contains the code, pre-trained model checkpoints and curated datasets for our paper: NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework.

In our proposed framework, named TLM (task-driven language modeling), we do not train a language model over the entire general corpus and then finetune it on task data. Instead, we first use task data as queries to retrieve a tiny subset of the general corpus, and then perform joint learning on both the task objective and the self-supervised language modeling objective.
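The retrieval step can be illustrated with a small, dependency-free sketch. This is not the repository's implementation: the choice of BM25 as the ranking function, the whitespace tokenizer, the function name `bm25_retrieve`, and the toy corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_retrieve(queries, corpus, top_k=2, k1=1.5, b=0.75):
    """Return sorted indices of corpus documents retrieved by at least one query."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    # Term frequencies per document.
    tfs = [Counter(d) for d in docs]

    def score(query_terms, i):
        total = 0.0
        for term in query_terms:
            tf = tfs[i].get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(docs[i]) / avg_len)
            total += idf * tf * (k1 + 1) / norm
        return total

    selected = set()
    for q in queries:
        terms = tokenize(q)
        scored = [(score(terms, i), i) for i in range(n_docs)]
        # Highest score first; ties broken by document index.
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        # Keep only documents that share at least one term with the query.
        selected.update(i for s, i in ranked[:top_k] if s > 0)
    return sorted(selected)


# Toy "general corpus" and "task data" (hypothetical examples).
general_corpus = [
    "the stock market fell sharply today",
    "we propose a neural model for relation extraction in scientific papers",
    "the football team won the championship",
    "entity recognition in scientific abstracts uses transformer encoders",
]
task_data = ["relation extraction between scientific entities"]

subset = bm25_retrieve(task_data, general_corpus, top_k=2)
print(subset)
```

Each task example acts as a query, and the union of the top-ranked documents over all queries forms the small external corpus used alongside the task data during training.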

Requirements

We implement our models and training loops on top of the open-source libraries from HuggingFace. The core dependencies of this repository are listed in requirements.txt and can be installed with:

pip install -r requirements.txt

All our experiments were conducted on a node with 8 A100 40GB SXM GPUs. Different hardware may yield results that differ slightly from the reported ones.

Models and Datasets

We release the trained models on 8 tasks with 3 different scales, together with the task datasets and selected external data. Our released model checkpoints, datasets and the performance of each model for each task are listed in the following table.

	AGNews	Hyp.	Help.	IMDB	ACL.	SciERC	Chem.	RCT
Small	93.74	93.53	70.54	93.08	69.84	80.51	81.99	86.99
Medium	93.96	94.05	70.90	93.97	72.37	81.88	83.24	87.28
Large	94.36	95.16	72.49	95.77	72.19	83.29	85.12	87.50

The released models and datasets are compatible with HuggingFace's Transformers and Datasets. We provide an example script for evaluating a model checkpoint on a given task. To get the evaluation results for SciERC with a small-scale model, run:

bash example_scripts/evaluate.sh

Training

We provide two example scripts for training a model from scratch. To train a small-scale model for SciERC, run:

bash example_scripts/train.sh && bash example_scripts/finetune.sh

Here example_scripts/train.sh corresponds to the first training stage, where the external data ratio and MLM weight are non-zero, and example_scripts/finetune.sh corresponds to the second training stage, where the model sees no external data and no self-supervised loss is applied.
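The difference between the two stages can be sketched as a pair of configurations feeding one loss function. This is a minimal illustration, not the code in the example scripts: the names `StageConfig`, `joint_loss`, and `batch_composition`, the reading of "external data ratio" as external examples per task example, and the numeric values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    # Assumed meaning: external (retrieved) examples per task example in a batch.
    external_ratio: float
    # Weight on the self-supervised masked language modeling (MLM) loss.
    mlm_weight: float


def joint_loss(task_loss, mlm_loss, cfg):
    # Task objective plus the weighted self-supervised objective.
    return task_loss + cfg.mlm_weight * mlm_loss


def batch_composition(task_batch_size, cfg):
    # Returns (number of task examples, number of external examples).
    return task_batch_size, int(round(task_batch_size * cfg.external_ratio))


# Hypothetical values; the real ones are set in the example scripts.
stage_one = StageConfig(external_ratio=4.0, mlm_weight=2.0)  # joint training
stage_two = StageConfig(external_ratio=0.0, mlm_weight=0.0)  # task data only

for name, cfg in [("stage one", stage_one), ("stage two", stage_two)]:
    print(name, batch_composition(8, cfg), joint_loss(0.5, 2.0, cfg))
```

With both values set to zero, the second stage reduces to ordinary supervised finetuning on the task data alone.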

Citation

Please cite our paper if you use TLM in your work:

@misc{yao2021tlm,
title={NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework},
author={Yao, Xingcheng and Zheng, Yanan and Yang, Xiaocong and Yang, Zhilin},
year={2021}
}

Owner

Xingcheng Yao
