TANL: Structured Prediction as Translation between Augmented Natural Languages

Code for the paper "Structured Prediction as Translation between Augmented Natural Languages" (ICLR 2021).

If you use this code, please cite the paper using the bibtex reference below.

@inproceedings{tanl,
    title={Structured Prediction as Translation between Augmented Natural Languages},
    author={Giovanni Paolini and Ben Athiwaratkun and Jason Krone and Jie Ma and Alessandro Achille and Rishita Anubhai and Cicero Nogueira dos Santos and Bing Xiang and Stefano Soatto},
    booktitle={9th International Conference on Learning Representations, {ICLR} 2021},
    year={2021},
}

Requirements

  • Python 3.6+
  • PyTorch (tested with version 1.7.1)
  • Transformers (tested with version 4.0.0)
  • NetworkX (tested with version 2.5, only used in coreference resolution)

You can install all required Python packages with pip install -r requirements.txt

Datasets

By default, datasets are expected to be in data/DATASET_NAME. Dataset-specific code is in datasets.py.

For example, the CoNLL04 and ADE datasets (joint entity and relation extraction) in the correct format can be downloaded using https://github.com/markus-eberts/spert/blob/master/scripts/fetch_datasets.sh. For other datasets, pre-processing and links are documented in the code.

Running the code

Use the following command: python run.py JOB

The JOB argument refers to a section of the config file, which by default is config.ini. A sample config file is provided, with settings that allow for faster training and lower memory usage than the settings used to obtain the final results in the paper.

For example, to replicate the paper's results on CoNLL04, have the following section in the config file:

[conll04_final]
datasets = conll04
model_name_or_path = t5-base
num_train_epochs = 200
max_seq_length = 256
max_seq_length_eval = 512
train_split = train,dev
per_device_train_batch_size = 8
per_device_eval_batch_size = 16
do_train = True
do_eval = False
do_predict = True
episodes = 1-10
num_beams = 8

Then run python run.py conll04_final. Note that the final results will differ slightly from the ones reported in the paper, due to small code changes and randomness.

Config arguments can be overridden by command-line arguments. For example: python run.py conll04_final --num_train_epochs 50.
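The precedence described above (config section first, command-line values on top) can be sketched with the standard-library configparser. This is only an illustration under stated assumptions, not the code in run.py: the helper name load_job_settings and the overrides dictionary are hypothetical, and the sample section is a shortened copy of the one shown earlier.

```python
import configparser

# Shortened copy of the [conll04_final] section, used only for illustration.
SAMPLE_CONFIG = """
[conll04_final]
datasets = conll04
model_name_or_path = t5-base
num_train_epochs = 200
"""


def load_job_settings(config_text, job, overrides=None):
    """Return the settings of one config section, with overrides applied on top.

    Hypothetical helper: it mimics the documented behaviour, not run.py itself.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(parser[job])
    # Values given on the command line take precedence over the config file.
    for key, value in (overrides or {}).items():
        settings[key] = str(value)
    return settings


# Equivalent in spirit to: python run.py conll04_final --num_train_epochs 50
settings = load_job_settings(SAMPLE_CONFIG, "conll04_final", {"num_train_epochs": 50})
print(settings["num_train_epochs"])
```

Values stay strings here, as configparser returns them; converting them to int, float, or bool is left out of the sketch.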

Additional details

If do_train = True, the model is trained on the given train split (e.g., 'train') of the given datasets. The final weights and intermediate checkpoints are written to a directory such as experiments/conll04_final-t5-base-ep200-len256-b8-train, with one subdirectory per episode. Results are also saved there in JSON format.

In every episode, the model is trained on a different (random) permutation of the training set. The random seed is given by the episode number, so that a given episode always produces exactly the same model.
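The idea of seeding by episode number can be illustrated as follows. This is a minimal sketch that uses Python's random module rather than the actual training code; the function name episode_permutation is hypothetical.

```python
import random


def episode_permutation(num_examples, episode):
    """Return a shuffled list of example indices that depends only on the episode number."""
    rng = random.Random(episode)  # the episode number acts as the random seed
    order = list(range(num_examples))
    rng.shuffle(order)
    return order


# Running the same episode twice yields the same ordering.
first_run = episode_permutation(10, episode=1)
second_run = episode_permutation(10, episode=1)
print(first_run == second_run)  # True
```

In the real code the seed also affects other sources of randomness (such as weight initialization and dropout), which this sketch does not model.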

Once a model is trained, it is possible to evaluate it without training again. For this, set do_train = False or (more easily) provide the -e command-line argument: python run.py conll04_final -e.

If do_eval = True, the model is evaluated on the 'dev' split. If do_predict = True, the model is evaluated on the 'test' split.

Arguments

The following are the most important command-line arguments for the run.py script. Run python run.py -h for the full list.

  • -c CONFIG_FILE: specify config file to use (default is config.ini)
  • -e: only run evaluation (overrides the do_train setting in the config file)
  • -a: also evaluate intermediate checkpoints, in addition to the final model
  • -v: print results for each evaluation run
  • -g GPU: specify which GPU to use for evaluation

The following are the most important arguments for the config file. See the sample config file to understand the format.

  • datasets (str): comma-separated list of datasets for training
  • eval_datasets (str): comma-separated list of datasets for evaluation (default is the same as for training)
  • model_name_or_path (str): path to pretrained model or model identifier from huggingface.co/models (e.g. t5-base)
  • do_train (bool): whether to run training (default is False)
  • do_eval (bool): whether to run evaluation on the dev set (default is False)
  • do_predict (bool): whether to run evaluation on the test set (default is False)
  • train_split (str): comma-separated list of data splits for training (default is train)
  • num_train_epochs (int): number of train epochs
  • learning_rate (float): initial learning rate (default is 5e-4)
  • train_subset (float > 0 and <=1): portion of training data to effectively use during training (default is 1, i.e., use all training data)
  • per_device_train_batch_size (int): batch size per GPU during training (default is 8)
  • per_device_eval_batch_size (int): batch size during evaluation (default is 8; only one GPU is used for evaluation)
  • max_seq_length (int): maximum input sequence length after tokenization; longer sequences are truncated
  • max_output_seq_length (int): maximum output sequence length (default is max_seq_length)
  • max_seq_length_eval (int): maximum input sequence length for evaluation (default is max_seq_length)
  • max_output_seq_length_eval (int): maximum output sequence length for evaluation (default is max_output_seq_length or max_seq_length_eval or max_seq_length)
  • episodes (str): episodes to run (default is 0; an interval can be specified, such as 1-4; the episode number is used as the random seed)
  • num_beams (int): number of beams for beam search during generation (default is 1)
  • multitask (bool): if True, the name of the dataset is prepended to each input sentence (default is False)
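Two of the conventions listed above can be made concrete with a short sketch: the episodes interval syntax and the fallback chain for max_output_seq_length_eval. Both functions are hypothetical illustrations, not the repository's code. The sketch assumes that an interval such as 1-4 is inclusive of both ends, and that the fallbacks are tried in the order in which they are listed.

```python
def parse_episodes(spec):
    """Turn '0' into [0] and '1-4' into [1, 2, 3, 4] (inclusive interval is an assumption)."""
    parts = [int(part) for part in spec.split("-")]
    if len(parts) == 1:
        return parts
    start, end = parts
    return list(range(start, end + 1))


def resolve_eval_output_length(max_seq_length,
                               max_output_seq_length=None,
                               max_seq_length_eval=None,
                               max_output_seq_length_eval=None):
    """Pick the first value that is set, following the documented order of defaults."""
    candidates = (
        max_output_seq_length_eval,
        max_output_seq_length,
        max_seq_length_eval,
        max_seq_length,
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


print(parse_episodes("1-4"))
# With the CoNLL04 settings shown earlier (max_seq_length = 256, max_seq_length_eval = 512):
print(resolve_eval_output_length(256, max_seq_length_eval=512))
```

Each episode index returned by parse_episodes would then be used as the random seed for that episode.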

See arguments.py and transformers.TrainingArguments for additional config arguments.
