A framework to train language models to learn invariant representations.

Last update: Nov 16, 2022

Overview

Invariant Language Modeling

Implementation of the training for invariant language models.

Motivation

Modern pretrained language models are critical components of NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, we propose invariant language modeling, a framework to learn invariant representations that should generalize across training environments. In particular, we adapt IRM-games to language models, where the invariance emerges from a specific training schedule in which environments compete to optimize their environment-specific loss by updating subsets of the model in a round-robin fashion.

Model Description

The data is assumed to come from n distinct environments, and we aim to learn a language model that focuses on correlations that generalize across environments.

The model is decomposed into two components:

ϕ the main body of the transformer language model,
w the language modeling head that predicts the missing token.

In our implementation, there are as many heads as environments, n. For each data point, all heads make their predictions, and these predictions are averaged. During training, however, we sample one batch from each environment in a round-robin fashion. When the model sees a batch from environment e, only the head w_e and the main body ϕ receive a batch update.
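The update rule described above can be illustrated with a toy scalar model. This is only a minimal sketch, not the repository's code: the class `ToyInvariantModel`, its scalar "body" and "heads", the squared-error loss, and the choice to compute the loss on the averaged prediction are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyInvariantModel:
    """Scalar stand-in: a shared body `phi` and one head weight per environment."""

    n_envs: int
    phi: float = 1.0
    heads: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.heads:
            self.heads = [1.0] * self.n_envs

    def predict(self, x: float) -> float:
        # Every head predicts from the shared body output; predictions are averaged.
        body_out = self.phi * x
        return sum(w * body_out for w in self.heads) / self.n_envs

    def update(self, env: int, x: float, y: float, lr: float = 0.01) -> float:
        """One gradient step on a sample from environment `env`; returns the loss."""
        err = self.predict(x) - y
        mean_w = sum(self.heads) / self.n_envs
        # Gradients of the squared error (assumed loss) on the averaged prediction.
        grad_phi = 2.0 * err * x * mean_w
        grad_head = 2.0 * err * self.phi * x / self.n_envs
        # Only the body and the head of the current environment are updated.
        self.phi -= lr * grad_phi
        self.heads[env] -= lr * grad_head
        return err ** 2


# Round-robin over environments: one (x, y) sample per environment here.
model = ToyInvariantModel(n_envs=3)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for step in range(6):
    env = step % model.n_envs
    model.update(env, *data[env])
```

In a real transformer the same idea would apply to parameter groups rather than scalars: the body parameters receive a step on every batch, while each head receives a step only on batches from its own environment.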

Usage

To get started with the code:

pip install -r requirements.txt

PyTorch with a CUDA installation is required to run this framework. See the PyTorch installation instructions for details.

Then, to continue training a language model from a Hugging Face checkpoint:

python3 run_invariant_mlm.py \
    --model_name_or_path roberta-base \
    --validation_file data-folder/validation_file.txt \
    --do_train \
    --do_eval \
    --nb_steps 5000 \
    --learning_rate 1e-5 \
    --output_dir folder-to-save-model \
    --seed 123 \
    --train_file data-folder/training-environments \
    --overwrite_cache

Currently, the supported base models are:

roberta: checkpoints
distilbert: checkpoints

Implementation

To train language models according to IRM-games, one needs to modify:

the training schedule, to perform batch updates for each environment in a round-robin fashion. This logic is implemented by the InvariantTrainer class in invariant_trainer.py, which inherits from the Hugging Face Trainer.
the language modeling heads in the model, which needs one head per environment. This is done by creating variations of the base model classes, implemented in invariant_roberta.py for roberta and in invariant_distilbert.py for distilbert.
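The round-robin schedule described in the first item can be sketched independently of any training library. The functions below are not taken from invariant_trainer.py: `round_robin_batches`, `trainable_groups`, the `total_steps` parameter, and the group names `body` and `head_<env>` are hypothetical, and restarting an exhausted environment from its first batch is an assumption of this sketch.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Sequence, Tuple


def round_robin_batches(
    env_batches: Dict[str, Sequence[str]], total_steps: int
) -> Iterator[Tuple[str, str]]:
    """Yield (environment name, batch) pairs, visiting environments in turn."""
    if not env_batches or any(len(b) == 0 for b in env_batches.values()):
        raise ValueError("every environment needs at least one batch")
    # Cycle each environment so shorter ones restart instead of ending the schedule.
    iterators = {name: cycle(batches) for name, batches in env_batches.items()}
    env_order = cycle(env_batches)  # dicts preserve insertion order
    for _ in range(total_steps):
        env = next(env_order)
        yield env, next(iterators[env])


def trainable_groups(env: str) -> List[str]:
    """Name the parameter groups that receive an update for a batch from `env`."""
    return ["body", f"head_{env}"]


if __name__ == "__main__":
    batches = {"a": ["a0", "a1"], "b": ["b0"]}
    for env, batch in round_robin_batches(batches, total_steps=4):
        print(env, batch, trainable_groups(env))
```

Separating the schedule from the choice of updated parameter groups keeps each piece easy to test; a trainer subclass could combine them by stepping only the optimizers for the groups returned for the current environment.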

Contact

Maxime Peyrard, [email protected]
