PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.

Last update: Jul 27, 2022

Overview

ALiBi

Quickstart

Clone this repository.

git clone https://github.com/jaketae/alibi.git

Navigate to the cloned directory. You can then use the bare-bones ALiBi decoder as follows:

>>> import torch
>>> from alibi import ALiBiConfig, ALiBiTransformer
>>> config = ALiBiConfig()
>>> model = ALiBiTransformer(config)
>>> x = torch.randn(8, 100, 256)
>>> model(x).shape
torch.Size([8, 100, 256])

By default, the model comes with the following parameters:

ALiBiConfig(
    num_layers=6, 
    d_model=256, 
    num_heads=8, 
    max_len=256, 
    dropout=0.1, 
    causal=True, 
    expansion_factor=1
)

To use an encoder instead of a decoder, set causal=False in the config.
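
The difference between the two modes comes down to which positions each token may attend to. Below is a minimal sketch in plain Python (no PyTorch required), assuming the usual convention that a causal decoder lets position i see only positions j <= i, while an encoder sees every position. The helper name attention_mask is hypothetical and is not part of this repository.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[bool]]:
    """Return mask[i][j] == True where query i may attend to key j."""
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


decoder_mask = attention_mask(4, causal=True)   # lower-triangular pattern
encoder_mask = attention_mask(4, causal=False)  # every position is visible

# Print the causal pattern: "x" marks a visible key, "." a hidden one.
for row in decoder_mask:
    print("".join("x" if visible else "." for visible in row))
# x...
# xx..
# xxx.
# xxxx
```

How this repository applies the mask internally may differ; the sketch only illustrates the visibility pattern that the causal flag is expected to control.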

Abstract

Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question remains open: how to achieve extrapolation at inference time to longer sequences than seen during training? We first show that extrapolation can be improved by changing the position representation method, though we find that existing proposals do not allow efficient extrapolation. We introduce a simple and efficient method, Attention with Linear Biases (ALiBi), that allows for extrapolation. ALiBi does not add positional embeddings to the word embeddings; instead, it biases the query-key attention scores with a term that is proportional to their distance. We show that this method allows training a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048, 11% faster and using 11% less memory. ALiBi's inductive bias towards recency allows it to outperform multiple strong position methods on the WikiText-103 benchmark. Finally, we provide analysis of ALiBi to understand why it leads to better performance.
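
The bias described in the abstract can be sketched in a few lines. The example below is plain Python and is not taken from this repository's code. It assumes the head-specific slopes given in the paper (a geometric sequence starting at 2^(-8/n) for n heads, when n is a power of two) and the causal setting, where the bias for query i and key j <= i is -slope * (i - j). The function names are hypothetical.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Geometric sequence of slopes from the paper (num_heads a power of two)."""
    ratio = 2 ** (-8 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal bias: 0 on the diagonal, more negative for older keys,
    and -inf for future keys so they receive zero attention weight."""
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a single row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


slopes = alibi_slopes(8)         # 1/2, 1/4, ..., 1/256
bias = alibi_bias(5, slopes[0])  # bias matrix for the first head

# With identical raw query-key scores (all zeros), the bias alone decides
# the attention weights: the last query favors the most recent keys.
weights = softmax([0.0 + b for b in bias[-1]])
print([round(w, 3) for w in weights])
```

Because the bias depends only on the distance i - j and not on absolute position, the same function can produce a bias matrix for any seq_len, which is the property the paper relies on for extrapolation.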

Citation

@misc{press2021train,
	title        = {Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation},
	author       = {Ofir Press and Noah A. Smith and Mike Lewis},
	year         = 2021,
	eprint       = {2108.12409},
	archiveprefix = {arXiv},
	primaryclass = {cs.CL}
}

Owner

Jake Tae
