Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code in both PyTorch and TensorFlow)

Last update: Jan 06, 2023


Overview

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that reproduce the state-of-the-art (SoTA) results reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via nn.DataParallel.
Please refer to pytorch/README.md for details.

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break the 1.0 bits-per-character barrier on character-level language modeling. Below is a summary.

Method	enwik8 (bpc)	text8 (bpc)	One Billion Word (PPL)	WT-103 (PPL)	PTB w/o finetuning (PPL)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5

bpc: bits per character; PPL: perplexity. Lower is better for both.
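The character-level columns (enwik8, text8) and the word-level columns (One Billion Word, WT-103, PTB) both come from the average per-token cross-entropy. They differ only in how that value is expressed. The following is a minimal sketch of the conversions, assuming the loss is measured in nats (natural log). The helper names are hypothetical and are not taken from this repository.

```python
import math


def nats_to_bpc(loss_nats: float) -> float:
    """Convert average per-character cross-entropy from nats to bits per character."""
    return loss_nats / math.log(2)


def bpc_to_nats(bpc: float) -> float:
    """Convert bits per character back to nats per character."""
    return bpc * math.log(2)


def perplexity(loss_nats: float) -> float:
    """Perplexity is exp() of the average per-token cross-entropy in nats."""
    return math.exp(loss_nats)


if __name__ == "__main__":
    # The 0.99 bpc enwik8 figure from the table, expressed in nats per character.
    print(round(bpc_to_nats(0.99), 4))  # 0.6862
    # A per-word loss of ln(18.3) nats corresponds to a perplexity of 18.3.
    print(round(perplexity(math.log(18.3)), 1))  # 18.3
```

Under this reading, a score below 1.0 bpc means the model needs, on average, less than one bit to encode each character.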

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

