Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains PyTorch and TensorFlow code for our paper:

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
Besides the source code, we also provide pretrained TensorFlow models that reach the state-of-the-art (SoTA) performance reported in the paper.
Please refer to tf/README.md for details.

PyTorch

The source code is in the pytorch/ folder and supports single-node multi-GPU training via the nn.DataParallel module.
Please refer to pytorch/README.md for details.
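
For readers unfamiliar with nn.DataParallel, here is a minimal sketch of how a module is wrapped for single-node multi-GPU use. It is not the repository's training script: TinyLM, its sizes, and the batch shape are hypothetical placeholders chosen only for illustration, and the real model and launch options are described in pytorch/README.md.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in model; NOT the Transformer-XL architecture."""

    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> logits: (batch, seq_len, vocab_size)
        return self.proj(self.embed(tokens))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyLM().to(device)

# nn.DataParallel splits each input batch along dim 0 across the visible
# GPUs and gathers the outputs; with no GPU it just calls the wrapped module.
parallel_model = nn.DataParallel(model)

tokens = torch.randint(0, 100, (8, 16), device=device)  # assumed batch of 8
logits = parallel_model(tokens)
print(logits.shape)
```

Because the batch is split along the first dimension, the batch size should be at least the number of GPUs for every device to receive work.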

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 bits-per-character barrier on character-level language modeling. The results are summarized below.

Method	enwiki8 (bpc)	text8 (bpc)	One Billion Word (ppl)	WT-103 (ppl)	PTB (ppl, w/o finetuning)
Previous Best	1.06	1.13	23.7	20.5	55.5
Transformer-XL	0.99	1.08	21.8	18.3	54.5
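
To put the gaps in the table on a common scale, the following sketch computes the relative reduction of each metric with respect to the previous best. The numbers are copied from the table above; the helper name relative_reduction is ours and is used only for illustration. Lower is better for every column.

```python
# Scores copied from the results table (lower is better in every column).
previous_best = {
    "enwiki8": 1.06,
    "text8": 1.13,
    "One Billion Word": 23.7,
    "WT-103": 20.5,
    "PTB": 55.5,
}
transformer_xl = {
    "enwiki8": 0.99,
    "text8": 1.08,
    "One Billion Word": 21.8,
    "WT-103": 18.3,
    "PTB": 54.5,
}


def relative_reduction(prev, new):
    """Return the reduction from prev to new as a percentage of prev."""
    return (prev - new) / prev * 100.0


reductions = {
    name: relative_reduction(previous_best[name], transformer_xl[name])
    for name in previous_best
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% lower than the previous best")
```

Relative reductions are not directly comparable between the bits-per-character and perplexity columns, since the two metrics live on different scales.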

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)
