PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Last update: Feb 27, 2022

Overview

Transformer-PyTorch

A PyTorch implementation of the Transformer from the paper Attention is All You Need in both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Pre-LN applies LayerNorm to the input of every sublayer, inside the residual branch, whereas Post-LN applies it after the residual addition. The architecture proposed in the paper was Post-LN; however, the official implementation was later changed to the Pre-LN version. Experimental results show that the Pre-LN Transformer converges faster, does not even need learning-rate warm-up, and is less sensitive to hyperparameters. For more detail on the difference between the two, see the paper On Layer Normalization in the Transformer Architecture.
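To make the difference concrete, here is a minimal sketch of one residual sublayer in both variants. It is not taken from this repository's code: the names ResidualSublayer and make_ffn, the dropout value, and the ReLU feed-forward block are assumptions for illustration. Only the pre_lnorm flag name and the 512/2048 sizes come from this README.

```python
import torch
import torch.nn as nn


class ResidualSublayer(nn.Module):
    """Wraps a sublayer (attention or feed-forward) with a residual connection and LayerNorm.

    Hypothetical helper for illustration; not the class used in this repository.
    """

    def __init__(self, d_model: int, sublayer: nn.Module, pre_lnorm: bool, dropout: float = 0.1):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.pre_lnorm = pre_lnorm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_lnorm:
            # Pre-LN: normalize the sublayer input; the residual path itself is left untouched.
            return x + self.dropout(self.sublayer(self.norm(x)))
        # Post-LN: add the residual first, then normalize the sum.
        return self.norm(x + self.dropout(self.sublayer(x)))


def make_ffn(d_model: int = 512, d_ff: int = 2048) -> nn.Module:
    # Position-wise feed-forward block (512 -> 2048 -> 512); ReLU is an assumption.
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))


x = torch.randn(2, 10, 512)  # (batch, sequence length, hidden size)
pre = ResidualSublayer(512, make_ffn(), pre_lnorm=True).eval()
post = ResidualSublayer(512, make_ffn(), pre_lnorm=False).eval()
print(pre(x).shape, post(x).shape)
```

In the Post-LN branch the output of every sublayer is normalized, while in the Pre-LN branch the residual stream is passed through unchanged and only the sublayer input is normalized.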

A STAR would be so nice if you like it!

Dataset

The small English-German dataset from the WMT 2016 multimodal task, loaded through torchtext.

Prerequisites

- Python 3
- PyTorch >= 1.2.0
- torchtext
- spacy
- nltk
- tqdm

Implementation Notes

- Beam search is not supported.
- Label smoothing is not implemented.
- Byte-pair encoding (BPE) is not used.

Usage

Run transformer.ipynb to download the dataset and train the model.
Set the pre_lnorm flag to choose between the Pre-LN and Post-LN variants.

Evaluation

Parameter settings
- hidden size: 512
- feed-forward size: 2048
- number of attention heads: 8
- number of layers: 6
- warm-up: 2000
- batch size: 128
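For reference, the sketch below shows how a warm-up value of 2000 and a hidden size of 512 would enter the learning-rate schedule from Attention Is All You Need. This README does not say which schedule the notebook uses or whether the warm-up value is counted in steps, so both are assumptions here, and the function name noam_lr is hypothetical.

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 2000) -> float:
    """Schedule from "Attention Is All You Need": linear warm-up, then inverse-sqrt decay.

    Assumes the warm-up value is measured in optimizer steps.
    """
    step = max(step, 1)  # guard against step 0, where step ** -0.5 is undefined
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate rises linearly until step == warmup_steps, then decays.
for step in (1, 500, 2000, 8000):
    print(step, f"{noam_lr(step):.6f}")
```

Under these assumptions the rate peaks at step 2000 with a value of (512 * 2000) ** -0.5, roughly 1e-3.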

Generated Examples

Here's an example from test data:

source
- eine frau verwendet eine bohrmaschine während ein mann sie fotografiert .
gold
- a woman uses a drill while another man takes her picture .
inference
- a woman uses an electric drill as a man takes a picture .

TODO

Label smoothing
Attention visualization

Owner

Jared Wang
