A python library for highly configurable transformers - easing model architecture search and experimentation.

Overview

configaformers (re-factor in progress)

A python library for highly configurable transformers - easing model architecture search and experimentation. It is premised on building small and independent modules that enables users to configure custom transformer architectures.

Special thanks to lucidrains (https://github.com/lucidrains) and Kharr.

Usage

Quick demo that will configure a 768-wide, 12-layer transformer, with a language modeling head.

Import, and create token embedding block:

import torch
from model_builder import ConfigaFormer

emb = []
model_dim = 768

emb.append({'type': 'embedding',
            'output_dim': model_dim,
            'num_classes': 50257})

Create self-attention module:

attn = []

# Make residual and norm
attn.append({'type': 'make_stream', 'output_name': 'residual'})
attn.append({'type': 'norm', 'norm_type': 'layer_norm'})

# Make QKVs
attn.append({'type': 'linear', 'output_name': 'queries'})
attn.append({'type': 'linear', 'output_name': 'keys'})
attn.append({'type': 'linear', 'output_name': 'values'})

attn.append({'type': 'make_heads', 'input_name': 'queries', 'output_name': 'queries', 'num_heads': 12})
attn.append({'type': 'make_heads', 'input_name': 'keys', 'output_name': 'keys', 'num_heads': 12})

attn.append({'type': 'rope', 'input_name': 'queries', 'output_name': 'queries', 'rotate_dim': 16})
attn.append({'type': 'rope', 'input_name': 'keys', 'output_name': 'keys', 'rotate_dim': 16})

# Perform attention
attn.append({'type': 'mha_dots',
             'input_name_queries': 'queries',
             'input_name_keys': 'keys'})
attn.append({'type': 'attention_offset'})
attn.append({'type': 'mha_sum',
             'input_name_values': 'values'})

# Mix
attn.append({'type': 'linear'})

# Add residual
attn.append({'type': 'merge_streams',
             'input_name_1': 'residual',
             'merge_type': 'add'})

Create FFN module:

ffn = []

# Make residual and norm
ffn.append({'type': 'make_stream', 'output_name': 'residual'})
ffn.append({'type': 'norm', 'norm_type': 'layer_norm'})

# Proj Up
ffn.append({'type': 'linear', 'output_dim': 768*4})

# Activation
ffn.append({'type': 'activation'})

# Proj Down
ffn.append({'type': 'linear', 'output_dim': 768})

# Add residual
ffn.append({'type': 'merge_streams',
             'input_name_1': 'residual',
             'merge_type': 'add'})

Create language modeling head:

to_logits = []
to_logits.append({'type': 'linear', 'output_dim': 50257})

Create blocks, initialize input shapes, and init the model:

transformer_block = attn + ffn
classifier = ffn + to_logits

blocks = [{"config": emb,
           "repeat": 1},
          {"config": transformer_block,
           "repeat": 12},
          {"config": classifier,
           "repeat": 1},
          ]
          
my_config = {'blocks' = blocks}
input_streams = {'emb_ids': ['B', 'L_in'],
                 'attn_offset': ['B', 12, 'L_in', 'L_in'],}

model = ConfigaFormer(model_config=my_config,
                     input_streams=input_streams).cuda()

This will print out the transformer config:

Block #1, 1x
embedding -> Input(s): emb_ids (BSZ, L_in) - Output(s): x (BSZ, L_in, 768)


Block #2, 12x
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): queries (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): keys (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): values (BSZ, L_in, 768)
make_heads -> Input(s): queries (BSZ, L_in, 768) - Output(s): queries (BSZ, 12, L_in, 64)
make_heads -> Input(s): keys (BSZ, L_in, 768) - Output(s): keys (BSZ, 12, L_in, 64)
rope -> Input(s): queries (BSZ, 12, L_in, 64), rope_16 (2048, 16) - Output(s): queries (BSZ, 12, L_in, 64)
rope -> Input(s): keys (BSZ, 12, L_in, 64), rope_16 (2048, 16) - Output(s): keys (BSZ, 12, L_in, 64)
mha_dots -> Input(s): queries (BSZ, 12, L_in, 64), keys (BSZ, 12, L_in, 64) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
attention_offset -> Input(s): attn_dots (BSZ, 12, L_in, L_in), attn_offset (BSZ, 12, L_in, L_in) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
mha_sum -> Input(s): values (BSZ, L_in, 768), attn_dots (BSZ, 12, L_in, L_in) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)
activation -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)
linear -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)


Block #3, 1x
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)
activation -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)
linear -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 50257)

Before running, we need to get the attention offset (in this case, AliBi with a causal mask):

from attention_offset_module import get_alibi

attn_offset = get_alibi(num_heads=12)

Now we can use the model:

input_data = {'emb_ids': batch_ids.view(bsz, 1024).cuda(),
              'attn_offset': attn_offset.cuda()}

logits = model(input_data)['x'].view(bsz, 1024, 50257)

TODO

  1. Token shifting, down/up sampling
  2. Create higher abstractions for FFN and self-attention
  3. everything else
Owner
Anthony Fuller
Anthony Fuller
This code is a near-infrared spectrum modeling method based on PCA and pls

Nirs-Pls-Corn This code is a near-infrared spectrum modeling method based on PCA and pls 近红外光谱分析技术属于交叉领域,需要化学、计算机科学、生物科学等多领域的合作。为此,在(北邮邮电大学杨辉华老师团队)指导下

Fu Pengyou 6 Dec 17, 2022
This is a vision-based 3d model manipulation and control UI

Manipulation of 3D Models Using Hand Gesture This program allows user to manipulation 3D models (.obj format) with their hands. The project support bo

Cortic Technology Corp. 43 Oct 23, 2022
EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness

EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness Improving GAN Equilibrium by Raising Spatial Awareness Jianyuan Wang, Ceyuan Yang, Ying

GenForce: May Generative Force Be with You 149 Dec 19, 2022
Testability-Aware Low Power Controller Design with Evolutionary Learning, ITC2021

Testability-Aware Low Power Controller Design with Evolutionary Learning This repo contains the source code of Testability-Aware Low Power Controller

Lee Man 1 Dec 26, 2021
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation Prerequisites This repo is built upon a local copy of transfo

Jixuan Wang 10 Sep 28, 2022
[NeurIPS 2021] "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators"

G-PATE This is the official code base for our NeurIPS 2021 paper: "G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of T

AI Secure 14 Oct 12, 2022
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 03, 2023
Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

Bowen Cheng 177 Dec 29, 2022
Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

mae-repo PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https:/

Peng Qiao 1 Dec 14, 2021
ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS.

ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS. It currently supports four examples for you to quickly experience the power of ONNX Runti

Microsoft 58 Dec 18, 2022
Development of IP code based on VIPs and AADM

Sparse Implicit Processes In this repository we include the two different versions of the SIP code developed for the article Sparse Implicit Processes

1 Aug 22, 2022
🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

xmu-xiaoma66 7.7k Jan 05, 2023
A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196

img_sussifier A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196 Examples How to use install python pip i

41 Sep 30, 2022
Contrastive Language-Image Pretraining

CLIP [Blog] [Paper] [Model Card] [Colab] CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pair

OpenAI 11.5k Jan 08, 2023
Joint Learning of 3D Shape Retrieval and Deformation, CVPR 2021

Joint Learning of 3D Shape Retrieval and Deformation Joint Learning of 3D Shape Retrieval and Deformation Mikaela Angelina Uy, Vladimir G. Kim, Minhyu

Mikaela Uy 38 Oct 18, 2022
This repository is for Competition for ML_data class

This repository is for Competition for ML_data class. Based on mmsegmentatoin,mainly using swin transformer to completed the competition.

jianlong 2 Oct 23, 2022
Fully convolutional networks for semantic segmentation

FCN-semantic-segmentation Simple end-to-end semantic segmentation using fully convolutional networks [1]. Takes a pretrained 34-layer ResNet [2], remo

Kai Arulkumaran 186 Dec 25, 2022
ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

Ibai Gorordo 18 Nov 06, 2022
Compare GAN code.

Compare GAN This repository offers TensorFlow implementations for many components related to Generative Adversarial Networks: losses (such non-saturat

Google 1.8k Jan 05, 2023
Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

🤗 Transformers Wav2Vec2 + PyCTCDecode Introduction This repo shows how 🤗 Transformers can be used in combination with kensho-technologies's PyCTCDec

Patrick von Platen 102 Oct 22, 2022