Implementation of Multistream Transformers in PyTorch

Last update: Jul 26, 2022

Multistream Transformers

This repository deviates slightly from the paper: instead of using the skip connection across all streams, it uses attention pooling across all tokens in the same position. This has produced the best results in my experiments when the number of streams is greater than 2.
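To make the idea of attention pooling across streams concrete, here is a minimal sketch. It is not the code used in this repository: the class name `StreamAttentionPool`, the single-head projections, and the residual connection are all assumptions chosen for illustration. The only point it demonstrates is that, at each sequence position, every stream attends over the tokens of all streams at that same position.

```python
import torch
from torch import nn


class StreamAttentionPool(nn.Module):
    """Hypothetical sketch: streams exchange information per position via attention."""

    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)
        self.scale = dim ** -0.5

    def forward(self, streams):
        # streams: (batch, num_streams, seq_len, dim)
        x = streams.transpose(1, 2)  # (batch, seq_len, num_streams, dim)

        q = self.to_q(x)
        k, v = self.to_kv(x).chunk(2, dim=-1)

        # attention runs over the stream axis only, so positions never mix
        sim = torch.matmul(q, k.transpose(-1, -2)) * self.scale  # (b, n, s, s)
        attn = sim.softmax(dim=-1)
        pooled = torch.matmul(attn, v)  # (b, n, s, dim)

        # residual connection (an assumption), then restore the input layout
        return (x + pooled).transpose(1, 2)


pool = StreamAttentionPool(dim=16)
streams = torch.randn(2, 3, 8, 16)  # 2 sequences, 3 streams, 8 positions
out = pool(streams)                 # same shape as the input
```

Because the softmax is taken over the stream axis, the cost of this step grows with the square of the number of streams rather than the sequence length.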

Install

$ pip install multistream-transformers

Usage

import torch
from multistream_transformers import MultistreamTransformer

model = MultistreamTransformer(
    num_tokens = 256,         # number of tokens
    dim = 512,                # dimension
    depth = 4,                # depth
    causal = True,            # autoregressive or not
    max_seq_len = 1024,       # maximum sequence length
    num_streams = 2           # number of streams; 1 would make it a regular transformer
)

x = torch.randint(0, 256, (2, 1024))
mask = torch.ones((2, 1024)).bool()

logits = model(x, mask = mask) # (2, 1024, 256)
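The example above uses an all-ones mask because every sequence fills the full length. For padded batches, a mask can be built from per-sequence lengths, as in the sketch below. It assumes that `True` marks a real token and `False` marks padding, which matches the all-ones mask above but is not stated explicitly here. The helper `lengths_to_mask` is hypothetical, and the random `logits` tensor only stands in for the model output so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F


def lengths_to_mask(lengths, max_len):
    # (batch, max_len) boolean mask, True where the position holds a real token
    positions = torch.arange(max_len, device=lengths.device)
    return positions[None, :] < lengths[:, None]


lengths = torch.tensor([5, 3])
mask = lengths_to_mask(lengths, max_len=6)

# stand-ins for token ids and model output, shaped like the example above
tokens = torch.randint(0, 256, (2, 6))
logits = torch.randn(2, 6, 256)

# next-token loss for a causal model: position t predicts token t + 1,
# and padded targets are excluded through the mask
target_mask = mask[:, 1:]
loss = F.cross_entropy(logits[:, :-1][target_mask], tokens[:, 1:][target_mask])
```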

Citations

@misc{burtsev2021multistream,
    title   = {Multi-Stream Transformers}, 
    author  = {Mikhail Burtsev and Anna Rumshisky},
    year    = {2021},
    eprint  = {2107.10342},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Releases (0.0.4)

0.0.4 (Jul 31, 2021)

0.0.3 (Jul 31, 2021)

0.0.2 (Jul 30, 2021)

0.0.1 (Jul 30, 2021)

Owner

Phil Wang
