A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Last update: Sep 20, 2022

Related tags

Overview

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

A PyTorch implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1-2]. Unlike another Unofficial PyTorch implementation [3], our version is heavily borrowed from the official implementation [4] and TensorFlow implementation[5], and try to keep consistent with them.

Usage

You can access the TokenLearner and TokenLearnerModuleV11 class from the tokenlearner file. You can use this layer with a Vision Transformer, MLPMixer, or Video Vision Transformer as done in the paper.

import torch
from tokenlearner import TokenLearner

tklr = TokenLearner(in_channels=128, num_tokens=8, use_sum_pooling=False)

x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y1 = tklr(x)
print(y1.shape)  # [256, 8, 128]

You can also use TokenLearnerModuleV11, which aligns with the official implementation.

import torch
from tokenlearner import TokenLearnerModuleV11

tklr_v11 = TokenLearnerModuleV11(in_channels=128, num_tokens=8, num_groups=4, dropout_rate=0.)

tklr_v11.eval()  # control droput
x = torch.ones(256, 32, 32, 128)   # [bs, h, w, c]
y2 = tklr_v11(x)
print(y2.shape)  # [256, 8, 128]

References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al., NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88

[3] Unofficial PyTorch implementation

[4] official implementation

[5] TensorFlow implementation

A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Related tags

Overview

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Usage

References

Owner

Caiyong Wang

A benchmark dataset for emulating atmospheric radiative transfer in weather and climate models with machine learning (NeurIPS 2021 Datasets and Benchmarks Track)

GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

A deep learning model for style-specific music generation.

PushForKiCad - AISLER Push for KiCad EDA

An efficient framework for reinforcement learning.

Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

Face and Body Tracking for VRM 3D models on the web.

PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Python package for Bayesian Machine Learning with scikit-learn API

A robust pointcloud registration pipeline based on correlation.

Adaptive Attention Span for Reinforcement Learning

Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

Meta Representation Transformation for Low-resource Cross-lingual Learning

내가 보려고 정리한 <프로그래밍 기초 Ⅰ> / organized for me

ICS 4u HD project, start before-wards. A curtain shooting game using python.

The pytorch implementation of SOKD (BMVC2021).

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

Explainability for Vision Transformers (in PyTorch)