Implementation of Hierarchical Transformer Memory (HTM) for Pytorch

Last update: Dec 29, 2022

Overview

Hierarchical Transformer Memory (HTM) - Pytorch

Implementation of Hierarchical Transformer Memory (HTM) for PyTorch. This DeepMind paper proposes a simple method that lets transformers attend efficiently to memories of the past. See also the original JAX repository.
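To make the idea concrete, here is a framework-free sketch of the retrieval step as we read it from the paper. Memories are split into fixed-size chunks, each chunk is summarized by mean pooling, the query scores every summary, the top-k chunks are kept with softmax weights, and ordinary attention runs inside each kept chunk. This is a simplification for illustration only (one query, no learned projections, heads, positional encoding or mask). The function `htm_attend` is hypothetical and is not part of the htm_pytorch API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Mean-pool a list of equal-length vectors into one summary vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def htm_attend(query, memories, chunk_size, topk):
    """Hypothetical HTM-style attention for ONE query (not the htm_pytorch API).

    Simplified: no learned projections, heads, positional encoding or mask.
    """
    scale = 1 / math.sqrt(len(query))
    # 1. Split the memories into consecutive chunks (the last one may be shorter).
    chunks = [memories[i:i + chunk_size] for i in range(0, len(memories), chunk_size)]
    # 2. Summarize each chunk by mean pooling and score the summary against the query.
    scores = [dot(query, mean_vector(chunk)) * scale for chunk in chunks]
    # 3. Keep the top-k chunks; a softmax over their scores gives chunk weights.
    k = min(topk, len(chunks))
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    chunk_weights = softmax([scores[i] for i in top])
    # 4. Plain softmax attention inside each kept chunk, mixed by chunk weight.
    out = [0.0] * len(query)
    for weight, idx in zip(chunk_weights, top):
        chunk = chunks[idx]
        token_weights = softmax([dot(query, m) * scale for m in chunk])
        for tw, m in zip(token_weights, chunk):
            for d, value in enumerate(m):
                out[d] += weight * tw * value
    return out


# Two chunks of four tokens: the query clearly prefers the second chunk.
memories = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
print(htm_attend([0.0, 5.0], memories, chunk_size=4, topk=1))  # [0.0, 1.0]
```

With `topk=1` only the second chunk survives, so the output is exactly that chunk's (uniform) attention average. Raising `topk` blends in other chunks according to their summary scores.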

Install

$ pip install htm-pytorch

Usage

import torch
from htm_pytorch import HTMAttention

attn = HTMAttention(
    dim = 512,
    heads = 8,               # number of heads for within-memory attention
    dim_head = 64,           # dimension per head for within-memory attention
    topk_mems = 8,           # how many memory chunks to select for attention
    mem_chunk_size = 32,     # number of tokens in each memory chunk
    add_pos_enc = True       # whether to add positional encoding to the memories
)

queries = torch.randn(1, 128, 512)     # queries
memories = torch.randn(1, 20000, 512)  # memories, of any size
mask = torch.ones(1, 20000).bool()     # memory mask

attended = attn(queries, memories, mask = mask) # (1, 128, 512)
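The usage example attends 128 queries over 20,000 memory tokens with `mem_chunk_size = 32` and `topk_mems = 8`. The back-of-the-envelope count below illustrates why this is cheaper than full attention: each query scores one summary per chunk and then attends only to the tokens of the selected chunks. It assumes, hypothetically, that memories are padded up to a whole number of chunks and that there is exactly one summary per chunk; we have not checked how htm_pytorch handles lengths that are not a multiple of `mem_chunk_size`. `attention_budget` is an illustrative helper, not part of the library, and it counts scored positions per query rather than FLOPs.

```python
def attention_budget(mem_len, mem_chunk_size, topk_mems):
    """Illustrative per-query cost count for HTM-style attention (hypothetical helper).

    Assumes the memory is padded up to a whole number of chunks and that every
    chunk is summarized by exactly one vector that each query scores.
    """
    num_chunks = -(-mem_len // mem_chunk_size)  # ceiling division
    padding = num_chunks * mem_chunk_size - mem_len
    attended_tokens = min(topk_mems, num_chunks) * mem_chunk_size
    return {
        "num_chunks": num_chunks,
        "padding": padding,
        "scores_per_query": num_chunks + attended_tokens,  # summaries + kept tokens
        "full_attention_scores_per_query": mem_len,
    }


print(attention_budget(20000, 32, 8))
# {'num_chunks': 625, 'padding': 0, 'scores_per_query': 881, 'full_attention_scores_per_query': 20000}
```

Under these assumptions each query scores 881 positions instead of 20,000. The count ignores projections and the multiple heads, which add work in the real module.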

If you want the entire HTM block (which contains the layer norm for the input followed by a skip connection), import HTMBlock instead.

import torch
from htm_pytorch import HTMBlock

block = HTMBlock(
    dim = 512,
    topk_mems = 8,
    mem_chunk_size = 32
)

queries = torch.randn(1, 128, 512)
memories = torch.randn(1, 20000, 512)
mask = torch.ones(1, 20000).bool()

out = block(queries, memories, mask = mask) # (1, 128, 512)
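HTMBlock is described above as a layer norm on the input followed by a skip connection around the attention. A framework-free sketch of that wiring follows, assuming the common pre-norm arrangement in which the skip connection carries the un-normalized queries. `htm_block` and `layer_norm` are illustrative stand-ins (the norm has no learned gain or bias), `attend` can be any attention function, and the library may differ in details such as which tensor the skip connection adds back.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def htm_block(queries, memories, attend):
    """Hypothetical pre-norm wrapper: out = q + attend(layer_norm(q), memories)."""
    outputs = []
    for q in queries:
        attended = attend(layer_norm(q), memories)  # attention sees normalized queries
        outputs.append([a + b for a, b in zip(q, attended)])  # skip connection
    return outputs


def zero_attend(q, memories):
    """Attention stand-in that contributes nothing."""
    return [0.0] * len(q)


# With an attention stand-in that returns zeros, the block reduces to the identity.
print(htm_block([[1.0, 3.0]], memories=None, attend=zero_attend))  # [[1.0, 3.0]]
```

Keeping the skip path un-normalized means the block can fall back to passing its input through unchanged when the attention contributes little.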

Citations

@misc{lampinen2021mental,
    title   = {Towards mental time travel: a hierarchical memory for reinforcement learning agents}, 
    author  = {Andrew Kyle Lampinen and Stephanie C. Y. Chan and Andrea Banino and Felix Hill},
    year    = {2021},
    eprint  = {2105.14039},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

You might also like...

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects This repo contains the code of Segcache described in the followi

78 Jan 7, 2023

Episodic-memory - Ego4D Episodic Memory Benchmark

Ego4D Episodic Memory Benchmark EGO4D is the world's largest egocentric (first p

3 Feb 18, 2022

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

H-Transformer-1D Implementation of H-Transformer-1D, Transformer using hierarchical Attention for sequence learning with subquadratic costs. For now,

123 Nov 17, 2022

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers

hierarchical-transformer-1d Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning using 🤗 transformers In Progress!! 2021.

7 Nov 6, 2022

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022

HiFT: Hierarchical Feature Transformer for Aerial Tracking (ICCV2021)

HiFT: Hierarchical Feature Transformer for Aerial Tracking Ziang Cao, Changhong Fu, Junjie Ye, Bowen Li, and Yiming Li Our paper is Accepted by ICCV 2

Intelligent Vision for Robotics in Complex Environment

55 Nov 23, 2022

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

49 Dec 17, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Comments

auto-regressive use case
Hi Phil! I was wondering whether the HTM part can be used in an auto-regressive scenario. The full architecture proposed in the paper has 3 blocks:

1. Self-attention: this can easily be done with causal masking.

2. HTM block with memories: can this one be used in an auto-regressive scenario?

3. Feed-forward block.

Please let me know your thoughts.
opened by inspirit
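The comment notes that the self-attention block can be made auto-regressive with causal masking. A minimal sketch of that mask, and of applying it to one row of scores before the softmax, follows. It assumes the convention that True means "may attend", the same polarity as the all-True memory mask in the usage example, and the helper names are hypothetical. In PyTorch the same lower-triangular mask can be built with `torch.ones(n, n, dtype=torch.bool).tril()`. The sketch covers only the self-attention block, not the HTM block the question asks about.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, allowed):
    """Softmax over one row of scores, giving zero weight to disallowed positions."""
    kept = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(kept)  # finite as long as at least one position is allowed
    exps = [math.exp(s - m) for s in kept]
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
print(mask)  # [[True, False, False], [True, True, False], [True, True, True]]
print(masked_softmax([1.0, 2.0, 3.0], mask[0]))  # [1.0, 0.0, 0.0]
```

The diagonal is always allowed, so every row keeps at least one finite score and the softmax stays well defined.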

Releases (0.0.4)

0.0.4 (Sep 15, 2021)

0.0.3 (Sep 14, 2021)

0.0.2 (Sep 14, 2021)

0.0.1 (Sep 14, 2021)

Owner

Phil Wang

From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)

Fine-grained Post-training for Improving Retrieval-based Dialogue Systems - NAACL 2021

Multimodal Temporal Context Network (MTCN)

WSDM‘2022: Knowledge Enhanced Sports Game Summarization

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

A simple and lightweight genetic algorithm for optimization of any machine learning model

You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2

ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

Uncertain natural language inference

Unified MultiWOZ evaluation scripts for the context-to-response task.

Awesome Graph Classification - A collection of important graph embedding, classification and representation learning papers with implementations.

official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

An example to implement a new backbone with OpenMMLab framework.

Ground truth data for the Optical Character Recognition of Historical Classical Commentaries.

Code of the paper "Part Detector Discovery in Deep Convolutional Neural Networks" by Marcel Simon, Erik Rodner and Joachim Denzler

Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A tensorflow implementation of an HMM layer

Fast, flexible and fun neural networks.