Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Last update: Nov 19, 2022

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark

Accepted to: BMVC'21

A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse g_t and its coordinates l_t. Given g_t and l_t, the feed-forward module F extracts features f_t, and the recurrent module R updates the hidden state to h_t. Using the updated hidden state h_t, the linear classifier C predicts the class distribution p(y|h_t).

At time t+1, the model assesses various candidate locations l before attending to the optimal one. It predicts p(y|g,l,h_t) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,h_t)||p(y|h_t)]. To approximate p(y|g,l,h_t) without attending to the glimpse g, the model synthesizes the features of g using a Partial VAE. The normalizing flow-based encoder S predicts the approximate posterior q(z|h_t). The decoder D uses a sample z~q(z|h_t) to synthesize a feature map f^~ containing features of all glimpses. The model uses f^~(l) as the features of a glimpse at location l and evaluates p(y|g,l,h_t)=p(y|f^~(l),h_t). In the architecture figure, dashed arrows show the path used to compute the lookahead class distribution p(y|f^~(l),h_t).
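The location-selection rule described above can be illustrated with a small, self-contained sketch. It is not taken from the repository: the function names, the toy distributions, and the candidate coordinates are hypothetical, and plain Python lists stand in for the tensors the real model would produce. It only shows the idea of scoring each candidate location by KL[p(y|f^~(l),h_t)||p(y|h_t)] and picking the maximum.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_next_location(lookahead, current):
    """Score every candidate location and return the best one with all scores.

    lookahead: dict mapping a candidate location l to the lookahead
               class distribution p(y | f~(l), h_t)
    current:   the current class distribution p(y | h_t)
    """
    scores = {loc: kl_divergence(dist, current) for loc, dist in lookahead.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example (hypothetical numbers): 3 classes, 3 candidate locations.
current = [0.4, 0.3, 0.3]
lookahead = {
    (0, 0): [0.4, 0.3, 0.3],     # no change in belief, so zero divergence
    (0, 1): [0.7, 0.2, 0.1],     # large change in belief
    (1, 0): [0.5, 0.25, 0.25],   # small change in belief
}

best, scores = select_next_location(lookahead, current)
print("next location:", best)
```

In this toy case the location whose predicted class distribution differs most from the current belief is chosen, which matches the intuition that the model attends where it expects to learn the most.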

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

SVHN (Let PyTorch download this dataset)
CIFAR-10 (Let PyTorch download this dataset)
CIFAR-100 (Let PyTorch download this dataset)
CINIC-10 (download from: https://datashare.is.ed.ac.uk/bitstream/handle/10283/3192/CINIC-10.tar.gz; for details, visit https://github.com/BayesWatch/cinic-10)
TinyImageNet (download from: http://cs231n.stanford.edu/tiny-imagenet-200.zip)

Training a model

Use main.py to train and evaluate the model.

Arguments

dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
datapath: path to the downloaded datasets
lr: learning rate
training_phase: one of 'first', 'second', 'third'
ccebal: coefficient for cross entropy loss
batch: batch-size for training
batchv: batch-size for evaluation
T: maximum time-step
logfolder: path to log directory
epochs: number of training epochs
pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model on the SVHN dataset are as follows.

Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'
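The three stages above form a chain: each later stage starts from the checkpoint written by the previous one. The following sketch only builds the three command lines shown above so the chaining is explicit. The helper `build_command` is hypothetical, and the checkpoint naming pattern `weights_f_<epochs>.pth` is an assumption inferred from the example commands, not a documented guarantee.

```python
import shlex


def build_command(phase, ccebal, epochs, logfolder, pretrain_checkpoint=None,
                  dataset="svhn", datapath="./data/", lr=0.001,
                  batch=64, batchv=64, T=7):
    """Build the argument list for one training stage of main.py."""
    args = {
        "dataset": dataset,
        "datapath": datapath,
        "lr": lr,
        "training_phase": phase,
        "ccebal": ccebal,
        "batch": batch,
        "batchv": batchv,
        "T": T,
        "logfolder": logfolder,
        "epochs": epochs,
        "pretrain_checkpoint": pretrain_checkpoint,
    }
    return ["python3", "main.py"] + [f"--{key}={value}" for key, value in args.items()]


# (phase, ccebal, epochs) for each stage, as in the SVHN example commands.
stages = [("first", 1, 1000), ("second", 0, 100), ("third", 16, 100)]

commands = []
checkpoint = None  # stage one starts without a pretrained checkpoint
for phase, ccebal, epochs in stages:
    logfolder = f"./svhn_log_{phase}"
    commands.append(build_command(phase, ccebal, epochs, logfolder, checkpoint))
    # Assumed naming pattern, inferred from the example commands.
    checkpoint = f"{logfolder}/weights_f_{epochs}.pth"

for cmd in commands:
    print(shlex.join(cmd))
```

The printed lines can be run one after another; passing each list to `subprocess.run(cmd, check=True)` would be one way to launch the stages in sequence.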

Visualization of attention policy for a CIFAR-10 image

The top row shows the entire image and the expected information gain (EIG) maps for t=1 to 6. The bottom row shows the glimpses attended by our model. The model observes the first glimpse at a random location. Each glimpse is of size 8x8. The glimpses overlap with a stride of 4, resulting in a 7x7 grid of glimpses. The EIG maps are of size 7x7 and are upsampled for display. We display the entire image for reference only; our model never observes the whole image.
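The 7x7 grid follows from the glimpse size and stride on a 32x32 CIFAR-10 image. The arithmetic can be checked with the short sketch below. The function name and the top-left-corner coordinate convention are assumptions made for illustration and may differ from how the repository indexes locations.

```python
def glimpse_grid(image_size, glimpse_size, stride):
    """Return the grid side length and the top-left corner of every glimpse."""
    side = (image_size - glimpse_size) // stride + 1
    corners = [(row * stride, col * stride)
               for row in range(side)
               for col in range(side)]
    return side, corners


# CIFAR-10 images are 32x32; glimpses are 8x8 with a stride of 4.
side, corners = glimpse_grid(image_size=32, glimpse_size=8, stride=4)
print(side, len(corners), corners[0], corners[-1])
```

With these settings the last glimpse starts at pixel 24 in each dimension and ends exactly at the image border, so the grid covers the whole image without padding.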

Acknowledgement

Major parts of the neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.
