Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Last update: Nov 21, 2022

Overview

This repository is the PyTorch implementation of the paper:

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

We additionally include evaluation code from Luo et al. in the folder GoogleConceptualCaptioning, which has been patched for compatibility.

Requirements

The code is written in Python 3.6.10 and uses CUDA 9.0.

Requirements:

torch 1.1.0
torchvision 0.3.0
nltk 3.5
inflect 4.1.0
tqdm 4.46.0
sklearn 0.0
h5py 2.10.0

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>

Preprocessed data

The dataset used in this project for assessing accuracy and diversity is COCO 2014 (m-RNN split). The full dataset is available here.

We use Faster R-CNN image features, following Anderson et al. We additionally require the "classes" and "scores" fields detected for the image regions; the classes correspond to Visual Genome classes.

Download instructions

Preprocessed training data is available here as hdf5 files. The provided hdf5 files contain the following fields:

image_id: ID of the COCO image
num_boxes: Number of proposal regions detected by Faster R-CNN
features: ResNet-101 features of the extracted regions
classes: Visual Genome classes of the extracted regions
scores: Scores of the Visual Genome classes of the extracted regions

Note that the ["image_id","num_boxes","features"] fields are identical to Anderson et al.
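To illustrate how the "classes" and "scores" fields relate, the sketch below keeps the Visual Genome class names of regions whose score reaches a threshold. It assumes (this README does not confirm it) that "classes" holds integer indices into the lines of data/visual_genome_classes.txt and that "scores" holds one confidence per region. The function name confident_region_labels and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence


def confident_region_labels(
    classes: Sequence[int],
    scores: Sequence[float],
    class_names: List[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Return unique class names of regions whose score is >= threshold.

    Assumption: classes[i] indexes into class_names and scores[i] is the
    confidence of region i. Order of first appearance is preserved.
    """
    if len(classes) != len(scores):
        raise ValueError("classes and scores must have one entry per region")
    labels: List[str] = []
    for cls_idx, score in zip(classes, scores):
        name = class_names[int(cls_idx)]  # int() also accepts numpy integers
        if score >= threshold and name not in labels:
            labels.append(name)
    return labels


if __name__ == "__main__":
    # Toy values standing in for one image's "classes"/"scores" fields.
    names = ["man", "dog", "frisbee"]
    print(confident_region_labels([0, 1, 1, 2], [0.9, 0.8, 0.7, 0.3], names))
```

The same function works on numpy arrays read from the hdf5 files, since it only iterates over them.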

Create a folder named coco and download the preprocessed training and test data from the coco folder in the Google Drive link above, as listed below (alternatively, download the entire coco folder from the Drive link at once):

Download the following files for training on COCO 2014 (m-RNN split):

coco/coco_train_2014_adaptive_withclasses.h5
coco/coco_val_2014_adaptive_withclasses.h5
coco/coco_val_mRNN.txt
coco/coco_test_mRNN.txt

Download the following files for training on held-out COCO (novel object captioning):

coco/coco_train_2014_noc_adaptive_withclasses.h5
coco/coco_train_extra_2014_noc_adaptive_withclasses.h5

Download the following files for testing on held-out COCO (novel object captioning):

coco/coco_test_2014_noc_adaptive_withclasses.h5

Download the caption annotation files and place them in a subdirectory coco/annotations (mirroring the Google Drive folder structure):

coco/annotations/captions_train2014.json
coco/annotations/captions_val2014.json

Download the following files from the Drive link into a separate folder named data (outside coco). These files contain the contextual neighbours used for pseudo supervision:

data/nn_final.pkl
data/nn_noc.pkl

To run the training and test scripts described below, set "pathToData" and "nn_dict_path" in params.json and params_noc.json to the coco and data folders created above.
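A minimal sketch for setting these two entries with Python's json module. It assumes that params.json and params_noc.json are JSON objects with "pathToData" and "nn_dict_path" as top-level keys, and that "pathToData" should point to coco and "nn_dict_path" to data; check the shipped files first and adjust the values if a key expects a file path rather than a folder. The helper names with_data_paths and set_data_paths are hypothetical.

```python
import json
from pathlib import Path


def with_data_paths(params: dict, path_to_data: str, nn_dict_path: str) -> dict:
    """Return a copy of params with the two data-path entries replaced.

    Assumes both keys are top-level; all other keys are kept unchanged.
    """
    missing = [k for k in ("pathToData", "nn_dict_path") if k not in params]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    updated = dict(params)
    updated["pathToData"] = path_to_data
    updated["nn_dict_path"] = nn_dict_path
    return updated


def set_data_paths(params_file: str, path_to_data: str, nn_dict_path: str) -> None:
    """Read a JSON params file, update the data paths, and write it back."""
    path = Path(params_file)
    params = json.loads(path.read_text())
    path.write_text(json.dumps(with_data_paths(params, path_to_data, nn_dict_path), indent=2))


if __name__ == "__main__":
    coco_dir = str(Path("coco").resolve())
    data_dir = str(Path("data").resolve())
    for name in ("params.json", "params_noc.json"):
        if Path(name).exists():  # skip silently if run outside the repo root
            set_data_paths(name, coco_dir, data_dir)
```

Separating the pure dictionary update from the file I/O makes it easy to inspect the result before overwriting the original files.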

Verify Folder Structure after Download

After the download, the coco and data folders should be structured as follows:

coco
 - annotations
   - captions_train2014.json
   - captions_val2014.json
 - coco_val_mRNN.txt
 - coco_test_mRNN.txt
 - coco_train_2014_adaptive_withclasses.h5
 - coco_val_2014_adaptive_withclasses.h5
 - coco_train_2014_noc_adaptive_withclasses.h5
 - coco_train_extra_2014_noc_adaptive_withclasses.h5
 - coco_test_2014_noc_adaptive_withclasses.h5
data
 - coco_classname.txt
 - visual_genome_classes.txt
 - vocab_coco_full.pkl
 - nn_final.pkl
 - nn_noc.pkl
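The tree above can be checked with a short script. This is a minimal sketch, assuming it is run from the directory that contains coco and data; the file list is copied from the tree, and the names EXPECTED_FILES and missing_files are hypothetical. Note that the *_noc_* files are only needed for novel object captioning.

```python
from pathlib import Path
from typing import List

# Expected layout, copied from the folder tree in this README.
EXPECTED_FILES = {
    "coco": [
        "annotations/captions_train2014.json",
        "annotations/captions_val2014.json",
        "coco_val_mRNN.txt",
        "coco_test_mRNN.txt",
        "coco_train_2014_adaptive_withclasses.h5",
        "coco_val_2014_adaptive_withclasses.h5",
        "coco_train_2014_noc_adaptive_withclasses.h5",
        "coco_train_extra_2014_noc_adaptive_withclasses.h5",
        "coco_test_2014_noc_adaptive_withclasses.h5",
    ],
    "data": [
        "coco_classname.txt",
        "visual_genome_classes.txt",
        "vocab_coco_full.pkl",
        "nn_final.pkl",
        "nn_noc.pkl",
    ],
}


def missing_files(root: str = ".") -> List[str]:
    """Return the expected files (relative to root) that do not exist."""
    base = Path(root)
    missing = []
    for folder, files in EXPECTED_FILES.items():
        for rel in files:
            if not (base / folder / rel).is_file():
                missing.append(str(Path(folder) / rel))
    return missing


if __name__ == "__main__":
    absent = missing_files()
    if absent:
        print("Missing files:")
        for name in absent:
            print("  " + name)
    else:
        print("Folder structure looks complete.")
```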

Training

To train a model:

Set hyperparameters for training in params.json and params_noc.json.
Train a captioning model on COCO 2014:

   	python ./scripts/train.py

Train a model for diverse novel object captioning:

   	python ./scripts/train_noc.py

Note that the data folder already provides the required vocabulary (vocab_coco_full.pkl).

Memory requirements

The models were trained on a single NVIDIA V100 GPU with 32 GB of memory; 16 GB is sufficient for a single training run.

Pre-trained models and evaluation

We provide pre-trained models for both captioning on COCO 2014 (m-RNN split) and novel object captioning. To evaluate them:

Download the pre-trained models from here to the ckpts folder.
For evaluating oracle scores and diversity, we follow Luo et al. In the GoogleConceptualCaptioning folder, download the cider code, run the download scripts in the cococaption folder, and then run the evaluation script:

   	./GoogleConceptualCaptioning/cococaption/get_google_word2vec_model.sh
   	./GoogleConceptualCaptioning/cococaption/get_stanford_models.sh
   	python ./scripts/eval.py
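Oracle scores are commonly computed by taking, for each image, the best score among its sampled captions and averaging over images. The sketch below shows only that aggregation step under this assumption; score_fn is a hypothetical placeholder for a sentence-level metric (the actual evaluation uses the cococaption scorers from Luo et al.), and unigram_overlap is a toy metric for illustration only, not CIDEr or BLEU.

```python
from typing import Callable, Dict, List


def oracle_score(
    samples: Dict[str, List[str]],
    references: Dict[str, List[str]],
    score_fn: Callable[[str, List[str]], float],
) -> float:
    """Mean over images of the best score among that image's sampled captions.

    samples maps image id -> generated candidates; references maps the same
    id -> ground-truth captions. score_fn(candidate, refs) is a placeholder.
    """
    best_per_image = []
    for image_id, candidates in samples.items():
        refs = references[image_id]
        best_per_image.append(max(score_fn(c, refs) for c in candidates))
    return sum(best_per_image) / len(best_per_image)


def unigram_overlap(candidate: str, refs: List[str]) -> float:
    """Toy metric: best fraction of candidate words that appear in one reference."""
    words = candidate.split()
    return max(sum(w in r.split() for w in words) / len(words) for r in refs)


if __name__ == "__main__":
    samples = {"img1": ["a dog runs", "a cat sits"], "img2": ["a blue bus", "a car"]}
    refs = {"img1": ["a dog runs on grass"], "img2": ["a red bus on the road"]}
    print(oracle_score(samples, refs, unigram_overlap))
```

Because only the best sample per image counts, oracle scores reward a model for covering the reference captions with at least one of its diverse samples.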

For diversity evaluation, create the numpy file required for consensus re-ranking:

   	python ./scripts/eval_diversity.py

For consensus re-ranking, follow the steps here. To obtain the final diversity scores, follow the instructions of DiversityMetrics: convert the numpy file to the required json format and run the script evalscripts.py.

To evaluate the F1 score for novel object captioning:

   	python ./scripts/eval_noc.py
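The F1 score for novel object captioning is commonly computed per held-out object: an image is a positive if it contains the object, and a caption is a positive prediction if it mentions the object word. The sketch below follows that definition with simplified whole-word matching (no plural or synonym handling). The ground-truth set images_with_object is an input you must supply, only images present in captions are evaluated, and scripts/eval_noc.py may differ in these details. The function names are hypothetical.

```python
from typing import Dict, Set


def mentions(caption: str, word: str) -> bool:
    """True if word occurs as a whole whitespace-separated token (case-insensitive)."""
    return word.lower() in caption.lower().split()


def object_f1(captions: Dict[str, str], images_with_object: Set[str], word: str) -> float:
    """F1 of "caption mentions word" against the ground-truth positive image set.

    captions maps image id -> generated caption; images_with_object holds the
    ids of images that actually contain the held-out object.
    """
    tp = fp = fn = 0
    for image_id, caption in captions.items():
        predicted = mentions(caption, word)
        actual = image_id in images_with_object
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    captions = {
        "a": "a zebra in a field",
        "b": "two zebra grazing",
        "c": "a horse standing",
        "d": "a zebra on a road",
    }
    # Images a, b and c contain a zebra; image d does not.
    print(100 * object_f1(captions, {"a", "b", "c"}, "zebra"))
```

The per-class values in the results below are reported as percentages, i.e. F1 multiplied by 100.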

Results

Oracle evaluation on the COCO dataset

Method     BLEU-4  BLEU-3  BLEU-2  BLEU-1  CIDEr  METEOR  ROUGE  SPICE
COS-CVAE   0.633   0.739   0.842   0.942   1.893  0.450   0.770  0.339

Diversity evaluation on the COCO dataset

Method     Unique (%)  Novel  mBLEU  Div-1  Div-2
COS-CVAE   96.3        4404   0.53   0.39   0.57
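Div-1 and Div-2 are commonly defined as the number of distinct n-grams in the set of captions generated for an image, divided by the total number of words in that set, averaged over images. A minimal sketch of that definition, assuming lower-cased whitespace tokenization (the official DiversityMetrics scripts may tokenize differently); div_n and mean_div_n are hypothetical names.

```python
from typing import List, Sequence


def div_n(captions: Sequence[str], n: int) -> float:
    """Distinct n-grams across one image's captions divided by their total word count."""
    tokenized = [c.lower().split() for c in captions]
    total_words = sum(len(t) for t in tokenized)
    if total_words == 0:
        return 0.0
    ngrams = set()
    for tokens in tokenized:
        for i in range(len(tokens) - n + 1):  # empty range if caption is shorter than n
            ngrams.add(tuple(tokens[i:i + n]))
    return len(ngrams) / total_words


def mean_div_n(caption_sets: List[Sequence[str]], n: int) -> float:
    """Average Div-n over images, given one caption set per image."""
    return sum(div_n(s, n) for s in caption_sets) / len(caption_sets)


if __name__ == "__main__":
    captions = ["a dog runs", "a dog sits"]
    print(div_n(captions, 1), div_n(captions, 2))
```

Higher values mean the sampled captions for an image reuse fewer words and phrases.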

F1-score evaluation on the held-out COCO dataset

Method     bottle  bus   couch  microwave  pizza  racket  suitcase  zebra  average
COS-CVAE   35.4    83.6  53.8   63.2       86.7   69.5    46.1      81.7   65.0
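The "average" column is consistent with the unweighted mean of the eight per-class F1 scores:

```latex
\text{average} = \frac{35.4 + 83.6 + 53.8 + 63.2 + 86.7 + 69.5 + 46.1 + 81.7}{8} = \frac{520.0}{8} = 65.0
```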

Bibtex

@inproceedings{coscvae20neurips,
  title     = {Diverse Image Captioning with Context-Object Split Latent Spaces},
  author    = {Mahajan, Shweta and Roth, Stefan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
}

Owner

Visual Inference Lab @TU Darmstadt