Code for the paper "Controllable Video Captioning with an Exemplar Sentence"

Last update: Dec 04, 2022

SMCG

Introduction

We investigate a novel and challenging task: controllable video captioning with an exemplar sentence. Formally, given a video and a syntactically valid exemplar sentence, the task is to generate a caption that not only describes the semantic content of the video but also follows the syntactic form of the exemplar sentence. To tackle this exemplar-based video captioning task, we propose a novel Syntax Modulated Caption Generator (SMCG) incorporated in an encoder-decoder-reconstructor architecture.

Dependencies

python 2.7.2
torch 1.1.0
java openjdk version "10.0.2" 2018-07-17
StanfordCoreNLP

Download Features and Preprocess Data

For the MSRVTT dataset, please download the following files into the './msrvtt/msrvtt_data/' folder:

MSRVTT caption info: videodatainfo_2016.json,
MSRVTT captions and their sentence parse trees: msrvtt_all_sentence_parse_dict.pkl,
Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
Video features: msrvtt_incepRes_rgb_feats.hdf5,
GloVe word embeddings: glove.840B.300d.zip.

For the ActivityNet Captions dataset, please download the following files into the './activitynet/activitynet_data/' folder:

ActivityNet caption info: CAP.pkl,
ActivityNet captions and their sentence parse trees: anet_parse_dict.pkl,
Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
Video features: anet_new_inception_resnet_feats.hdf5,
GloVe word embeddings: glove.840B.300d.zip.
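
Before preprocessing, it can help to confirm that every file listed above is in place. The following is a minimal sketch, not part of the repository: the file and folder names are copied from the two lists above, while the helper name `missing_files` and its `root` parameter are hypothetical. It only checks that the files exist and makes no assumption about whether any archive needs to be unpacked.

```python
import os

# File and folder names are copied from the download lists in this README.
EXPECTED_FILES = {
    "./msrvtt/msrvtt_data/": [
        "videodatainfo_2016.json",
        "msrvtt_all_sentence_parse_dict.pkl",
        "coco_filter_parse_dict.pkl",
        "msrvtt_incepRes_rgb_feats.hdf5",
        "glove.840B.300d.zip",
    ],
    "./activitynet/activitynet_data/": [
        "CAP.pkl",
        "anet_parse_dict.pkl",
        "coco_filter_parse_dict.pkl",
        "anet_new_inception_resnet_feats.hdf5",
        "glove.840B.300d.zip",
    ],
}


def missing_files(expected=EXPECTED_FILES, root="."):
    """Return the relative paths of expected files that are not on disk.

    `root` is a hypothetical parameter: the directory that contains the
    'msrvtt' and 'activitynet' folders (the repository root by default).
    """
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not os.path.isfile(os.path.join(root, folder, name)):
                missing.append(os.path.normpath(os.path.join(folder, name)))
    return missing
```

Calling `missing_files()` from the repository root returns an empty list once all ten files have been downloaded. The sketch uses only the standard library and avoids Python 3-only syntax, so it should also run under the Python 2.7 interpreter listed in the dependencies.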

Data Preprocessing

Go to the './msrvtt/process_msrvtt_data/' folder, and run:

python prepro_vocab_parse_pos.py
python fill_template.py

Go to the './activitynet/process_activitynet_data/' folder, and run:

python prepro_anetcoco_vocab_pos_parse.py
python fill_template.py
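
The preprocessing steps above can also be driven from one script. This is a rough sketch under stated assumptions: the folder and script names come from the commands above, the helper names `build_commands` and `run_all` are hypothetical, and `python` is assumed to resolve to the interpreter listed in the dependencies.

```python
import subprocess

# (folder, script) pairs taken from the commands above, in the order given.
PREPROCESS_STEPS = [
    ("./msrvtt/process_msrvtt_data/", "prepro_vocab_parse_pos.py"),
    ("./msrvtt/process_msrvtt_data/", "fill_template.py"),
    ("./activitynet/process_activitynet_data/", "prepro_anetcoco_vocab_pos_parse.py"),
    ("./activitynet/process_activitynet_data/", "fill_template.py"),
]


def build_commands(steps=PREPROCESS_STEPS, interpreter="python"):
    """Turn (folder, script) pairs into (cwd, argv) pairs without running anything."""
    return [(folder, [interpreter, script]) for folder, script in steps]


def run_all(commands):
    """Run each command inside its folder, stopping at the first failure."""
    for cwd, argv in commands:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(argv, cwd=cwd)
```

Running `run_all(build_commands())` from the repository root executes the four scripts in sequence. Keeping `build_commands` separate from `run_all` makes it possible to print and inspect the plan before anything is executed.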

Model Training and Testing

For the MSRVTT dataset, please go to the './msrvtt/src/' folder, and train the model by:

python train.py --gpu xx

For model inference and evaluation, run:

bash eval.sh 
bash control.sh

Note: 'eval.sh' evaluates the generated exemplar-based captions with conventional captioning metrics. 'control.sh' compares the generated exemplar-based captions with the provided exemplar sentences in terms of syntax, i.e., it computes the edit distance between their parse trees.

For the ActivityNet Captions dataset, please go to the './activitynet/src/' folder, and train/test the model in the same way as for the MSRVTT dataset.
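
To give a feel for the syntactic comparison that 'control.sh' performs, here is a simplified illustration. It is not the repository's implementation: it assumes parse trees are available as Penn Treebank-style bracketed strings, drops the words so that only the syntactic labels remain, and applies a plain Levenshtein distance to the resulting token sequence as a stand-in for a true tree edit distance. All function names are hypothetical.

```python
def tokenize_parse(parse):
    """Split a bracketed parse string into '(', ')' and label/word tokens."""
    return parse.replace("(", " ( ").replace(")", " ) ").split()


def syntax_tokens(parse):
    """Keep brackets and constituent/POS labels, dropping the words (leaves).

    A token directly after '(' is treated as a label; any other
    non-bracket token is treated as a word and discarded.
    """
    tokens = tokenize_parse(parse)
    kept = []
    for i, tok in enumerate(tokens):
        if tok in ("(", ")"):
            kept.append(tok)
        elif i > 0 and tokens[i - 1] == "(":
            kept.append(tok)
    return kept


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Under this simplification, two captions with different words but identical structure, such as "(S (NP (DT a) (NN man)) (VP (VBZ sings)))" and "(S (NP (DT the) (NN dog)) (VP (VBZ runs)))", have distance 0 between their `syntax_tokens` sequences. A lower distance means the generated caption follows the exemplar's syntax more closely.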

Citation

@inproceedings{yuan2020Control,
  title={Controllable Video Captioning with an Exemplar Sentence},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Zhu, Wenwu},
  booktitle={the 28th ACM International Conference on Multimedia (MM ’20)},
  year={2020}
}
