Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021

Last update: Oct 11, 2022

Related tags

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

Python 3.6
Java 15.0.2
PyTorch 1.2
numpy, tqdm, h5py, scipy, six

Training & Inference

Data preparation

Download the pre-extracted video features of ActivityNet Captions or Charades Captions datasets from BaiduNetdisk (code: he21).
Decompress the downloaded files to the corresponding dataset folder in the ordered_feature/ directory.

Start training

Train our model without reinforcement learning, * can be activitynet or charades.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train

Fine-tune the pretrained model using self-critical with both accuracy and diversity rewards.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th

Train our model with key frames selection.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

It will achieve a slightly worse result with only a half of the video features used at inference phase for faster decoding. You need to download the pretrained.th model at first for the key-frame selection.

Evaluation

The trained checkpoints have been saved at the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in the public_split) and evaluating scores will be saved at results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th

We also provide the pretrained models for the ActivityNet dataset here and Charades dataset here, which are re-run and achieve similar results with the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021

Related tags

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

Requirements

Training & Inference

Data preparation

Start training

Evaluation

Reference

Owner

Yuqing Song

Trading Gym is an open source project for the development of reinforcement learning algorithms in the context of trading.

ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs

Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

Vehicle Detection Using Deep Learning and YOLO Algorithm

Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

Statistical and Algorithmic Investing Strategies for Everyone

Implementation supporting the ICCV 2017 paper "GANs for Biological Image Synthesis"

Unofficial implementation of the Involution operation from CVPR 2021

A simple, unofficial implementation of MAE using pytorch-lightning

基于PaddleClas实现垃圾分类，并转换为inference格式用PaddleHub服务端部署

This library is a location of the LegacyLogger for PyTorch Lightning.

A C implementation for creating 2D voronoi diagrams

Code for "Primitive Representation Learning for Scene Text Recognition" (CVPR 2021)

Code for the paper BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

custom pytorch implementation of MoCo v3

A general framework for deep learning experiments under PyTorch based on pytorch-lightning

Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)