PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency"

Last update: Dec 08, 2022

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

Link to arXiv paper: https://arxiv.org/abs/2105.10026

Requirements:

This code has been tested with torch==1.7.1 and torchvision==0.8.2.
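
To confirm that an environment matches the tested versions before training, a small check along these lines may help. It is only a sketch: the helper names are hypothetical and not part of the repository, and it assumes that a local build suffix such as "+cu110" should be ignored when comparing versions.

```python
from importlib import metadata

# Versions the README reports as tested.
TESTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def base_version(version):
    """Drop a local build suffix such as '+cu110' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, tested=TESTED):
    """Return {package: (installed, tested)} for every package that differs.

    `installed` maps package names to version strings, or to None when the
    package is missing.
    """
    mismatches = {}
    for name, wanted in tested.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names):
    """Look up installed versions without importing the packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(installed_versions(TESTED))
    for name, (found, wanted) in report.items():
        print(f"{name}: installed {found}, tested with {wanted}")
```

A mismatch does not necessarily mean the code will fail; it only flags a setup that differs from the one the authors tested.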

Prepare Repository:

Download the PororoSV dataset and associated files from here and save them in ./data. Download the GloVe embeddings (glove.840B.300d) from here. By default, the embeddings are expected in ./data/ (see ./dcsgan/miscc/config.py).
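
For readers unfamiliar with the GloVe text format, the sketch below shows one way such a file could be read into memory. It is an illustration, not the repository's loader: the function name and the optional `vocab` filter are hypothetical, and it assumes the usual layout of one token followed by space-separated floats per line.

```python
def load_glove(path, dim=300, vocab=None):
    """Read GloVe text vectors into a {token: [float, ...]} dictionary.

    Each line holds a token followed by `dim` space-separated floats. The
    token is taken as everything before the last `dim` fields, so tokens
    that themselves contain spaces are still parsed correctly.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip malformed or truncated lines
            token = parts[0]
            if vocab is not None and token not in vocab:
                continue  # keep only tokens the caller asked for
            vectors[token] = [float(x) for x in parts[1:]]
    return vectors
```

Passing the dataset vocabulary as `vocab` keeps memory use small, since the full embedding file contains far more tokens than the story captions are likely to need.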

Training DuCo-StoryGAN:

To train DuCo-StoryGAN, first train the video captioning model on the PororoSV dataset:
python train_mart.py --data_dir
Default parameters were used to train the model used in our paper.

Next, train the generative model:
python train_gan.py --cfg ./cfg/pororo_s1_duco.yml --data_dir
If training DuCo-StoryGAN on a new dataset, make sure to train the video captioning model (see below) before training the GAN. The vocabulary file prepared for the video captioning model is reused to generate common input_ids for both models. Update the location of the video captioning checkpoint in the config file.
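
The idea of a shared vocabulary can be sketched as follows. This is an assumption-laden illustration rather than the repository's code: the special tokens, the whitespace tokenizer, and the function names are all hypothetical. The point is only that both models index captions through the same word-to-id mapping.

```python
from collections import Counter

PAD, UNK = "[PAD]", "[UNK]"  # hypothetical special tokens


def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent lowercase token to an integer id."""
    counts = Counter(tok for cap in captions for tok in cap.lower().split())
    word2idx = {PAD: 0, UNK: 1}
    for token, count in sorted(counts.items()):
        if count >= min_count:
            word2idx[token] = len(word2idx)
    return word2idx


def encode(caption, word2idx, max_len):
    """Turn a caption into a fixed-length list of ids (truncate, then pad)."""
    ids = [word2idx.get(tok, word2idx[UNK]) for tok in caption.lower().split()]
    ids = ids[:max_len]
    return ids + [word2idx[PAD]] * (max_len - len(ids))


# The same mapping would be handed to both the captioning model and the GAN.
vocab = build_vocab(["Pororo is smiling", "Crong is walking"])
input_ids = encode("Pororo is dancing", vocab, max_len=5)
```

Because the ids come from one mapping, a caption encoded for the GAN can be scored by the captioning model without any re-tokenization.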

Unless otherwise specified, the default output root directory for all model checkpoints is ./out/.

Training Evaluation Models:

Video Captioning Model
The video captioning model trained for DuCo-StoryGAN (see above) is also used for evaluation:
python train_mart.py --data_dir

Hierarchical Deep Multimodal Similarity (H-DAMSM)
python train_damsm.py --cfg ./cfg/pororo_damsm.yml --data_dir

Character Classifier
python train_classifier.py --data_dir --model_name inception --save_path ./models/inception --batch_size 8 --learning_rate 1e-05

Inference from DuCo-StoryGAN:

Use the following command to run inference with trained DuCo-StoryGAN weights:
python train_gan.py --cfg ./cfg/pororo_s1_duco_eval.yml --data_dir --checkpoint --infer_dir

Download our pretrained checkpoint from here.

Evaluation:

Download the pretrained models for evaluation:
Character Classifier, Video Captioning

Use the following command to evaluate the classification accuracy of generated images:
python eval_scripts/eval_classifier.py --image_path --data_dir --model_path --model_name inception --mode
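
Since a frame can contain several characters, character classification is a multi-label problem. The sketch below shows two common ways to score it, exact-match accuracy per frame and micro-averaged F1. Whether eval_classifier.py reports exactly these quantities is an assumption, and the function names are hypothetical.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label vector equals the target."""
    if not y_true:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 over all (frame, character) decisions pooled together."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two frames, three possible characters; 1 marks a character as present.
targets = [[1, 0, 0], [0, 1, 1]]
predictions = [[1, 0, 0], [0, 1, 0]]
accuracy = exact_match_accuracy(targets, predictions)
f1 = micro_f1(targets, predictions)
```

Exact match is the stricter of the two: the second frame counts as wrong even though one of its two characters was recognized.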

Use the following command to compute BLEU scores for captions predicted from the generated images:
python eval_scripts/translate.py --batch_size 50 --pred_dir --data_dir --checkpoint_file --eval_mode
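
For reference, BLEU compares predicted captions with ground-truth captions through clipped n-gram precision and a brevity penalty. The following is a minimal, unsmoothed corpus-level sketch with one reference per caption. It is not the implementation used by translate.py, which may rely on a different toolkit and tokenization, so its numbers should not be expected to match the script's output.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis, no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

Counts are pooled over the whole corpus before the precisions are computed, which is why this differs from averaging per-sentence scores.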

Acknowledgements

The code in this repository has been adapted from the MART, StoryGAN and MirrorGAN codebases.

Owner

Adyasha Maharana
