Official code repository for the EMNLP 2021 paper

Last update: Dec 19, 2022

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". See the arXiv paper here.

Requirements:

This code has been tested with torch==1.11.0.dev20211014 (nightly) and torchvision==0.12.0.dev20211014 (nightly).
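
To see whether a local environment matches the tested versions, something like the following sketch can help. It only reads installed package metadata and does not import torch; the helper name `check_versions` is made up for illustration and is not part of this repository.

```python
from importlib import metadata

# Versions the README reports as tested (nightly builds).
TESTED = {
    "torch": "1.11.0.dev20211014",
    "torchvision": "0.12.0.dev20211014",
}


def check_versions(tested=TESTED):
    """Return {package: (installed_or_None, tested, matches)}.

    Hypothetical helper, not part of the repository.
    """
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local version suffixes such as "+cu113" when comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: installed={installed} tested={wanted} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the versions above are simply the ones the authors report testing.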

Prepare Repository:

Download the PororoSV dataset and associated files from here and save them as ./data. Download GloVe embeddings (glove.840B.300D) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).
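
As a sanity check on the downloaded embeddings, the sketch below reads a GloVe text file into a dictionary. It assumes the usual GloVe text layout (one token per line followed by space-separated floats). The function name `load_glove` and the file name in the usage note are illustrative assumptions, and the repository's own loading code may differ.

```python
def load_glove(path, vocab=None, dim=300):
    """Read a GloVe text file into {token: [float, ...]}.

    Hypothetical helper, not part of the repository.
    vocab: optional set of tokens to keep (keeps memory use down).
    dim:   embedding width; 300 for the 840B release named above.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) <= dim:
                continue  # skip empty or malformed lines
            # Take the vector from the right: some tokens in large GloVe
            # releases are known to contain spaces themselves.
            token = " ".join(parts[:-dim])
            if vocab is not None and token not in vocab:
                continue
            vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors
```

For example, `load_glove("./data/glove.840B.300d.txt", vocab={"pororo", "snow"})` would keep only those two tokens if they are present. The exact file name under ./data/ is an assumption; check ./dcsgan/miscc/config.py for the path the code expects.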

Extract Constituency Parses:

To install the Berkeley Neural Parser with spaCy:

pip install benepar

To extract parses for PororoSV:

python parse.py --dataset pororo --data_dir <path-to-data-directory>
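
Constituency parsers such as benepar can emit parses as Penn Treebank-style bracketed strings. The output format of parse.py is not documented here, so the sketch below only illustrates how such a bracketed string can be read into a nested structure. The names `read_tree` and `leaves` are hypothetical and not part of this repository.

```python
def read_tree(text):
    """Parse a bracketed constituency string into nested (label, children) tuples.

    Leaves are plain strings. Hypothetical helper for illustration only.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())  # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return tree


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words
```

With the input `"(S (NP (NNP Pororo)) (VP (VBZ smiles)))"`, `read_tree` yields a tree rooted at `S` and `leaves` recovers the two words in order.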

Extract Dense Captions:

We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in their repository. To extract dense captions for PororoSV:
python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1

Training VLC-StoryGAN:

To train VLC-StoryGAN for PororoSV:
python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo

Unless specified otherwise, the default output root directory for all model checkpoints is ./out/.

Evaluation Models:

Please see here for evaluation models for character classification-based scores, BLEU2/3 and R-Precision.

To evaluate Fréchet Inception Distance (FID):
python eval_vfid --img_ref_dir <path-to-image-directory-original-images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>
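
For reference, the standard definition of FID fits a Gaussian to Inception features of the reference images (mean $\mu_r$, covariance $\Sigma_r$) and of the generated images (mean $\mu_g$, covariance $\Sigma_g$), then computes the Fréchet distance between the two. The symbols are introduced here for illustration. Which features the script extracts and what `--mode` selects are not described in this README, so treat the formula as the textbook definition rather than a description of the script.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the reference one; identical distributions give a value of 0.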

More details coming soon.

Citation:

@inproceedings{maharana2021integrating,
  title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
  author={Maharana, Adyasha and Bansal, Mohit},
  booktitle={EMNLP},
  year={2021}
}

Owner

Adyasha Maharana
