PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

License: MIT

This is the PyTorch implementation of our paper:

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020

[arXiv] [GitHub] [Project]

10-min YouTube Video

How to start

Clone the repo recursively:

git clone --recursive [email protected]:chihyaoma/cyclical-visual-captioning.git

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Installation

The proposed cyclical method can be applied directly to image and video captioning tasks.

Currently, installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in anet-video-captioning.

Acknowledgments

Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic helps on the figures.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ma2020learning,
    title={Learning to Generate Grounded Image Captions without Localization Supervision},
    author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2020},
    url={https://arxiv.org/abs/1906.00283},
}
Owner
Chih-Yao Ma
Research Scientist at Facebook #ComputerVision #DeepLearning
Chih-Yao Ma
Face recognition with trained classifiers for detecting objects using OpenCV

Face_Detector Face recognition with trained classifiers for detecting objects using OpenCV Libraries required to be installed using pip Command: cv2 n

Chumui Tripura 0 Oct 31, 2021
Rest API Written In Python To Classify NSFW Images.

Rest API Written In Python To Classify NSFW Images.

Wahyusaputra 2 Dec 23, 2021
A Python implementation of active inference for Markov Decision Processes

A Python package for simulating Active Inference agents in Markov Decision Process environments. Please see our companion preprint on arxiv for an ove

235 Dec 21, 2022
Videocaptioning.pytorch - A simple implementation of video captioning

pytorch implementation of video captioning recommend installing pytorch and pyth

Yiyu Wang 2 Jan 01, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
joint detection and semantic segmentation, based on ultralytics/yolov5,

Multi YOLO V5——Detection and Semantic Segmentation Overeview This is my undergraduate graduation project which based on ultralytics YOLO V5 tag v5.0.

477 Jan 06, 2023
Implementation of our recent paper, WOOD: Wasserstein-based Out-of-Distribution Detection.

WOOD Implementation of our recent paper, WOOD: Wasserstein-based Out-of-Distribution Detection. Abstract The training and test data for deep-neural-ne

8 Dec 24, 2022
Rule based classification A hotel s customers dataset

Rule-based-classification-A-hotel-s-customers-dataset- Aim: Categorize new customers by segment and predict how much revenue they can generate This re

Şebnem 4 Jan 02, 2022
Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

The official code for the paper "Inverse Problems Leveraging Pre-trained Contrastive Representations" (to appear in NeurIPS 2021).

Sriram Ravula 26 Dec 10, 2022
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera

CAiRE 42 Jan 07, 2023
Hi Guys, here I am providing examples, which will help you in Lerarning Python

LearningPython Hi guys, here I am trying to include as many practice examples of Python Language, as i Myself learn, and hope these will help you in t

4 Feb 03, 2022
Self-attentive task GAN for space domain awareness data augmentation.

SATGAN TODO: update the article URL once published. Article about this implemention The self-attentive task generative adversarial network (SATGAN) le

Nathan 2 Mar 24, 2022
Tensorflow port of a full NetVLAD network

netvlad_tf The main intention of this repo is deployment of a full NetVLAD network, which was originally implemented in Matlab, in Python. We provide

Robotics and Perception Group 225 Nov 08, 2022
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ User support: lambeq-su

Cambridge Quantum 315 Jan 01, 2023
Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data

VIMuRe Latent Network Models to Account for Noisy, Multiply-Reported Social Network Data. If you use this code please cite this article (preprint). De

6 Dec 15, 2022
Behavioral "black-box" testing for recommender systems

RecList RecList Free software: MIT license Documentation: https://reclist.readthedocs.io. Overview RecList is an open source library providing behavio

Jacopo Tagliabue 375 Dec 30, 2022
[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Contents Cycle-In-Cycle GANs Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Acknowledgments Relat

Hao Tang 67 Dec 14, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
ADOP: Approximate Differentiable One-Pixel Point Rendering

ADOP: Approximate Differentiable One-Pixel Point Rendering Abstract: We present a novel point-based, differentiable neural rendering pipeline for scen

Darius Rückert 1.9k Jan 06, 2023
Automate issue discovery for your projects against Lightning nightly and releases.

Automated Testing for Lightning EcoSystem Projects Automate issue discovery for your projects against Lightning nightly and releases. You get CPUs, Mu

Pytorch Lightning 41 Dec 24, 2022