WeakVRD-Captioning - Implementation of the paper "Improving Image Captioning with Better Use of Caption"

Last update: Oct 28, 2022


Overview

Paper: "Improving Image Captioning with Better Use of Captions" (ACL 2020). BibTeX:

@inproceedings{shi2020improving,
  title={Improving Image Captioning with Better Use of Caption},
  author={Shi, Zhan and Zhou, Xu and Qiu, Xipeng and Zhu, Xiaodan},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={7454--7464},
  year={2020}
}

Requirements

Python 2.7.15

torch 1.0.1

The exact conda environment is specified in ezs.yml and can be recreated with conda env create -f ezs.yml.

For evaluation, you also need to download the coco-captions and cider folders into this directory.

Data Files and Models

Files: Copy the files in the data directory on Google Drive or [Baidu Netdisk](https://pan.baidu.com/s/1ddtfdlwD65cm4JmVu6GF3w) (extraction code: 39pa) into the data directory here. See data/README for more details.

Models: Copy the log directory on Google Drive or [Baidu Netdisk](https://pan.baidu.com/s/1ddtfdlwD65cm4JmVu6GF3w) (extraction code: 39pa) into this directory.

Scripts

MLE training:

python train.py --gpus 0 --id experiment-mle
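As background for the MLE stage: maximum-likelihood training of a captioner usually minimizes the average negative log-likelihood of the ground-truth caption tokens under teacher forcing. The plain-Python sketch below illustrates only that objective. `caption_nll` is a hypothetical helper, not part of this repository, and the actual training code operates on batched PyTorch tensors rather than Python lists.

```python
import math


def caption_nll(token_probs, target_ids):
    """Average negative log-likelihood of the ground-truth caption tokens.

    token_probs: one probability distribution (list summing to 1) per decoding
                 step, as a decoder would produce when fed the ground-truth prefix.
    target_ids:  index of the ground-truth token at each step.
    """
    if len(token_probs) != len(target_ids):
        raise ValueError("need exactly one distribution per target token")
    total = 0.0
    for dist, target in zip(token_probs, target_ids):
        # MLE pushes the probability of the reference token toward 1.
        total -= math.log(dist[target])
    return total / len(target_ids)
```

A decoder that assigns probability 1 to every reference token gets a loss of 0; any probability mass on other tokens increases it.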

RL training (fine-tunes the best MLE checkpoint with self-critical training):

python train.py --gpus 0 --id experiment-rl --learning_rate 2e-5 --resume_from experiment-mle --resume_from_best True --self_critical_after 0 --max_epochs 60 --learning_rate_decay_start -1 --scheduled_sampling_start -1 --reduce_on_plateau
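The RL stage relies on the `--self_critical_after` option inherited from ruotianluo/self-critical.pytorch, which implements self-critical sequence training (SCST). In the upstream codebase, 0 starts self-critical training from the first epoch and -1 disables it. In SCST, the reward of a sampled caption is baselined by the reward of the greedily decoded caption for the same image. The sketch below shows that reward/loss arithmetic in plain Python. It assumes the per-caption scores have already been computed by a caption metric such as CIDEr, and the function names are hypothetical, not the repository's API.

```python
def self_critical_advantages(sample_scores, greedy_scores):
    """Per-caption advantage: metric score of the sampled caption minus the
    score of the greedy caption for the same image (the self-critical baseline)."""
    if len(sample_scores) != len(greedy_scores):
        raise ValueError("need one greedy baseline per sampled caption")
    return [s - g for s, g in zip(sample_scores, greedy_scores)]


def scst_loss(sample_logprobs, sample_scores, greedy_scores):
    """REINFORCE-style loss averaged over captions.

    sample_logprobs: for each sampled caption, the log-probabilities of its tokens.
    Minimizing -(advantage * sum(log-probs)) raises the likelihood of samples that
    beat the greedy baseline and lowers it for samples that score worse.
    """
    advantages = self_critical_advantages(sample_scores, greedy_scores)
    losses = [-adv * sum(logprobs) for adv, logprobs in zip(advantages, sample_logprobs)]
    return sum(losses) / len(losses)
```

Because the baseline is the model's own greedy output, no separate value network is needed, which is the main appeal of the self-critical variant.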

Evaluate your own model or a downloaded pretrained model:

python eval.py --gpus 0 --resume_from experiment-mle

and

python eval.py --gpus 0 --resume_from experiment-rl
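The MLE, RL, and evaluation steps must run in order, because the RL run and both evaluations load checkpoints by experiment id. The small driver below chains exactly the commands listed in this README. `build_commands` and `run_pipeline` are hypothetical helpers, not part of the repository. The driver uses `subprocess.check_call` and avoids Python-3-only syntax so it can also run inside the project's Python 2.7 environment.

```python
import subprocess

# Experiment ids and flags copied from the README commands above.
MLE_ID = "experiment-mle"
RL_ID = "experiment-rl"


def build_commands(gpu="0", python="python"):
    """Return the MLE -> RL -> evaluation commands as argument lists."""
    mle = [python, "train.py", "--gpus", gpu, "--id", MLE_ID]
    rl = [python, "train.py", "--gpus", gpu, "--id", RL_ID,
          "--learning_rate", "2e-5",
          "--resume_from", MLE_ID, "--resume_from_best", "True",
          "--self_critical_after", "0", "--max_epochs", "60",
          "--learning_rate_decay_start", "-1",
          "--scheduled_sampling_start", "-1",
          "--reduce_on_plateau"]
    # Evaluate both checkpoints, matching the two eval.py commands.
    evals = [[python, "eval.py", "--gpus", gpu, "--resume_from", exp_id]
             for exp_id in (MLE_ID, RL_ID)]
    return [mle, rl] + evals


def run_pipeline(gpu="0", dry_run=True):
    """Print each command; when dry_run is False, also run it and stop at the first failure."""
    for cmd in build_commands(gpu=gpu):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # raises CalledProcessError on a non-zero exit code
```

The default dry run only prints the commands. Calling `run_pipeline(dry_run=False)` from the repository root would launch them one after another.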

Acknowledgement

This code is based on Ruotian Luo's excellent image captioning repository ruotianluo/self-critical.pytorch. We use the detected bounding boxes, categories, and features provided by Bottom-Up Attention (peteanderson80/bottom-up-attention) and yangxuntu/SGAE. Many thanks for their work!


