LaBERT - A length-controllable and non-autoregressive image captioning model.

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implemetation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap

Data & Pre-trained Models

  • Prepare MSCOCO data follow link.
  • Download pretrained Bert and Faster-RCNN from Baidu Cloud Disk [code: 0j9f] or Google Drive.
    • It's an unified checkpoint file, containing a pretrained Bert-base and the fc6 layer of the Faster-RCNN.
  • Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Continue train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}
Owner
bearcatt
bearcatt
Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods”

Uncertainty Estimation Methods Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods” Reference If you use this code,

EPFL Machine Learning and Optimization Laboratory 4 Apr 05, 2022
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Real-time VIBE Inference VIBE frame-by-frame. Overview This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE]. Usage: i

23 Jul 02, 2022
Self-Supervised Learning of Event-based Optical Flow with Spiking Neural Networks

Self-Supervised Learning of Event-based Optical Flow with Spiking Neural Networks Work accepted at NeurIPS'21 [paper, video]. If you use this code in

TU Delft 43 Dec 07, 2022
Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

Manifold-SCA Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning The repo is org

Yuanyuan Yuan 172 Dec 29, 2022
PyTorch reimplementation of REALM and ORQA

PyTorch reimplementation of REALM and ORQA

Li-Huai (Allan) Lin 17 Aug 20, 2022
Yolo algorithm for detection + centroid tracker to track vehicles

Vehicle Tracking using Centroid tracker Algorithm used : Yolo algorithm for detection + centroid tracker to track vehicles Backend : opencv and python

6 Dec 21, 2022
Share a benchmark that can easily apply reinforcement learning in Job-shop-scheduling

Gymjsp Gymjsp is an open source Python library, which uses the OpenAI Gym interface for easily instantiating and interacting with RL environments, and

134 Dec 08, 2022
Machine learning Bot detection technique, based on United States election dataset

Machine learning Bot detection technique, based on United States election dataset (2020). Current github repo provides implementation described in pap

Alexander Shevtsov 4 Nov 20, 2022
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE

ASAPP Research 39 Sep 21, 2022
Dark Finix: All in one hacking framework with almost 100 tools

Dark Finix - Hacking Framework. Dark Finix is a all in one hacking framework wit

Md. Nur habib 2 Feb 18, 2022
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily Abstract Graph Neural Networks (GNNs) are widely used on a

10 Dec 20, 2022
A Pytorch implement of paper "Anomaly detection in dynamic graphs via transformer" (TADDY).

TADDY: Anomaly detection in dynamic graphs via transformer This repo covers an reference implementation for the paper "Anomaly detection in dynamic gr

Yue Tan 21 Nov 24, 2022
Vit-ImageClassification - Pytorch ViT for Image classification on the CIFAR10 dataset

Vit-ImageClassification Introduction This project uses ViT to perform image clas

Kaicheng Yang 4 Jun 01, 2022
Little tool in python to watch anime from the terminal (the better way to watch anime)

ani-cli Script working again :), thanks to the fork by Dink4n for the alternative approach to by pass the captcha on gogoanime A cli to browse and wat

Harshith 4.5k Dec 31, 2022
Capsule endoscopy detection DACON challenge

capsule_endoscopy_detection (DACON Challenge) Overview Yolov5, Yolor, mmdetection기반의 모델을 사용 (총 11개 모델 앙상블) 모든 모델은 학습 시 Pretrained Weight을 yolov5, yolo

MAILAB 11 Nov 25, 2022
Rule based classification A hotel s customers dataset

Rule-based-classification-A-hotel-s-customers-dataset- Aim: Categorize new customers by segment and predict how much revenue they can generate This re

Şebnem 4 Jan 02, 2022
This is the face keypoint train code of project face-detection-project

face-key-point-pytorch 1. Data structure The structure of landmarks_jpg is like below: |--landmarks_jpg |----AFW |------AFW_134212_1_0.jpg |------AFW_

I‘m X 3 Nov 27, 2022
[CIKM 2021] Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning

Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning. This repo contains the PyTorch code and implementation for the paper E

Akuchi 18 Dec 22, 2022