Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Last update: Dec 14, 2022

Overview

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, and Dhruv Batra

Paper: https://arxiv.org/abs/2004.14973

Model Zoo

A variety of pre-trained VLN-BERT weights can be accessed through the following links:

	Pre-training Stages	Job ID	Val Unseen SR	URL
0	no pre-training	174631	30.52%	TBD
1	1	175134	45.17%	TBD
3	1 and 2	221943	49.64%	download
2	1 and 3	220929	50.02%	download
4	1, 2, and 3 (Full Model)	220825	59.26%	download
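
For readers unfamiliar with the metric, the "Val Unseen SR" column reports success rate on the validation-unseen split. The sketch below illustrates how a success rate of this kind is typically computed. It assumes a hypothetical list of per-episode final distances to the goal (in meters) and the 3 m success radius commonly used in vision-and-language navigation benchmarks; neither the input format nor the threshold is taken from this repository, so scripts/calculate-metrics.py may differ in its details.

```python
from typing import Sequence

# Assumption: 3 m is the success radius commonly used in VLN benchmarks;
# it is not read from this repository's code.
SUCCESS_THRESHOLD_M = 3.0


def success_rate(final_distances_m: Sequence[float],
                 threshold_m: float = SUCCESS_THRESHOLD_M) -> float:
    """Fraction of episodes that end closer than threshold_m to the goal."""
    if not final_distances_m:
        raise ValueError("need at least one episode")
    successes = sum(1 for d in final_distances_m if d < threshold_m)
    return successes / len(final_distances_m)


if __name__ == "__main__":
    # Hypothetical distances for four episodes: two succeed, two fail.
    distances = [0.5, 2.9, 3.1, 7.4]
    print(f"SR = {success_rate(distances):.2%}")  # prints "SR = 50.00%"
```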

Usage Instructions

Follow the instructions in INSTALL.md to set up this codebase. The instructions walk you through several steps, including preprocessing the Matterport3D panoramas by extracting regions with a pretrained object detector.

Training

To perform stage 3 of pre-training, first download the ViLBERT weights from here. Then run:

python \
-m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
train.py \
--from_pretrained <path/to/vilbert_pytorch_model_9.bin> \
--save_name [pre_train_run_id] \
--num_epochs 50 \
--warmup_proportion 0.08 \
--cooldown_factor 8 \
--masked_language \
--masked_vision \
--no_ranking

To fine-tune VLN-BERT for the path selection task, run:

python \
-m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
train.py \
--from_pretrained <path/to/pytorch_model_50.bin> \
--save_name [fine_tune_run_id]

Evaluation

To evaluate a pre-trained model, run:

python test.py \
--split [val_seen|val_unseen] \
--from_pretrained <path/to/run_[run_id]_pytorch_model.bin> \
--save_name [run_id]

followed by:

python scripts/calculate-metrics.py <path/to/results_[val_seen|val_unseen].json>
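
To evaluate both splits in one go, the two commands above can be wrapped in a small driver. The following is a minimal sketch, not part of this repository: the function names, the `results_dir` argument, and the example paths are hypothetical, and it assumes the results for each split are written to `results_<split>.json` inside a directory you supply. It only uses the flags shown above and defaults to a dry run that prints the commands.

```python
import shlex
import subprocess
from typing import List

SPLITS = ("val_seen", "val_unseen")


def build_eval_commands(checkpoint: str, run_id: str,
                        results_dir: str) -> List[List[str]]:
    """Build the test.py and calculate-metrics.py commands for each split."""
    commands = []
    for split in SPLITS:
        commands.append([
            "python", "test.py",
            "--split", split,
            "--from_pretrained", checkpoint,
            "--save_name", run_id,
        ])
        # Assumption: results are stored as <results_dir>/results_<split>.json.
        results_file = f"{results_dir}/results_{split}.json"
        commands.append(["python", "scripts/calculate-metrics.py", results_file])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own paths and run id.
    cmds = build_eval_commands(
        checkpoint="path/to/checkpoint.bin",
        run_id="my_run",
        results_dir="path/to/results",
    )
    run_all(cmds, dry_run=True)
```

Keeping `dry_run=True` lets you check the generated commands against the ones listed above before launching a real evaluation.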

Citation

If you find this code useful, please consider citing:

@inproceedings{majumdar2020improving,
  title={Improving Vision-and-Language Navigation with Image-Text Pairs from the Web},
  author={Arjun Majumdar and Ayush Shrivastava and Stefan Lee and Peter Anderson and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
