PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)

Last update: Dec 22, 2022

Related tags

Overview

D-VQA

We provide the PyTorch implementation for Debiased Visual Question Answering from Feature and Sample Perspectives (NeurIPS 2021).

Dependencies

Python 3.6
PyTorch 1.1.0
dependencies in requirements.txt
We train and evaluate all of the models based on one TITAN Xp GPU

Getting Started

Installation

Clone this repository:

 git clone https://github.com/Zhiquan-Wen/D-VQA.git
 cd D-VQA

Install PyTorch and other dependencies:
```
 pip install -r requirements.txt
```

Download and preprocess the data

cd data 
bash download.sh
python preprocess_features.py --input_tsv_folder xxx.tsv --output_h5 xxx.h5
python feature_preprocess.py --input_h5 xxx.h5 --output_path trainval 
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..

Training

Train our model

CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7

Train the model with 80% of the original training set

CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_q 0.7 --ratio 0.8

Evaluation

A json file of results from the test set can be produced with:

CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/

Compute detailed accuracy for each answer type:

python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/

Pretrained model

A well-trained model can be found here. The test results file produced by it can be found here and its performance is as follows:

Overall score: 61.91
Yes/No: 88.93 Num: 52.32 other: 50.39

Reference

If you found this code is useful, please cite the following paper:

@inproceedings{D-VQA,
  title     = {Debiased Visual Question Answering from Feature and Sample Perspectives},
  author    = {Zhiquan Wen, 
               Guanghui Xu, 
               Mingkui Tan, 
               Qingyao Wu, 
               Qi Wu},
  booktitle = {NeurIPS},
  year = {2021}
}

Acknowledgements

This repository contains code modified from SSL-VQA, thank you very much!

Besides, we thank Yaofo Chen for providing MIO library to accelerate the data loading.

Comments

Questions about the code
Thank you very much for providing the code, but I still have two questions that I did not understand well.

A module, BDM, is used to capture negative bias, but this module only includes a multi-layer perceptron. Then how to ensure the features captured by this multi-layer perceptron are negative bias?

On the left of Figure 2 of the paper, there are no backward gradient of the question-to-answer and the vision-to-answer branches. Where did it reflect in the code?
opened by darwann 4
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

opened by TrellixVulnTeam 0
LXMERT numbers

Hi, I wish to reproduce the LXMERT(LXMERT without D-VQA) numbers reported in the paper. It would be helpful if you could provide me with a way to do this using your code. I tried using the original LXMERT code, but I am not able to get the numbers reported in your paper on the VQA-CP2 dataset.

opened by Vaidehi99 0
Download trainval_36.zip error

Hi, thank you for your work on this.

I keep getting a download error when downloading the trainval_36.zip file. Is there another link I can use to download this?

Thanks in advance!

opened by chojw 0
关于box和image的对齐问题

您好，我将box的注释解开后，重新生成特征，然后将其绘制出来，但是明显感觉有偏差，不知道您是否可以提供一份绘图的代码。下面是我的代码 def plot_rect(image, boxes): img = Image.fromarray(np.uint8(image)) draw = ImageDraw.Draw(img) for k in range(2): box = boxes[k,:] print(box) drawrect(draw, box, outline='green', width=3) img = np.asarray(img) return img def drawrect(drawcontext, xy, outline=None, width=0): x1, y1, x2, y2 = xy points = (x1, y1), (x2, y1), (x2, y2), (x1, y2), (x1, y1) drawcontext.line(points, fill=outline, width=width)

opened by LemonQC 0

Releases(Results_LXMERT)

Results_LXMERT(Oct 26, 2021)

Source code(tar.gz)
Source code(zip)
test_best_epoch0.json(9.66 MB)
Models_LXMERT(Oct 26, 2021)

Source code(tar.gz)
Source code(zip)
BEST.pth(842.49 MB)
LXMERT_pretrained_model(Oct 25, 2021)

Source code(tar.gz)
Source code(zip)
Epoch20_LXRT.pth(870.06 MB)
Results(Oct 10, 2021)

Source code(tar.gz)
Source code(zip)
61.91_results.json(9.65 MB)
Models(Oct 10, 2021)

Source code(tar.gz)
Source code(zip)
61.91.pth(620.44 MB)

Owner

Zhiquan Wen

GitHub Repository

The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble

Wordle RL The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble I know there are more deterministic

3 Feb 22, 2022

NeuroGen: activation optimized image synthesis for discovery neuroscience

NeuroGen: activation optimized image synthesis for discovery neuroscience NeuroGen is a framework for synthesizing images that control brain activatio

3 Aug 17, 2022

moving object detection for satellite videos.

DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos Algorithm Introduction DSFNet: Dynamic and Static Fusion Net

39 Dec 16, 2022

Parameter-ensemble-differential-evolution - Shows how to do parameter ensembling using differential evolution.

Ensembling parameters with differential evolution This repository shows how to ensemble parameters of two trained neural networks using differential e

9 May 04, 2022

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

94 Nov 21, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition Usage First, install PyTorch 1.7.1+, torchvision 0.8.2

40 Dec 12, 2022

A High-Performance Distributed Library for Large-Scale Bundle Adjustment

MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment This repo contains an official implementation of MegBA. MegBA is a

336 Dec 27, 2022

MISSFormer: An Effective Medical Image Segmentation Transformer

MISSFormer Code for paper "MISSFormer: An Effective Medical Image Segmentation Transformer". Please read our preprint at the following link: paper_add

22 Dec 24, 2022

Implementation of several Bayesian multi-target tracking algorithms, including Poisson multi-Bernoulli mixture filters for sets of targets and sets of trajectories. The repository also includes the GOSPA metric and a metric for sets of trajectories to evaluate performance.

This repository contains the Matlab implementations for the following multi-target filtering/tracking algorithms: - Folder PMBM contains the implemen

62 Dec 12, 2022

This repository provides the official code for GeNER (an automated dataset Generation framework for NER).

GeNER This repository provides the official code for GeNER (an automated dataset Generation framework for NER). Overview of GeNER GeNER allows you to

50 Nov 30, 2022

InsCLR: Improving Instance Retrieval with Self-Supervision

InsCLR: Improving Instance Retrieval with Self-Supervision This is an official PyTorch implementation of the InsCLR paper. Download Dataset Dataset Im

25 Aug 30, 2022

Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

Swapping Autoencoder for Deep Image Manipulation Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang UC

449 Dec 27, 2022

This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams.

Mutli-agent task allocation This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams. To change

5 Oct 12, 2022

This repository contains code released by Google Research.

26.6k Dec 31, 2022

git《Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction》(ECCV 2020) GitHub:

Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction Code for the ECCV 2020 paper by Yiming Qian and Yasutaka Furukawa Getting

37 Dec 04, 2022

Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv MMCoref_cleaned Code for the MMCoref task of the SIMMC 2.0 dataset. Pre

2 Dec 05, 2022

Self-Learned Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence

In this paper, we address the problem of rain streaks removal in video by developing a self-learned rain streak removal method, which does not require any clean groundtruth images in the training pro

44 Dec 06, 2022

Libraries, tools and tasks created and used at DeepMind Robotics.

270 Nov 30, 2022

Companion repository to the paper accepted at the 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities

Transfer learning approach to bicycle sharing systems station location planning using OpenStreetMap Companion repository to the paper accepted at the

4 Oct 24, 2022

Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

Single Image Super-Resolution with EDSR, WDSR and SRGAN A Tensorflow 2.x based implementation of Enhanced Deep Residual Networks for Single Image Supe

1.3k Jan 06, 2023