This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Last update: Dec 29, 2022


Overview

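QB-Norm (Querybank Normalisation) is applied to precomputed similarity matrices for cross modal retrieval. It re-normalises the similarities between test queries and the gallery using a querybank of training queries, which reduces the influence of "hubs" (gallery items that are retrieved for many different queries). The normalisation is implemented in dynamic_inverted_softmax.py.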
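The script name points to a dynamic inverted softmax. The NumPy function below is a minimal sketch of that idea based on our reading of the QB-Norm paper, not the repo's code. Each gallery column of the exponentiated test similarities is divided by that column's summed exponentiated similarity over the querybank. This is applied only to test queries whose top-1 gallery item is also a top-k result for some querybank query (the "activation set"); all other queries keep their raw scores. The function name, the `beta` (inverse temperature) and `k` parameters, and their defaults are assumptions for illustration.

```python
import numpy as np


def dynamic_inverted_softmax(sims_bank_gallery, sims_test_gallery, beta=20.0, k=1):
    """Sketch of querybank-normalised re-scoring (hypothetical helper).

    sims_bank_gallery: (num_bank_queries, num_gallery) similarities between the
        querybank (e.g. training captions) and the gallery (e.g. test videos).
    sims_test_gallery: (num_test_queries, num_gallery) similarities between the
        test queries and the same gallery.
    beta, k: assumed inverse temperature and top-k size; values are illustrative.
    """
    bank = np.asarray(sims_bank_gallery, dtype=np.float64)
    test = np.asarray(sims_test_gallery, dtype=np.float64)

    # Activation set: gallery items in the top-k of at least one querybank query.
    activation_set = np.unique(np.argsort(-bank, axis=1)[:, :k])

    # Inverted softmax: divide each gallery column by its summed exponentiated
    # querybank similarity. Subtracting the per-column max keeps exp() stable
    # without changing the ratio.
    col_max = bank.max(axis=0)
    normaliser = np.exp(beta * (bank - col_max)).sum(axis=0)
    inverted = np.exp(beta * (test - col_max)) / normaliser

    # Dynamic part: only re-score queries whose raw top-1 item is in the set.
    use_norm = np.isin(np.argmax(test, axis=1), activation_set)
    out = test.copy()
    out[use_norm] = inverted[use_norm]
    return out
```

Retrieval ranks the gallery separately for each query, so mixing re-scored and raw rows in one matrix does not change the ordering within any single query.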

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data, you need to:

1. Extract the similarity matrix between the captions from the training split and the videos from the testing split (path/to/sims/train/test).
2. Extract the testing split similarity matrix, i.e. the similarities between the testing captions and the testing videos (path/to/sims/test).
3. Run QB-Norm:

python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test
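Steps 1 and 2 can be sketched as follows, assuming your model produces one embedding per caption and per video and that cosine similarity is the retrieval score. The exact pickle layout that dynamic_inverted_softmax.py expects is not stated here; the sketch assumes a 2-D float array with captions as rows and videos as columns, so check the script before relying on it. `cosine_sims` and `save_sims` are hypothetical helpers.

```python
import pickle

import numpy as np


def cosine_sims(text_emb, video_emb):
    """Cosine similarity matrix of shape (num_captions, num_videos)."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    return t @ v.T


def save_sims(path, sims):
    """Pickle a similarity matrix (assumed layout: captions x videos, float32)."""
    with open(path, "wb") as f:
        pickle.dump(np.asarray(sims, dtype=np.float32), f)
```

For example, with your own (hypothetical) embedding arrays: `save_sims("path/to/sims/train/test", cosine_sims(train_caption_emb, test_video_emb))` for step 1 and `save_sims("path/to/sims/test", cosine_sims(test_caption_emb, test_video_emb))` for step 2. Both matrices share the same columns (the testing videos).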

Data

The similarity matrices for each method were extracted using the official repositories of CE+, TT-CE+, CLIP2Video, CLIP4Clip, CLIP, MMT and Audio-Retrieval. Since CLIP4Clip does not provide pre-trained weights, we used its official repo to train new models from scratch.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.
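The layout of the downloadable .pkl files is not documented here. Assuming each file unpickles to a 2-D array (queries as rows, gallery items as columns), a quick sanity check before running QB-Norm is that the querybank matrix and the test matrix score the same gallery, i.e. have the same number of columns. `load_sims` and `check_pair` are hypothetical helpers, not part of the repo.

```python
import pickle

import numpy as np


def load_sims(path):
    """Unpickle a similarity file and return it as a NumPy array."""
    with open(path, "rb") as f:
        return np.asarray(pickle.load(f))


def check_pair(sims_train_test, sims_test):
    """Return (num_bank_queries, num_test_queries, num_gallery) or raise ValueError."""
    for name, sims in (("train/test", sims_train_test), ("test", sims_test)):
        if sims.ndim != 2:
            raise ValueError(f"{name} sims should be 2-D, got shape {sims.shape}")
    # Both matrices must index the same gallery (the testing videos).
    if sims_train_test.shape[1] != sims_test.shape[1]:
        raise ValueError(
            f"gallery size mismatch: {sims_train_test.shape[1]} vs {sims_test.shape[1]}"
        )
    return sims_train_test.shape[0], sims_test.shape[0], sims_test.shape[1]
```

Usage: `check_pair(load_sims("path/to/sims/train/test"), load_sims("path/to/sims/test"))`.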

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
CE+	Full	t2v	14.4±0.1	37.4±0.1	50.2±0.1	10.0±0.0	30.0±0.1
CE+ (+QB-Norm)	Full	t2v	16.4±0.0	40.3±0.1	52.9±0.1	9.0±0.0	32.7±0.1
TT-CE+	Full	t2v	14.9±0.1	38.3±0.1	51.5±0.1	10.0±0.0	30.9±0.1
TT-CE+ (+QB-Norm)	Full	t2v	17.3±0.0	42.1±0.2	54.9±0.1	8.0±0.0	34.2±0.1
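The table columns are recall at K (R@1, R@5, R@10: percentage of queries whose correct item is ranked in the top K), median rank (MdR, lower is better) and Geom. Geom is consistent with the geometric mean of R@1, R@5 and R@10: for CE+, (14.4 × 37.4 × 50.2)^(1/3) ≈ 30.0. The sketch below computes these metrics from a similarity matrix. It is not the repo's evaluation code; it assumes exactly one correct gallery item per query and breaks ties optimistically (only strictly higher scores push the correct item down). `retrieval_metrics` is a hypothetical name.

```python
import numpy as np


def retrieval_metrics(sims, gt):
    """Metrics from a (num_queries, num_gallery) similarity matrix.

    gt[i] is the index of the single correct gallery item for query i.
    R@K and Geom are in percent; MdR is a rank (1 = best).
    """
    sims = np.asarray(sims, dtype=np.float64)
    gt = np.asarray(gt)
    # Rank of the correct item = 1 + number of items scored strictly higher.
    correct = sims[np.arange(len(gt)), gt]
    ranks = 1 + (sims > correct[:, None]).sum(axis=1)
    metrics = {f"R@{k}": 100.0 * float(np.mean(ranks <= k)) for k in (1, 5, 10)}
    metrics["MdR"] = float(np.median(ranks))
    # Geom: geometric mean of R@1, R@5 and R@10.
    metrics["Geom"] = float(np.cbrt(metrics["R@1"] * metrics["R@5"] * metrics["R@10"]))
    return metrics
```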

QB-Norm Results on MSVD Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
TT-CE+	Full	t2v	25.4±0.3	56.9±0.4	71.3±0.2	4.0±0.0	46.9±0.3
TT-CE+ (+QB-Norm)	Full	t2v	26.6±1.0	58.6±1.3	71.8±1.1	4.0±0.0	48.2±1.2
CLIP2Video	Full	t2v	47.0	76.8	85.9	2.0	67.7
CLIP2Video (+QB-Norm)	Full	t2v	48.0	77.9	86.2	2.0	68.5

QB-Norm Results on DiDeMo Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
TT-CE+	Full	t2v	21.6±0.7	48.6±0.4	62.9±0.6	6.0±0.0	40.4±0.4
TT-CE+ (+QB-Norm)	Full	t2v	24.2±0.7	50.8±0.7	64.4±0.1	5.3±0.5	43.0±0.2
CLIP4Clip	Full	t2v	43.0	70.5	80.0	2.0	62.4
CLIP4Clip (+QB-Norm)	Full	t2v	43.5	71.4	80.9	2.0	63.1

QB-Norm Results on LSMDC Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
TT-CE+	Full	t2v	17.2±0.4	36.5±0.6	46.3±0.3	13.7±0.5	30.7±0.3
TT-CE+ (+QB-Norm)	Full	t2v	17.8±0.4	37.7±0.5	47.6±0.6	12.7±0.5	31.7±0.3
CLIP4Clip	Full	t2v	21.3	40.0	49.5	11.0	34.8
CLIP4Clip (+QB-Norm)	Full	t2v	22.4	40.1	49.5	11.0	35.4

QB-Norm Results on VaTeX Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
TT-CE+	Full	t2v	53.2±0.2	87.4±0.1	93.3±0.0	1.0±0.0	75.7±0.1
TT-CE+ (+QB-Norm)	Full	t2v	54.8±0.1	88.2±0.1	93.8±0.1	1.0±0.0	76.8±0.0
CLIP2Video	Full	t2v	57.4	87.9	93.6	1.0	77.9
CLIP2Video (+QB-Norm)	Full	t2v	58.8	88.3	93.8	1.0	78.7

QB-Norm Results on QuerYD Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
CE+	Full	t2v	13.2±2.0	37.1±2.9	50.5±1.9	10.3±1.2	29.1±2.2
CE+ (+QB-Norm)	Full	t2v	14.1±1.8	38.6±1.3	51.1±1.6	10.0±0.8	30.2±1.7
TT-CE+	Full	t2v	14.4±0.5	37.7±1.7	50.9±1.6	9.8±1.0	30.3±0.9
TT-CE+ (+QB-Norm)	Full	t2v	15.1±1.6	38.3±2.4	51.2±2.8	10.3±1.7	30.9±2.3

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
CLIP	5k	t2i	30.3	56.1	67.1	4.0	48.5
CLIP (+QB-Norm)	5k	t2i	34.8	59.9	70.4	3.0	52.8
MMT-Oscar	5k	t2i	52.2	80.2	88.0	1.0	71.7
MMT-Oscar (+QB-Norm)	5k	t2i	53.9	80.5	88.1	1.0	72.6

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

Model	Split	Task	R@1	R@5	R@10	MdR	Geom
AR-CE	Full	t2a	23.1±0.6	55.1±0.7	70.7±0.6	4.7±0.5	44.8±0.7
AR-CE (+QB-Norm)	Full	t2a	23.9±0.2	57.1±0.3	71.6±0.4	4.0±0.0	46.0±0.3

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
