This repo provides code for QB-Norm (Cross Modal Retrieval with Querybank Normalisation)

Overview

This repo provides code for QB-Norm, the method introduced in "Cross Modal Retrieval with Querybank Normalisation".

Usage example

python dynamic_inverted_softmax.py --sims_train_test_path msrvtt/tt-ce-train-captions-test-videos-seed0.pkl --sims_test_path msrvtt/tt-ce-test-captions-test-videos-seed0.pkl --test_query_masks_path msrvtt/tt-ce-test-query_masks.pkl

To test QB-Norm on your own data you need to:

  1. Extract the similarity matrix between the captions from the training split and the videos from the testing split (path/to/sims/train/test)
  2. Extract the testing split similarity matrix, i.e. the similarities between the testing captions and the testing videos (path/to/sims/test)
  3. Run QB-Norm:
python dynamic_inverted_softmax.py --sims_train_test_path path/to/sims/train/test --sims_test_path path/to/sims/test
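
To make the two inputs above concrete, here is a minimal sketch on toy data. It builds a train-captions-to-test-videos matrix (the querybank) and a test-captions-to-test-videos matrix, then applies a simplified dynamic inverted softmax. This is an illustration of the idea, not the code in dynamic_inverted_softmax.py: the function names, the use of cosine similarity, the random embeddings, and the values beta=20.0 and k=1 are all assumptions made for the example.

```python
import numpy as np


def cosine_sims(queries, gallery):
    """Cosine similarity matrix of shape (num_queries, num_gallery)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


def dynamic_inverted_softmax(sims_train_test, sims_test, beta=20.0, k=1):
    """Simplified querybank normalisation (hypothetical helper).

    sims_train_test: (num_train_queries, num_gallery) querybank similarities
    sims_test:       (num_test_queries, num_gallery) test similarities
    beta, k:         illustrative values, not tuned settings
    """
    # Activation set: gallery items that appear in the top-k of any querybank query.
    topk = np.argsort(-sims_train_test, axis=1)[:, :k]
    activation_set = np.unique(topk)

    # Per-gallery-item normaliser: sum of exp(beta * sim) over the querybank.
    normaliser = np.exp(beta * sims_train_test).sum(axis=0)

    # Only test queries whose top-1 item is in the activation set are normalised;
    # the remaining rows keep their raw similarities.
    top1 = np.argmax(sims_test, axis=1)
    active = np.isin(top1, activation_set)

    out = sims_test.astype(float).copy()
    out[active] = np.exp(beta * sims_test[active]) / normaliser
    return out


# Toy embeddings standing in for real model outputs (assumption).
rng = np.random.default_rng(0)
train_captions = rng.normal(size=(50, 16))
test_captions = rng.normal(size=(8, 16))
test_videos = rng.normal(size=(8, 16))

sims_train_test = cosine_sims(train_captions, test_videos)  # step 1
sims_test = cosine_sims(test_captions, test_videos)         # step 2
normalised = dynamic_inverted_softmax(sims_train_test, sims_test)  # step 3
print(normalised.shape)
```

Rankings are read row by row, so it does not matter that normalised and untouched rows end up on different scales. The script itself reads .pkl files; the layout it expects inside those files is not described here, so check the script before saving your own matrices.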

Data

The similarity matrices for each method were extracted using the corresponding official repositories: CE+, TT-CE+, CLIP2Video, CLIP4Clip, CLIP, MMT, Audio-Retrieval. For CLIP4Clip, we used the official repo to train new models from scratch, since pre-trained weights are not provided.

You can download the extracted similarity matrices for training and testing here: MSRVTT, MSVD, DiDeMo, LSMDC.

Text-Video retrieval results

QB-Norm Results on MSRVTT Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 14.4(0.1) | 37.4(0.1) | 50.2(0.1) | 10.0(0.0) | 30.0(0.1) |
| CE+ (+QB-Norm) | Full | t2v | 16.4(0.0) | 40.3(0.1) | 52.9(0.1) | 9.0(0.0) | 32.7(0.1) |
| TT-CE+ | Full | t2v | 14.9(0.1) | 38.3(0.1) | 51.5(0.1) | 10.0(0.0) | 30.9(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.3(0.0) | 42.1(0.2) | 54.9(0.1) | 8.0(0.0) | 34.2(0.1) |

QB-Norm Results on MSVD Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 25.4(0.3) | 56.9(0.4) | 71.3(0.2) | 4.0(0.0) | 46.9(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 26.6(1.0) | 58.6(1.3) | 71.8(1.1) | 4.0(0.0) | 48.2(1.2) |
| CLIP2Video | Full | t2v | 47.0 | 76.8 | 85.9 | 2.0 | 67.7 |
| CLIP2Video (+QB-Norm) | Full | t2v | 48.0 | 77.9 | 86.2 | 2.0 | 68.5 |

QB-Norm Results on DiDeMo Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 21.6(0.7) | 48.6(0.4) | 62.9(0.6) | 6.0(0.0) | 40.4(0.4) |
| TT-CE+ (+QB-Norm) | Full | t2v | 24.2(0.7) | 50.8(0.7) | 64.4(0.1) | 5.3(0.5) | 43.0(0.2) |
| CLIP4Clip | Full | t2v | 43.0 | 70.5 | 80.0 | 2.0 | 62.4 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 43.5 | 71.4 | 80.9 | 2.0 | 63.1 |

QB-Norm Results on LSMDC Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 17.2(0.4) | 36.5(0.6) | 46.3(0.3) | 13.7(0.5) | 30.7(0.3) |
| TT-CE+ (+QB-Norm) | Full | t2v | 17.8(0.4) | 37.7(0.5) | 47.6(0.6) | 12.7(0.5) | 31.7(0.3) |
| CLIP4Clip | Full | t2v | 21.3 | 40.0 | 49.5 | 11.0 | 34.8 |
| CLIP4Clip (+QB-Norm) | Full | t2v | 22.4 | 40.1 | 49.5 | 11.0 | 35.4 |

QB-Norm Results on VaTeX Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| TT-CE+ | Full | t2v | 53.2(0.2) | 87.4(0.1) | 93.3(0.0) | 1.0(0.0) | 75.7(0.1) |
| TT-CE+ (+QB-Norm) | Full | t2v | 54.8(0.1) | 88.2(0.1) | 93.8(0.1) | 1.0(0.0) | 76.8(0.0) |
| CLIP2Video | Full | t2v | 57.4 | 87.9 | 93.6 | 1.0 | 77.9 |
| CLIP2Video (+QB-Norm) | Full | t2v | 58.8 | 88.3 | 93.8 | 1.0 | 78.7 |

QB-Norm Results on QuerYD Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CE+ | Full | t2v | 13.2(2.0) | 37.1(2.9) | 50.5(1.9) | 10.3(1.2) | 29.1(2.2) |
| CE+ (+QB-Norm) | Full | t2v | 14.1(1.8) | 38.6(1.3) | 51.1(1.6) | 10.0(0.8) | 30.2(1.7) |
| TT-CE+ | Full | t2v | 14.4(0.5) | 37.7(1.7) | 50.9(1.6) | 9.8(1.0) | 30.3(0.9) |
| TT-CE+ (+QB-Norm) | Full | t2v | 15.1(1.6) | 38.3(2.4) | 51.2(2.8) | 10.3(1.7) | 30.9(2.3) |

Text-Image retrieval results

QB-Norm Results on MSCoCo Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| CLIP | 5k | t2i | 30.3 | 56.1 | 67.1 | 4.0 | 48.5 |
| CLIP (+QB-Norm) | 5k | t2i | 34.8 | 59.9 | 70.4 | 3.0 | 52.8 |
| MMT-Oscar | 5k | t2i | 52.2 | 80.2 | 88.0 | 1.0 | 71.7 |
| MMT-Oscar (+QB-Norm) | 5k | t2i | 53.9 | 80.5 | 88.1 | 1.0 | 72.6 |

Text-Audio retrieval results

QB-Norm Results on AudioCaps Benchmark

| Model | Split | Task | R@1 | R@5 | R@10 | MdR | Geom |
|---|---|---|---|---|---|---|---|
| AR-CE | Full | t2a | 23.1(0.6) | 55.1(0.7) | 70.7(0.6) | 4.7(0.5) | 44.8(0.7) |
| AR-CE (+QB-Norm) | Full | t2a | 23.9(0.2) | 57.1(0.3) | 71.6(0.4) | 4.0(0.0) | 46.0(0.3) |
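
For readers who want to reproduce the metric columns in the tables above, the sketch below computes recall at 1, 5 and 10 (R@1, R@5, R@10, in percent), the median rank (MdR), and the geometric mean of the three recalls (Geom) from a single similarity matrix. It assumes a square matrix in which the correct gallery item for query i sits at index i. The function name is hypothetical and the repo's own evaluation code may handle multiple captions per video or query masks differently.

```python
import numpy as np


def retrieval_metrics(sims):
    """Compute R@1, R@5, R@10, MdR and Geom for a (queries x gallery) matrix.

    Assumes sims is square and that query i's correct item is gallery index i.
    """
    # Gallery indices sorted from most to least similar, per query.
    order = np.argsort(-sims, axis=1)
    # 1-based rank at which each query's correct item appears.
    targets = np.arange(sims.shape[0])[:, None]
    ranks = np.argmax(order == targets, axis=1) + 1

    r1, r5, r10 = (100.0 * float(np.mean(ranks <= k)) for k in (1, 5, 10))
    return {
        "R@1": r1,
        "R@5": r5,
        "R@10": r10,
        "MdR": float(np.median(ranks)),
        "Geom": float((r1 * r5 * r10) ** (1.0 / 3.0)),
    }


# Tiny example: query 0 ranks its correct item second, query 1 ranks it first.
toy_sims = np.array([[0.1, 0.9],
                     [0.2, 0.8]])
print(retrieval_metrics(toy_sims))
```

As a sanity check on the Geom definition, the CE+ row on MSRVTT gives (14.4 × 37.4 × 50.2)^(1/3) ≈ 30.0, which matches the reported value.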

References

If you find this code useful or use the extracted similarity matrices, please consider citing:

@misc{bogolin2021cross,
      title={Cross Modal Retrieval with Querybank Normalisation}, 
      author={Simion-Vlad Bogolin and Ioana Croitoru and Hailin Jin and Yang Liu and Samuel Albanie},
      year={2021},
      eprint={2112.12777},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}