Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Last update: Dec 29, 2022

Related tags

Overview

Video Corpus Moment Retrieval with Contrastive Learning

PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning" (SIGIR 2021, long paper): SIGIR version, ArXiv version.

The codes are modified from TVRetrieval.

Prerequisites

python 3.x with pytorch (1.7.0), torchvision, transformers, tensorboard, tqdm, h5py, easydict
cuda, cudnn

If you have Anaconda installed, the conda environment of ReLoCLNet can be built as follows (take python 3.7 as an example):

conda create --name reloclnet python=3.7
conda activate reloclnet
conda install -c anaconda cudatoolkit cudnn  # ignore this if you already have cuda installed
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
conda install -c anaconda h5py=2.9.0
conda install -c conda-forge transformers tensorboard tqdm easydict

The conda environment of TVRetrieval also works.

Getting started

Clone this repository

$ git clone [email protected]:IsaacChanghau/ReLoCLNet.git
$ cd ReLoCLNet

Download features

For the features of TVR dataset, please download tvr_feature_release.tar.gz (link is copied from TVRetrieval#prerequisites) and extract it to the data directory:

$ tar -xf path/to/tvr_feature_release.tar.gz -C data

This link may be useful for you to directly download Google Drive files using wget. Please refer TVRetrieval#prerequisites for more details about how the features are extracted if you are interested.

Add project root to PYTHONPATH (Note that you need to do this each time you start a new session.)

$ source setup.sh

Training and Inference

TVR dataset

# train, refer `method_tvr/scripts/train.sh` and `method_tvr/config.py` more details about hyper-parameters
$ bash method_tvr/scripts/train.sh tvr video_sub_tef resnet_i3d --exp_id reloclnet
# inference
# the model directory placed in method_tvr/results/tvr-video_sub_tef-reloclnet-*
# change the MODEL_DIR_NAME as tvr-video_sub_tef-reloclnet-*
# SPLIT_NAME: [val | test]
$ bash method_tvr/scripts/inference.sh MODEL_DIR_NAME SPLIT_NAME

For more details about evaluation and submission, please refer TVRetrieval#training-and-inference.

Citation

If you feel this project helpful to your research, please cite our work.

@inproceedings{zhang2021video,
	author = {Zhang, Hao and Sun, Aixin and Jing, Wei and Nan, Guoshun and Zhen, Liangli and Zhou, Joey Tianyi and Goh, Rick Siow Mong},
	title = {Video Corpus Moment Retrieval with Contrastive Learning},
	year = {2021},
	isbn = {9781450380379},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	url = {https://doi.org/10.1145/3404835.3462874},
	doi = {10.1145/3404835.3462874},
	booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
	pages = {685–695},
	numpages = {11},
	location = {Virtual Event, Canada},
	series = {SIGIR '21}
}

TODO

Upload codes for ActivityNet Captions dataset

Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Related tags

Overview

Video Corpus Moment Retrieval with Contrastive Learning

Prerequisites

Getting started

Training and Inference

Citation

TODO

Owner

ZHANG HAO

This repo includes the CUB-GHA (Gaze-based Human Attention) dataset and code of the paper "Human Attention in Fine-grained Classification".

LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice,

Piotr - IoT firmware emulation instrumentation for training and research

Self-supervised learning on Graph Representation Learning (node-level task)

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

HIVE: Evaluating the Human Interpretability of Visual Explanations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

Codeflare - Scale complex AI/ML pipelines anywhere

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

This repository is the code of the paper "Sparse Spatial Transformers for Few-Shot Learning".

Relative Uncertainty Learning for Facial Expression Recognition

A simple and lightweight genetic algorithm for optimization of any machine learning model

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

Face Recognition plus identification simply and fast | Python

Research on controller area network Intrusion Detection Systems

Code for our ALiBi method for transformer language models.

AI创造营：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

Companion repo of the UCC 2021 paper "Predictive Auto-scaling with OpenStack Monasca"

Download & Install mods for your favorit game with a few simple clicks

Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Related tags

Overview

Video Corpus Moment Retrieval with Contrastive Learning

Prerequisites

Getting started

Training and Inference

Citation

TODO

Owner

ZHANG HAO

This repo includes the CUB-GHA (Gaze-based Human Attention) dataset and code of the paper "Human Attention in Fine-grained Classification".

LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice,

Piotr - IoT firmware emulation instrumentation for training and research

Self-supervised learning on Graph Representation Learning (node-level task)

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

HIVE: Evaluating the Human Interpretability of Visual Explanations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

Codeflare - Scale complex AI/ML pipelines anywhere

Readings for "A Unified View of Relational Deep Learning for Polypharmacy Side Effect, Combination Therapy, and Drug-Drug Interaction Prediction."

Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

This repository is the code of the paper "Sparse Spatial Transformers for Few-Shot Learning".

Relative Uncertainty Learning for Facial Expression Recognition

A simple and lightweight genetic algorithm for optimization of any machine learning model

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

Face Recognition plus identification simply and fast | Python

Research on controller area network Intrusion Detection Systems

Code for our ALiBi method for transformer language models.

AI创造营 ：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

Companion repo of the UCC 2021 paper "Predictive Auto-scaling with OpenStack Monasca"

Download & Install mods for your favorit game with a few simple clicks

AI创造营：Metaverse启动机之重构现世，结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人