Multi-query Video Retrieval

This repository contains the code for the paper:

@misc{wang2022multiquery,
      title={Multi-query Video Retrieval}, 
      author={Zeyu Wang and Yu Wu and Karthik Narasimhan and Olga Russakovsky},
      year={2022},
      eprint={2201.03639},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Data Preparation

Download the raw videos for MSR-VTT, MSVD, and VATEX, and put them into the data/{dataset}/raw_videos folder.
Then run the script data/extract_frames.sh to extract frames from the raw videos.

The resulting data folder is structured like this:

├── data
    ├── msrvtt
        ├── msrvtt_train.json
        ├── msrvtt_test.json
        ├── msrvtt_test_varying_query_sample_1-20.json
        ├── raw_videos
            ├── video0.mp4
            ├── ...
        ├── extracted_frames
            ├── video0.mp4
                ├── 0.jpg
                ├── ...
            ├── ...
    ├── msvd
        ├── ...
    ├── vatex
        ├── ...
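
Before training, it can help to confirm that the data folder matches the tree above. The following is a minimal sketch, not part of the repository: the function name check_dataset_layout is hypothetical, and it assumes that msvd and vatex follow the same {dataset}_train.json / {dataset}_test.json naming shown for msrvtt, which the tree does not confirm.

```python
from pathlib import Path


def check_dataset_layout(data_root, dataset):
    """Return a list of human-readable problems found under data_root/dataset.

    Hypothetical helper: it only mirrors the folder tree shown in this README.
    """
    root = Path(data_root) / dataset
    problems = []

    raw_dir = root / "raw_videos"
    frames_dir = root / "extracted_frames"
    for required in (raw_dir, frames_dir):
        if not required.is_dir():
            problems.append(f"missing folder: {required}")

    # Annotation names follow the msrvtt example; applying the same pattern
    # to the other datasets is an assumption.
    for split in ("train", "test"):
        annotation = root / f"{dataset}_{split}.json"
        if not annotation.is_file():
            problems.append(f"missing file: {annotation}")

    # Each raw video should have a frame folder of the same name
    # (e.g. extracted_frames/video0.mp4/) containing .jpg files.
    if raw_dir.is_dir() and frames_dir.is_dir():
        for video in sorted(raw_dir.iterdir()):
            frame_folder = frames_dir / video.name
            if not frame_folder.is_dir() or not any(frame_folder.glob("*.jpg")):
                problems.append(f"no extracted frames for: {video.name}")

    return problems
```

Calling check_dataset_layout("data", "msrvtt") from the repository root returns an empty list when every expected folder, annotation file, and per-video frame folder is present.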

For the Frozen model, download the pretrained checkpoint provided by the original authors and put it into the record/pretrained folder.

Training

Run command: python train.py -c configs/{config_path}

Evaluation

Run command: python evaluate.py -c configs/{config_path}

Acknowledgements

The structure of this repository is based on https://github.com/victoresque/pytorch-template. Some of the code is adapted from https://github.com/m-bain/frozen-in-time and https://github.com/ArrowLuo/CLIP4Clip.

Owner

Princeton Visual AI Lab