MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

Last update: Jan 29, 2022

Overview

We release the code of MVFNet (Multi-View Fusion Network). The core implementation of the Multi-View Fusion Module is in codes/models/modules/MVF.py.

[Mar 24, 2021] We have released the code of MVFNet.

[Dec 20, 2020] MVFNet has been accepted by AAAI 2021.

Prerequisites
Data Preparation
Model Zoo
Testing
Training

Prerequisites

All dependencies can be installed using pip:

python -m pip install -r requirements.txt

Our experiments were run with Python 3.7 and PyTorch 1.5. Other versions should work but have not been tested.

Download Pretrained Models

Download ImageNet pre-trained models

cd pretrained
sh download_imgnet.sh

Download K400 pre-trained models

Please refer to Model Zoo.

Data Preparation

Please refer to DATASETS.md for data preparation.

Model Zoo

Architecture	Dataset	T x interval	Top-1 Acc.	Pre-trained model	Train log	Test log
MVFNet-ResNet50	Kinetics-400	4x16	74.2%	Download link	Log link	Log link
MVFNet-ResNet50	Kinetics-400	8x8	76.0%	Download link	N/A	Log link
MVFNet-ResNet50	Kinetics-400	16x4	77.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	4x16	76.0%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	8x8	77.4%	Download link	Log link	Log link
MVFNet-ResNet101	Kinetics-400	16x4	78.4%	Download link	Log link	Log link
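
The "T x interval" column can be read as T frames per clip, sampled every `interval` raw frames, so 8x8 means 8 frames spread over a window of about 64 raw frames. The sketch below illustrates that reading only. The function name `dense_clip_indices`, the 300-frame video, and the policy of clamping indices at the last frame are assumptions for illustration, not the repository's data loader.

```python
def dense_clip_indices(num_frames, clip_len, interval, start=0):
    """Return clip_len frame indices spaced by interval, beginning at start.

    Indices past the end of the video are clamped to the last frame.
    Clamping is an assumed policy; looping the video is another common choice.
    """
    if num_frames <= 0 or clip_len <= 0 or interval <= 0:
        raise ValueError("num_frames, clip_len and interval must be positive")
    last = num_frames - 1
    return [min(start + i * interval, last) for i in range(clip_len)]


# The three settings from the table, applied to a hypothetical 300-frame video.
for clip_len, interval in [(4, 16), (8, 8), (16, 4)]:
    print(f"{clip_len}x{interval}:", dense_clip_indices(300, clip_len, interval))
```

All three settings cover a similar temporal window; they differ in how densely that window is sampled, which is consistent with accuracy rising from 4x16 to 16x4 in the table.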

Testing

To test with 3 crops and 10 clips per video:

# Dataset: Kinetics-400
# Architecture: R50_8x8, Top-1 Acc. = 76.0%
bash scripts/dist_test_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py ckpt_path 8 --fcn_testing
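
Testing with 3 crops and 10 clips means each video is scored on 3 x 10 = 30 views. A common way to combine them is to average the per-view class scores and take the arg-max. The sketch below assumes that averaging scheme. Whether the test script averages probabilities or raw logits is not stated here, and `aggregate_views` and the toy scores are hypothetical.

```python
def aggregate_views(view_scores):
    """Average per-view class scores and return (mean_scores, predicted_class).

    view_scores: list of per-view score lists, one entry per (clip, crop) view.
    Averaging is an assumed aggregation rule, not confirmed by the repository.
    """
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    if any(len(v) != num_classes for v in view_scores):
        raise ValueError("all views must score the same number of classes")
    mean = [sum(v[c] for v in view_scores) / len(view_scores)
            for c in range(num_classes)]
    pred = max(range(num_classes), key=lambda c: mean[c])
    return mean, pred


NUM_CLIPS, NUM_CROPS = 10, 3  # from the testing setup above

# Toy scores over 2 classes: class 1 is favored in 20 of the 30 views.
views = [[0.3, 0.7] if i % 3 else [0.7, 0.3]
         for i in range(NUM_CLIPS * NUM_CROPS)]
mean, pred = aggregate_views(views)
print(len(views), "views, predicted class:", pred)
```

Averaging lets a few poorly framed views be outvoted by the rest, which is why multi-view testing usually scores higher than single-view testing at a proportionally higher compute cost.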

Training

This implementation supports multi-GPU training with DistributedDataParallel, which is faster and simpler.

For example, to train MVFNet-ResNet50 on Kinetics-400 with 8 GPUs, you can run:

bash scripts/dist_train_recognizer.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

We also provide scripts to train MVFNet on Kinetics-400 across multiple machines (e.g., 2 machines with 8 GPUs each, 16 GPUs in total).

# On the first machine: --master_addr is the IP address of the first machine
bash scripts/dist_train_multinode_1.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8

# On the second machine: --master_addr is still the IP address of the first machine
bash scripts/dist_train_multinode_2.sh configs/MVFNet/K400/mvf_kinetics400_2d_rgb_r50_dense.py 8
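
In a multi-machine run of this kind, every GPU process needs a unique global rank. PyTorch's distributed launcher derives it as `node_rank * nproc_per_node + local_rank`. The sketch below only illustrates that bookkeeping for 2 machines with 8 GPUs each. It assumes the trailing `8` in the commands is the per-machine GPU count, and `global_ranks` is a hypothetical helper, not part of the provided scripts.

```python
def global_ranks(num_nodes, gpus_per_node):
    """Map each (node_rank, local_rank) pair to its global rank."""
    return {
        (node, local): node * gpus_per_node + local
        for node in range(num_nodes)
        for local in range(gpus_per_node)
    }


# 2 machines x 8 GPUs, matching the example above.
ranks = global_ranks(num_nodes=2, gpus_per_node=8)
world_size = len(ranks)
print("world size:", world_size)
print("first GPU on machine 0:", ranks[(0, 0)])
print("first GPU on machine 1:", ranks[(1, 0)])
```

All processes connect to the same master address, which is why both machines point at the IP address of the first one.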

Acknowledgements

We especially thank the contributors of the mmaction codebase for providing helpful code.

License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@inproceedings{wu2020MVFNet,
  author    = {Wu, Wenhao and He, Dongliang and Lin, Tianwei and Li, Fu and Gan, Chuang and Ding, Errui},
  title     = {MVFNet: Multi-View Fusion Network for Efficient Video Recognition},
  booktitle = {AAAI},
  year      = {2021}
}

Contact

For any questions, please file an issue or contact:

Wenhao Wu: [email protected]
