TVNet: Temporal Voting Network for Action Localization

Last update: Jul 26, 2022

This repo holds the code for the paper "TVNet: Temporal Voting Network for Action Localization".

Paper Introduction

Temporal action localization is a vital task in video understanding. In this paper, we propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. It incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries.
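As a rough illustration of what "accumulating evidence" can mean, the toy sketch below sums weighted votes for a boundary position and normalizes them into per-frame scores. It is not the Voting Evidence Module from the paper: the function name `accumulate_votes`, the `(frame_index, weight)` vote format, and the max-normalization are all assumptions made only for this example.

```python
def accumulate_votes(num_frames, votes):
    """Sum weighted votes per frame and scale the result into [0, 1].

    votes: iterable of (voted_frame_index, weight) pairs. In this toy
    setting each pair means "a nearby frame believes the boundary is at
    voted_frame_index, with this much confidence" (an assumed format).
    """
    scores = [0.0] * num_frames
    for index, weight in votes:
        if 0 <= index < num_frames:  # drop votes that fall outside the video
            scores[index] += weight
    peak = max(scores) if scores else 0.0
    if peak > 0:
        scores = [s / peak for s in scores]  # assumed normalization
    return scores


# Hypothetical votes cast by five neighbouring frames for an action start.
votes = [(5, 1.0), (5, 0.8), (6, 0.5), (5, 0.9), (4, 0.3)]
start_scores = accumulate_votes(10, votes)
best_frame = start_scores.index(max(start_scores))
print("most voted start frame: {}".format(best_frame))
```

Frames that receive many consistent votes end up with a high score, which is the intuition behind predicting boundaries from surrounding context rather than from a single frame.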

Dependencies

Python == 2.7
Tensorflow == 1.9.0
CUDA == 10.1.105
GCC >= 5.4

Note that the PEM code from BMN is implemented in PyTorch (1.1.0 or 1.3.0).

Data Preparation

Datasets

Our experiments are based on the ActivityNet 1.3 and THUMOS14 datasets.

Feature for THUMOS14

You can download the THUMOS14 features from GoogleDrive.

Place them in a folder named thumos_features inside ./data.

You also need to download the features for PEM (from BMN) from GoogleDrive. Please put them into a folder named Thumos_feature_hdf5 inside ./TVNet-THUMOS14/data/thumos_features.

If everything goes well, the folder structure of ./TVNet-THUMOS14/data should look like this:

data
└── thumos_features
    ├── Thumos_feature_dim_400
    ├── Thumos_feature_hdf5
    ├── features_train.npy
    └── features_test.npy
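Before running the pipeline, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: the helper name `missing_entries` is hypothetical, and the expected entries are copied from the folder structure shown above.

```python
import os

# Entries from the folder structure above, relative to the data directory.
EXPECTED_THUMOS_ENTRIES = [
    os.path.join("thumos_features", "Thumos_feature_dim_400"),
    os.path.join("thumos_features", "Thumos_feature_hdf5"),
    os.path.join("thumos_features", "features_train.npy"),
    os.path.join("thumos_features", "features_test.npy"),
]


def missing_entries(data_dir, expected=EXPECTED_THUMOS_ENTRIES):
    """Return the expected relative paths that do not exist under data_dir."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(data_dir, rel))]


# Run from the repository root; the path below is taken from this README.
missing = missing_entries(os.path.join("TVNet-THUMOS14", "data"))
if missing:
    print("Missing entries:")
    for rel in missing:
        print("  " + rel)
else:
    print("THUMOS14 feature layout looks complete.")
```

The same helper can be pointed at ./TVNet-ANET/data by passing a different `expected` list.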

Feature for ActivityNet 1.3

You can download the ActivityNet 1.3 features from GoogleCloud. Please put the csv_mean_100 directory into ./TVNet-ANET/data/activitynet_feature_cuhk/.

If everything goes well, the folder structure of ./TVNet-ANET/data should look like this:

data
└── activitynet_feature_cuhk
    └── csv_mean_100

Run all steps

Run all steps on THUMOS14

cd TVNet-THUMOS14

Run the following script with all steps on THUMOS14:

bash do_all.sh

Note: If you use BlueCrystal 4, you can run the following script directly, without setting up any dependencies.

bash do_all_BC4.sh

Run all steps on ActivityNet 1.3

cd TVNet-ANET
bash do_all.sh    # or, on BlueCrystal 4: bash do_all_BC4.sh

Run steps separately

Take TVNet-THUMOS14 as an example:

cd TVNet-THUMOS14

1. Temporal evaluation module

python TEM_train.py

python TEM_test.py

2. Create training data for voting evidence module

python VEM_create_windows.py --window_length L --window_stride S

L is the window length and S is the sliding stride. We generate training windows of length 10 with stride 5, and of length 5 with stride 2.
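To make the roles of L and S concrete, here is a minimal sketch of how a sliding window with a given length and stride covers a sequence. It is an illustration only: the function `sliding_windows` is hypothetical, and how VEM_create_windows.py actually treats the tail of a video (padding, partial windows) is not stated here, so this version simply keeps full windows.

```python
def sliding_windows(num_steps, window_length, window_stride):
    """Return (start, end) index pairs, end exclusive, for full windows only."""
    if window_length <= 0 or window_stride <= 0:
        raise ValueError("window_length and window_stride must be positive")
    last_start = num_steps - window_length
    # range() is empty when the sequence is shorter than one window.
    return [(start, start + window_length)
            for start in range(0, last_start + 1, window_stride)]


# The two configurations mentioned above, applied to a 20-step sequence.
for length, stride in [(10, 5), (5, 2)]:
    windows = sliding_windows(20, length, stride)
    print("L={} S={}: {} windows, first={}, last={}".format(
        length, stride, len(windows), windows[0], windows[-1]))
```

With a stride smaller than the window length, consecutive windows overlap, so each position is seen in several different contexts.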

3. Voting evidence module

python VEM_train.py --voting_type TYPE --window_length L --window_stride S

python VEM_test.py --voting_type TYPE --window_length L --window_stride S

TYPE should be start or end. We train and test models with window length 10 (stride 5) and window length 5 (stride 2) for start and end separately.
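Training and testing both voting types with both window settings amounts to four training runs and four test runs. The sketch below only prints those command lines; it uses the script names and flags shown above, while the helper `vem_commands` and the ordering of the runs are assumptions of this example.

```python
import itertools

VOTING_TYPES = ["start", "end"]
WINDOW_CONFIGS = [(10, 5), (5, 2)]  # (window_length, window_stride)


def vem_commands(script):
    """Build one command (as an argument list) per voting type and window setting."""
    commands = []
    for voting_type, (length, stride) in itertools.product(VOTING_TYPES,
                                                           WINDOW_CONFIGS):
        commands.append([
            "python", script,
            "--voting_type", voting_type,
            "--window_length", str(length),
            "--window_stride", str(stride),
        ])
    return commands


# Print all training commands first, then all test commands.
for script in ("VEM_train.py", "VEM_test.py"):
    for command in vem_commands(script):
        print(" ".join(command))
```

To execute the commands instead of printing them, each argument list could be passed to `subprocess.check_call` from inside the TVNet-THUMOS14 directory.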

4. Proposal evaluation module from BMN

python PEM_train.py

5. Proposal generation

python proposal_generation.py

6. Post-processing and detection

python post_postprocess.py

Results

THUMOS14

tIoU	mAP
0.3	0.5724681814413137
0.4	0.5060844218403346
0.5	0.430414918823808
0.6	0.3297164845828022
0.7	0.202971546242546

ActivityNet 1.3

tIoU	mAP
Average	0.3460396513933088
0.5	0.5135151163296395
0.75	0.34955648726767025
0.95	0.10121803584836778
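The tIoU thresholds in the tables above refer to temporal intersection over union between a predicted segment and a ground-truth segment. The sketch below shows the standard definition; the function name `temporal_iou` and the example segments are made up for illustration, and the evaluation code used for the reported numbers may differ in detail.

```python
def temporal_iou(pred, gt):
    """tIoU of two (start, end) segments expressed in the same time unit."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical prediction and ground truth, in seconds.
iou = temporal_iou((2.0, 8.0), (4.0, 10.0))
print("tIoU = {}".format(iou))  # overlap of 4 s over a union of 8 s: 0.5
```

Under this definition, the example prediction would pass a threshold of 0.5 but fail one of 0.75.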

Reference

This implementation borrows from:

BSN: BSN-Boundary-Sensitive-Network

TEM_train/test.py -- for the TEM module we used in our paper
load_dataset.py -- borrows the part that loads data for TEM

BMN: BMN-Boundary-Matching-Network

PEM_train.py -- for the PEM module we used in our paper

G-TAD: Sub-Graph Localization for Temporal Action Detection

post_postprocess.py -- for the multi-core processing that generates detections

Our main contribution is in:

VEM_create_windows.py -- generate training annotations for Voting Evidence Module (VEM)

VEM_train.py -- train Voting Evidence Module (VEM)

VEM_test.py -- test Voting Evidence Module (VEM)


Owner

hywang
