Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Overview

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

This is an official implementation in PyTorch of AFSD. Our paper is available at https://arxiv.org/abs/2103.13137

Updates

  • (May, 2021) We released AFSD training and inference code for THUMOS14 dataset.
  • (February, 2021) AFSD is accepted by CVPR2021.

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and localization of the start and end frame for each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness, such methods could be bothered with both large number of outputs and heavy tuning of locations and sizes corresponding to different anchors. Instead, anchor-free methods is lighter, getting rid of redundant hyper-parameters, but gains few attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module to gather more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods with a remarkable margin on THUMOS14, achieving state-of-the-art results, and comparable ones on ActivityNet v1.3.

Summary

  • First purely anchor-free framework for temporal action detection task.
  • Fully end-to-end method using frames as input rather then features.
  • Saliency-based refinement module to gather more valuable boundary features.
  • Boundary consistency learning to make sure our model can find the accurate boundary.

Performance

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (Please make sure your pytorch version is 1.4)
  • NVIDIA GPU

Setup

pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

  • THUMOS14 RGB data:
  1. Download post-processed RGB npy data (13.7GB): [Weiyun]
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because it costs more time to generate flow data for THUMOS14, to make easy to run flow model, we provide the post-processed flow data in Google Drive and Weiyun (3.4GB): [Google Drive], [Weiyun]
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/

If you want to generate npy data by yourself, please refer to the following guidelines:

  • RGB data generation manually:
  1. To construct THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
    Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip
    Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip
    (unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script: python3 AFSD/common/video2npy.py
  • Flow data generation manually:
  1. If you should generate flow data manually, firstly install the denseflow.
  2. Prepare the post-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py

Inference

We provide the pretrained models contain I3D backbone model and final RGB and flow models for THUMOS14 dataset: [Google Drive], [Weiyun]

# run RGB model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json

Evaluation

The output json results of pretrained model can be downloaded from: [Google Drive], [Weiyun]

# evaluate THUMOS14 fusion result as example
python3 eval.py output/thumos14_fusion.json

mAP at tIoU 0.3 is 0.6728296149479254
mAP at tIoU 0.4 is 0.6242590551201842
mAP at tIoU 0.5 is 0.5546668739091394
mAP at tIoU 0.6 is 0.4374840824921885
mAP at tIoU 0.7 is 0.3110112542745055

Training

# train the RGB model
python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{lin2021afsd,
  title={Learning Salient Boundary Feature for Anchor-free Temporal Action Localization},
  author={Chuming Lin*, Chengming Xu*, Donghao Luo, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yanwei Fu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Owner
Tencent YouTu Research
Tencent YouTu Research
A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

Biomedical Computer Vision @ Uniandes 52 Dec 19, 2022
SNE-RoadSeg in PyTorch, ECCV 2020

SNE-RoadSeg Introduction This is the official PyTorch implementation of SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentati

242 Dec 20, 2022
Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Official repository of the paper (UAI 2021) "A Variational Approximation for Analyzing the Dynamics of Panel Data", Mixed Effect Neural ODE. Panel dat

Jurijs Nazarovs 7 Nov 26, 2022
Variational autoencoder for anime face reconstruction

VAE animeface Variational autoencoder for anime face reconstruction Introduction This repository is an exploratory example to train a variational auto

Minzhe Zhang 2 Dec 11, 2021
Code for our SIGCOMM'21 paper "Network Planning with Deep Reinforcement Learning".

0. Introduction This repository contains the source code for our SIGCOMM'21 paper "Network Planning with Deep Reinforcement Learning". Notes The netwo

NetX Group 68 Nov 24, 2022
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

AST: Audio Spectrogram Transformer Introduction Citing Getting Started ESC-50 Recipe Speechcommands Recipe AudioSet Recipe Pretrained Models Contact I

Yuan Gong 603 Jan 07, 2023
deep learning model that learns to code with drawing in the Processing language

sketchnet sketchnet - processing code generator can we teach a computer to draw pictures with code. We use Processing and java/jruby code paired with

41 Dec 12, 2022
Automatic Image Background Subtraction

Automatic Image Background Subtraction This repo contains set of scripts for automatic one-shot image background subtraction task using the following

Oleg Sémery 6 Dec 05, 2022
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

220 Dec 31, 2022
Cross-platform CLI tool to generate your Github profile's stats and summary.

ghs Cross-platform CLI tool to generate your Github profile's stats and summary. Preview Hop on to examples for other usecases. Jump to: Installation

HackerRank 134 Dec 20, 2022
Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

ASFormer: Transformer for Action Segmentation This repo provides training & inference code for BMVC 2021 paper: ASFormer: Transformer for Action Segme

42 Dec 23, 2022
Experiments with the Robust Binary Interval Search (RBIS) algorithm, a Query-Based prediction algorithm for the Online Search problem.

OnlineSearchRBIS Online Search with Best-Price and Query-Based Predictions This is the implementation of the Robust Binary Interval Search (RBIS) algo

S. K. 1 Apr 16, 2022
Computer-Vision-Paper-Reviews - Computer Vision Paper Reviews with Key Summary along Papers & Codes

Computer-Vision-Paper-Reviews Computer Vision Paper Reviews with Key Summary along Papers & Codes. Jonathan Choi 2021 50+ Papers across Computer Visio

Jonathan Choi 2 Mar 17, 2022
A GOOD REPRESENTATION DETECTS NOISY LABELS

A GOOD REPRESENTATION DETECTS NOISY LABELS This code is a PyTorch implementation of the paper: Prerequisites Python 3.6.9 PyTorch 1.7.1 Torchvision 0.

<a href=[email protected]"> 64 Jan 04, 2023
The spiritual successor to knockknock for PyTorch Lightning, get notified when your training ends

Who's there? The spiritual successor to knockknock for PyTorch Lightning, to get a notification when your training is complete or when it crashes duri

twsl 70 Oct 06, 2022
A Strong Baseline for Image Semantic Segmentation

A Strong Baseline for Image Semantic Segmentation Introduction This project is an open source semantic segmentation toolbox based on PyTorch. It is ba

Clark He 49 Sep 20, 2022
Official repo for our 3DV 2021 paper "Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements".

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy Paper. Pr

Yu Rong 41 Dec 13, 2022
Repository for MeshTalk supplemental material and code once the (already approved) 16 GHS captures our lab will make publicly available are released.

meshtalk This repository contains code to run MeshTalk for face animation from audio. If you use MeshTalk, please cite @inproceedings{richard2021mesht

Meta Research 221 Jan 06, 2023
Implementation of "Bidirectional Projection Network for Cross Dimension Scene Understanding" CVPR 2021 (Oral)

Bidirectional Projection Network for Cross Dimension Scene Understanding CVPR 2021 (Oral) [ Project Webpage ] [ arXiv ] [ Video ] Existing segmentatio

Hu Wenbo 135 Dec 26, 2022