TDN: Temporal Difference Networks for Efficient Action Recognition

Overview

TDN: Temporal Difference Networks for Efficient Action Recognition

1

Overview

We release the PyTorch code of the TDN(Temporal Difference Networks). This code is based on the TSN and TSM codebase. The core code to implement the Temporal Difference Module are ops/base_module.py and ops/tdn_net.py.

🔥 [NEW!] We have released the PyTorch code of TDN.

Prerequisites

The code is built with following libraries:

Data Preparation

We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.

  • The processing of Something-Something-V1 & V2 can be summarized into 3 steps:

    1. Extract frames from videos(you can use ffmpeg to get frames from video)
    2. Generate annotations needed for dataloader (" " in annotations) The annotation usually includes train.txt and val.txt. The format of *.txt file is like:
      frames/video_1 num_frames label_1
      frames/video_2 num_frames label_2
      frames/video_3 num_frames label_3
      ...
      frames/video_N num_frames label_N
      
    3. Add the information to ops/dataset_configs.py
  • The processing of Kinetics400 can be summarized into 2 steps:

    1. Generate annotations needed for dataloader (" " in annotations) The annotation usually includes train.txt and val.txt. The format of *.txt file is like:
      frames/video_1.mp4  label_1
      frames/video_2.mp4  label_2
      frames/video_3.mp4  label_3
      ...
      frames/video_N.mp4  label_N
      
    2. Add the information to ops/dataset_configs.py

Model Zoo

Here we provide some off-the-shelf pretrained models. The accuracy might vary a little bit compared to the paper, since the raw video of Kinetics downloaded by users may have some differences.

Something-Something-V1

Model Frames x Crops x Clips Top-1 Top-5 checkpoint
TDN-ResNet50 8x1x1 52.3% 80.6% link
TDN-ResNet50 16x1x1 53.9% 82.1% link

Something-Something-V2

Model Frames x Crops x Clips Top-1 Top-5 checkpoint
TDN-ResNet50 8x1x1 64.0% 88.8% link
TDN-ResNet50 16x1x1 65.3% 89.7% link

Kinetics400

Model Frames x Crops x Clips Top-1 (30 view) Top-5 (30 view) checkpoint
TDN-ResNet50 8x3x10 76.6% 92.8% link
TDN-ResNet50 16x3x10 77.5% 93.2% link
TDN-ResNet101 8x3x10 77.5% 93.6% link
TDN-ResNet101 16x3x10 78.5% 93.9% link

Testing

  • For center crop single clip, the processing of testing can be summarized into 2 steps:
    1. Run the following testing scripts:
      CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
      --archs='resnet50' --weights   --test_segments=8  \
      --test_crops=1 --batch_size=16  --gpus 0 --output_dir  -j 4 --clip_index=1
      
    2. Run the following scripts to get result from the raw score:
      python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir   
      
  • For 3 crops, 10 clips, the processing of testing can be summarized into 2 steps:
    1. Run the following testing scripts for 10 times(clip_index from 0 to 9):
      CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py  kinetics \
      --archs='resnet50' --weights   --test_segments=8 \
      --test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir   \
      -j 4 --clip_index 
      
    2. Run the following scripts to ensemble the raw score of the 30 views:
      python pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir  
      

Training

This implementation supports multi-gpu, DistributedDataParallel training, which is faster and simpler.

  • For example, to train TDN-ResNet50 on Something-Something-V1 with 8 gpus, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
                main.py  something  RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
                --lr_scheduler step --lr_steps  30 45 55 --epochs 60 --batch-size 16 \
                --wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    
  • For example, to train TDN-ResNet50 on Kinetics400 with 8 gpus, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
            main.py  kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
            --lr_scheduler step  --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
            --wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    

Acknowledgements

We especially thank the contributors of the TSN and TSM codebase for providing helpful code.

License

This repository is released under the Apache-2.0. license as found in the LICENSE file.

Citation

If you think our work is useful, please feel free to cite our paper 😆 :

@article{wang2020tdn,
      title={TDN: Temporal Difference Networks for Efficient Action Recognition}, 
      author={Limin Wang and Zhan Tong and Bin Ji and Gangshan Wu},
      journal={arXiv preprint arXiv:2012.10071},
      year={2020}
}
Owner
Multimedia Computing Group, Nanjing University
Multimedia Computing Group, Nanjing University
領域を指定し、キーを入力することで画像を保存するツールです。クラス分類用のデータセット作成を想定しています。

image-capture-class-annotation 領域を指定し、キーを入力することで画像を保存するツールです。 クラス分類用のデータセット作成を想定しています。 Requirement OpenCV 3.4.2 or later Usage 実行方法は以下です。 起動後はマウスクリック4

KazuhitoTakahashi 5 May 28, 2021
Code for "On Memorization in Probabilistic Deep Generative Models"

On Memorization in Probabilistic Deep Generative Models This repository contains the code necessary to reproduce the experiments in On Memorization in

The Alan Turing Institute 3 Jun 09, 2022
[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning.

DeepVecFont This is the homepage for "DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning". Yizhi Wang and Zhouhui Lian. WI

Yizhi Wang 17 Dec 22, 2022
Semi-SDP Semi-supervised parser for semantic dependency parsing.

Semi-SDP Semi-supervised parser for semantic dependency parsing. This repo contains the code used for the semi-supervised semantic dependency parser i

12 Sep 17, 2021
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting

Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting 1. Classification Task PyTorch implementat

Yongho Kim 0 Apr 24, 2022
mPose3D, a mmWave-based 3D human pose estimation model.

mPose3D, a mmWave-based 3D human pose estimation model.

KylinChen 35 Nov 08, 2022
Resilience from Diversity: Population-based approach to harden models against adversarial attacks

Resilience from Diversity: Population-based approach to harden models against adversarial attacks Requirements To install requirements: pip install -r

0 Nov 23, 2021
[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

PG-MORL This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Contro

MIT Graphics Group 65 Jan 07, 2023
Repo 4 basic seminar §How to make human machine readable"

WORK IN PROGRESS... Notebooks from the Seminar: Human Machine Readable WS21/22 Introduction into programming Georg Trogemann, Christian Heck, Mattis

experimental-informatics 3 May 29, 2022
Easy to use Audio Tagging in PyTorch

Audio Classification, Tagging & Sound Event Detection in PyTorch Progress: Fine-tune on audio classification Fine-tune on audio tagging Fine-tune on s

sithu3 15 Dec 22, 2022
A implemetation of the LRCN in mxnet

A implemetation of the LRCN in mxnet ##Abstract LRCN is a combination of CNN and RNN ##Installation Download UCF101 dataset ./avi2jpg.sh to split the

44 Aug 25, 2022
An all-in-one application to visualize multiple different local path planning algorithms

Table of Contents Table of Contents Local Planner Visualization Project (LPVP) Features Installation/Usage Local Planners Probabilistic Roadmap (PRM)

Abdur Javaid 47 Dec 30, 2022
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

datasketch: Big Data Looks Small datasketch gives you probabilistic data structures that can process and search very large amount of data super fast,

Eric Zhu 1.9k Jan 07, 2023
Metadata-Extractor - Metadata Extractor Script can be used to read in exif metadata

Metadata Extractor The exifextract script can be used to read in exif metadata f

1 Feb 16, 2022
All the code and files related to the MI-Lab of UE19CS305 course in sem 5

Machine-Intelligence-Lab-CS305 The compilation of all the code an drelated files from MI-Lab UE19CS305 (of batch 2019-2023) offered by PES University

Arvind Krishna 3 Nov 10, 2022
CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

M-BERT-Study CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY Motivation Multilingual BERT (M-BERT) has shown surprising cross lingual a

CogComp 1 Feb 28, 2022
Testability-Aware Low Power Controller Design with Evolutionary Learning, ITC2021

Testability-Aware Low Power Controller Design with Evolutionary Learning This repo contains the source code of Testability-Aware Low Power Controller

Lee Man 1 Dec 26, 2021
This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis, accepted at ACMMM 2021.

Ziqi Yuan 10 Sep 30, 2022
“Robust Lightweight Facial Expression Recognition Network with Label Distribution Training”, AAAI 2021.

EfficientFace Zengqun Zhao, Qingshan Liu, Feng Zhou. "Robust Lightweight Facial Expression Recognition Network with Label Distribution Training". AAAI

Zengqun Zhao 119 Jan 08, 2023