PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

Last update: Dec 16, 2022

Related tags

Deep Learning R2Plus1D-PyTorch

Overview

R2Plus1D-PyTorch

Link to original: paper and code

NOTE: This repository has been archived, although forks and other work that builds on it remain welcome.

Requirements

R2Plus1D-PyTorch has the following requirements:

PyTorch 0.4 and dependencies
OpenCV (tested on 3.4.0.12)
tqdm (for progress bars)

About this repository

This repository consists of four Python files:

module.py - Contains an implementation of the factored R2Plus1D convolution that the entire implementation is built around. It is designed as a replacement for nn.Conv3d in the appropriate scenarios.
network.py - Uses module.py to build the residual network described in the paper.
dataset.py - Implements a PyTorch dataset that can load videos with their labels from a given directory.
trainer.py - A lightly modified version of the training script from the PyTorch tutorials. Includes support for saving and restoring.
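
To make the idea of a "factored" convolution concrete: the paper splits a t x d x d 3D convolution into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal one, and picks the number of intermediate channels so that the pair has roughly as many weights as the original 3D convolution. The following is a minimal sketch of that calculation in plain Python. The function names are hypothetical and are not taken from module.py, biases are ignored, and the default kernel sizes (t = 3, d = 3) are assumptions for illustration.

```python
def intermediate_channels(in_channels, out_channels, t=3, d=3):
    """Channels between the spatial (1 x d x d) and temporal (t x 1 x 1) convolutions.

    Chosen so that the factored pair has about as many weights as a single
    full t x d x d 3D convolution (parameter-matching rule from the paper).
    """
    full_3d = t * d * d * in_channels * out_channels
    per_mid_channel = d * d * in_channels + t * out_channels
    return full_3d // per_mid_channel


def conv3d_weights(in_channels, out_channels, t=3, d=3):
    # Weight count of one full 3D convolution (no bias).
    return t * d * d * in_channels * out_channels


def factored_weights(in_channels, out_channels, t=3, d=3):
    # Weight count of the spatial convolution plus the temporal convolution.
    mid = intermediate_channels(in_channels, out_channels, t, d)
    spatial = d * d * in_channels * mid
    temporal = t * mid * out_channels
    return spatial + temporal


if __name__ == "__main__":
    for n_in, n_out in [(64, 64), (64, 128)]:
        print(n_in, n_out,
              intermediate_channels(n_in, n_out),
              conv3d_weights(n_in, n_out),
              factored_weights(n_in, n_out))
```

Because the intermediate channel count is rounded down, the factored pair never has more weights than the 3D convolution it replaces.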

Training on Kinetics-400/600

This repository does not include a crawler or downloader for the Kinetics-400/600 dataset; however, one can be found here. It is strongly recommended to downsample the videos before training (not on the fly), using a tool such as ffmpeg. If using the crawler, this can be done by adding "-vf", "scale=172:128" to the ffmpeg command list in the download clip function.
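
For videos that are already on disk, the same scale filter can be applied in a separate preprocessing pass. The sketch below builds an ffmpeg argument list containing the "-vf", "scale=172:128" pair mentioned above. The helper names are hypothetical, the crawler's own command list will contain other arguments, and ffmpeg is assumed to be available on the PATH.

```python
import subprocess


def build_downsample_command(src, dst, width=172, height=128):
    """Return an ffmpeg argument list that rescales src and writes dst."""
    return [
        "ffmpeg",
        "-i", str(src),                    # input video
        "-vf", f"scale={width}:{height}",  # video filter: resize every frame
        str(dst),                          # output video
    ]


def downsample(src, dst):
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_downsample_command(src, dst), check=True)
```

Calling downsample("clip.mp4", "clip_small.mp4") would write a 172x128 copy of the clip; the file names here are placeholders.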

Training in general

This repository is designed so that the ResNet can be trained on any dataset of videos, using the VideoDataloader class from dataset.py. It expects the videos to be arranged as directory -> [train/val] folders -> [class_label] folders (one per class) -> videos (the files themselves).
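
The expected layout can be checked before training with a few lines of standard-library Python. This is only an illustrative sketch, not the code in dataset.py: the function name is hypothetical, and assigning class indices in alphabetical order of the folder names is an assumption.

```python
from pathlib import Path


def list_labeled_videos(root, split="train"):
    """Return (class_names, samples) for root/split/<class_label>/<video>.

    samples is a list of (video_path, class_index) pairs; class indices
    follow the alphabetical order of the class folder names.
    """
    split_dir = Path(root) / split
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for video in sorted((split_dir / name).iterdir()):
            if video.is_file():
                samples.append((video, index))
    return class_names, samples
```

Running it on both the train and val folders and comparing the returned class names is a quick way to catch a class folder that is missing from one of the splits.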

Forks and fixes of this repo are highly welcome!

Owner

Irhum Shafkat
