[CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Overview

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Created by Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu

This repository contains PyTorch implementation for Bridge-Prompt (CVPR 2022).

We propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across multiple adjacent correlated actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. More specifically, we reformulate the individual action labels as integrated text prompts for supervision, which bridge the gap between individual action semantics. The generated text prompts are paired with corresponding video clips, and together co-train the text encoder and the video encoder via a contrastive approach. The learned vision encoder has a stronger capability for ordinal-action-related downstream tasks, e.g. action segmentation and human activity recognition.

intro

Our code is based on CLIP and ActionCLIP.

Prerequisites

Requirements

You may need ffmpeg for video data pre-processing.

The environment is also recorded in requirements.txt, which can be reproduced by

pip install -r requirements.txt

Pretrained models

We use the base model (ViT-B/16 for image encoder & text encoder) pre-trained by ActionCLIP based on Kinetics-400. The model can be downloaded in link (pwd:ilgw). The pre-trained model should be saved in ./models/.

Datasets

Raw video files are needed to train our framework. Please download the datasets with RGB videos from the official websites ( Breakfast / GTEA / 50Salads ) and save them under the folder ./data/(name_dataset). For convenience, we have used the extracted frames of the raw RGB videos as inputs. You can extract the frames from raw RGB datasets by running:

python preprocess/get_frames.py --dataset (name_dataset) --vpath (folder_to_your_videos) --fpath ./data/(name_dataset)/frames/

To be noticed, ffmpeg is needed here for frame extraction.

Furthermore, please also extract the .zip files to ./data/(name_dataset) respectively.

Training

  • To train Bridge-Prompt on Breakfast from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/breakfast/breakfast_ft.yaml
  • To train Bridge-Prompt on GTEA from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/gtea/gtea_ft.yaml
  • To train Bridge-Prompt on 50Salads from Kinetics400 pretrained models, you can run:
bash scripts/run_train.sh  ./configs/salads/salads_ft.yaml

Extracting frame features

We use the Bridge-Prompt pre-trained image encoders to extract frame-wise features for further downstream tasks (e.g. action segmentation). You can run the following command for each dataset respectively:

python extract_frame_features.py --config ./configs/(dataset_name)/(dataset_name)_exfm.yaml --dataset (dataset_name)

Since 50Salads/Breakfast are large scale datasets, we extract the frame features by window splits. To combine the splits, please run the following command:

python preprocess/combine_features.py

Please modify the variables dataset and feat_name in combine_features.py for each dataset.

Action segmentation

You can reproduce the action segmentation results using ASFormer by the previously extracted frame features.

Activity recognition

You can reproduce the activity recognition results using the command:

python ft_acti.py

based on the previously extracted frame features (Breakfast).

Ordinal action recognition

The ordinal action inferences are executed using the command:

bash scripts/run_test.sh  ./configs/(dataset_name)/(dataset_name)_test.yaml

and check the accuracies using:

bash preprocess/checknpy.py

Please modify the variables dataset in checknpy.py for each dataset.

Notes

Please modify pretrain in all config files according to your own working directions.

License

MIT License.

Owner
Graduate student of Tsinghua University. Major in Automation.
Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

PyGAS: Auto-Scaling GNNs in PyG PyGAS is the practical realization of our G NN A uto S cale (GAS) framework, which scales arbitrary message-passing GN

Matthias Fey 139 Dec 25, 2022
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's

Daniil Gavrilov 347 Nov 14, 2022
This repository is the code of the paper "Sparse Spatial Transformers for Few-Shot Learning".

🌟 Sparse Spatial Transformers for Few-Shot Learning This code implements the Sparse Spatial Transformers for Few-Shot Learning(SSFormers). Our code i

chx_nju 38 Dec 13, 2022
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning Pytorch Implementation for DisCo: Remedy Self-supervi

79 Jan 06, 2023
《Truly shift-invariant convolutional neural networks》(2021)

Truly shift-invariant convolutional neural networks [Paper] Authors: Anadi Chaman and Ivan Dokmanić Convolutional neural networks were always assumed

Anadi Chaman 46 Dec 19, 2022
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs ArXiv Abstract Convolutional Neural Networks (CNNs) have become the de f

Philipp Benz 12 Oct 24, 2022
Nest - A flexible tool for building and sharing deep learning modules

Nest - A flexible tool for building and sharing deep learning modules Nest is a flexible deep learning module manager, which aims at encouraging code

ZhouYanzhao 41 Oct 10, 2022
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

The Hypersim Dataset For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real i

Apple 1.3k Jan 04, 2023
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Hong-Jia Chen 127 Dec 23, 2022
LSTM and QRNN Language Model Toolkit for PyTorch

LSTM and QRNN Language Model Toolkit This repository contains the code used for two Salesforce Research papers: Regularizing and Optimizing LSTM Langu

Salesforce 1.9k Jan 08, 2023
Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

97 Dec 17, 2022
🌊 Online machine learning in Python

In a nutshell River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition

OnlineML 4k Jan 02, 2023
Source code for Transformer-based Multi-task Learning for Disaster Tweet Categorisation (UCD's participation in TREC-IS 2020A, 2020B and 2021A).

Source code for "UCD participation in TREC-IS 2020A, 2020B and 2021A". *** update at: 2021/05/25 This repo so far relates to the following work: Trans

Congcong Wang 4 Oct 19, 2021
Official code for "Decoupling Zero-Shot Semantic Segmentation"

Decoupling Zero-Shot Semantic Segmentation This is the official code for the arxiv. ZegFormer is the first framework that decouple the zero-shot seman

Jian Ding 108 Dec 30, 2022
Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning This is the official repository for Conservative and Adaptive Penalty fo

7 Nov 22, 2022
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).

What is judgyprophet? judgyprophet is a Bayesian forecasting algorithm based on Prophet, that enables forecasting while using information known by the

AstraZeneca 56 Oct 26, 2022
Structured Data Gradient Pruning (SDGP)

Structured Data Gradient Pruning (SDGP) Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by re

Bradley McDanel 10 Nov 11, 2022
An Implementation of SiameseRPN with Feature Pyramid Networks

SiameseRPN with FPN This project is mainly based on HelloRicky123/Siamese-RPN. What I've done is just add a Feature Pyramid Network method to the orig

3 Apr 16, 2022
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

Telemanom (v2.0) v2.0 updates: Vectorized operations via numpy Object-oriented restructure, improved organization Merge branches into single branch fo

Kyle Hundman 844 Dec 28, 2022
A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

Convolutional Neural Network (CNN). This repository contains a source code of a deep learning network built with TensorFlow and Keras to classify gend

Pawel Dziemiach 1 Dec 18, 2021