Official PyTorch Implementation for "Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes"

Overview

PVDNet: Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes

License CC BY-NC

This repository contains the official PyTorch implementation of the following paper:

Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes
Hyeongseok Son, Junyong Lee, Jonghyeop Lee, Sunghyun Cho, Seungyong Lee, TOG 2021 (presented at SIGGRAPH 2021)

About the Research


Overall Framework

Our video deblurring framework consists of three modules: a blur-invariant motion estimation network (BIMNet), a pixel volume generator, and a pixel volume-based deblurring network (PVDNet). We first train BIMNet on its own. After it has converged, we combine the two networks with the pixel volume generator, freeze the parameters of BIMNet, and train PVDNet by training the combined network.

Blur-Invariant Motion Estimation Network (BIMNet)

To estimate motion between frames accurately, we adopt LiteFlowNet and train it with a blur-invariant loss, so that the trained network (BIMNet) estimates optical flow that is robust to blur in the input frames. The loss is defined in Eq. 1 of the main paper.

The figure shows a qualitative comparison of different optical flow methods. The results of the other methods contain severely distorted structures due to errors in their optical flow maps. In contrast, the results of BIMNet show far fewer distortions.

Pixel Volume for Motion Compensation

We propose a novel pixel volume that provides multiple candidates for matching pixels between images. Moreover, a pixel volume provides an additional cue for motion compensation based on the majority.

Our pixel volume approach improves video deblurring performance by using the multiple candidates in a pixel volume in two ways: 1) in most cases, the majority cue points to the correct match, as the statistics in Sec. 4.4 of the main paper show, and 2) in the remaining cases, PVDNet can exploit the multiple candidates to estimate the correct match by referring to nearby pixels that do have majority cues.
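To make the idea concrete, the sketch below builds a toy pixel volume in pure Python. It assumes that the candidates for each pixel are its k×k neighbors in the motion-compensated previous frame. The window size `k`, the edge-replication padding, and the function names `build_pixel_volume` and `majority_value` are illustrative assumptions, not the repository's implementation. PVDNet learns how to use the candidates; the simple mode taken here only mimics the majority cue.

```python
from collections import Counter


def build_pixel_volume(warped, k=3):
    """Return volume[y][x] = list of k*k candidate values around (y, x).

    `warped` is a 2D list standing in for the motion-compensated previous
    frame. Borders are handled by clamping coordinates (edge replication),
    which is an assumption made for this sketch.
    """
    h, w = len(warped), len(warped[0])
    r = k // 2
    volume = []
    for y in range(h):
        row = []
        for x in range(w):
            candidates = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    candidates.append(warped[yy][xx])
            row.append(candidates)
        volume.append(row)
    return volume


def majority_value(candidates):
    """Most frequent candidate; a toy stand-in for the majority cue."""
    return Counter(candidates).most_common(1)[0][0]


# Toy frame: the value 9 plays the role of a wrongly warped pixel.
warped_prev = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]
volume = build_pixel_volume(warped_prev, k=3)
center = volume[1][1]
print(len(center), majority_value(center))
```

In this toy case the center pixel is an outlier, yet most of its candidates agree on the surrounding value, which is the kind of evidence the majority cue provides.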

Getting Started

Prerequisites

Tested environment

Ubuntu 18.04, Python 3.8.8, PyTorch 1.8.0, CUDA 10.2

  1. Environment setup

    $ git clone https://github.com/codeslake/PVDNet.git
    $ cd PVDNet
    
    $ conda create -y --name PVDNet python=3.8 && conda activate PVDNet
    # for CUDA10.2
    $ sh install_CUDA10.2.sh
    # for CUDA11.1
    $ sh install_CUDA11.1.sh
  2. Datasets

    • Download and unzip Su et al.'s dataset and Nah et al.'s dataset under [DATASET_ROOT]:

      ├── [DATASET_ROOT]
      │   ├── train_DVD
      │   ├── test_DVD
      │   ├── train_nah
      │   ├── test_nah
      

      Note:

      • [DATASET_ROOT] is currently set to ./datasets/video_deblur. It can be specified by modifying config.data_offset in ./configs/config.py.
  3. Pre-trained models

    • Download and unzip pretrained weights under ./ckpt/:

      ├── ./ckpt
      │   ├── BIMNet.pytorch
      │   ├── PVDNet_DVD.pytorch
      │   ├── PVDNet_nah.pytorch
      │   ├── PVDNet_large_nah.pytorch
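Before running the tests, it can help to confirm that the layout matches the listings above. The following is a minimal sketch of such a check. The helper `missing_entries` is hypothetical and not part of the repository, and the root paths are the documented defaults, which may differ if `config.data_offset` has been changed.

```python
from pathlib import Path

# Names taken from the directory listings above.
EXPECTED_DATASETS = ["train_DVD", "test_DVD", "train_nah", "test_nah"]
EXPECTED_CKPTS = [
    "BIMNet.pytorch",
    "PVDNet_DVD.pytorch",
    "PVDNet_nah.pytorch",
    "PVDNet_large_nah.pytorch",
]


def missing_entries(root, expected):
    """Return the names in `expected` that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Default locations as documented; adjust them to your setup.
    for root, expected in [
        ("./datasets/video_deblur", EXPECTED_DATASETS),
        ("./ckpt", EXPECTED_CKPTS),
    ]:
        missing = missing_entries(root, expected)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{root}: {status}")
```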
      

Testing models of TOG2021

For the PSNRs and SSIMs reported in the paper, we follow Su et al. and use the approach of Koehler et al., which first aligns the two images using a global translation to account for the ambiguity in pixel location caused by blur.
Refer here for the evaluation code.
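The alignment step can be illustrated with a small pure-Python sketch. It searches integer translations within a small window and reports the best PSNR over the overlapping region. This is a simplified illustration only: the search range `max_shift`, the restriction to integer shifts, and the function names are assumptions, and the linked evaluation code remains the reference for the numbers in the paper.

```python
import math


def psnr(a, b, peak=255.0):
    """PSNR between two equally sized lists of pixel values."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def overlap_pixels(result, target, dy, dx):
    """Collect pixel pairs where `result` shifted by (dy, dx) overlaps `target`."""
    h, w = len(target), len(target[0])
    res, tgt = [], []
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                res.append(result[sy][sx])
                tgt.append(target[y][x])
    return res, tgt


def best_aligned_psnr(result, target, max_shift=1):
    """Best (PSNR, (dy, dx)) over integer shifts in [-max_shift, max_shift]."""
    best = (-math.inf, (0, 0))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            res, tgt = overlap_pixels(result, target, dy, dx)
            if not res:  # no overlap for this shift
                continue
            best = max(best, (psnr(res, tgt), (dy, dx)))
    return best


# Toy example: `result` is `target` moved one pixel to the right.
target = [[10 * (4 * y + x) + 10 for x in range(4)] for y in range(4)]
result = [[0] + row[:-1] for row in target]
print(best_aligned_psnr(result, target, max_shift=1))
```

Without the alignment, a one-pixel offset like the one in the toy example would be penalized as if it were a deblurring error.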

## Table 4 in the main paper (Evaluation on Su et al.'s dataset)
# Our final model 
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_DVD --config config_PVDNet --data DVD --ckpt_abs_name ckpt/PVDNet_DVD.pytorch

## Table 5 in the main paper (Evaluation on Nah et al.'s dataset)
# Our final model 
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_nah --config config_PVDNet --data nah --ckpt_abs_name ckpt/PVDNet_nah.pytorch

# Larger model
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data nah --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch

Note:

  • Testing results will be saved in [LOG_ROOT]/PVDNet_TOG2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/.
  • [LOG_ROOT] is set to ./logs/ by default. Refer here for more details about the logging.
  • options
    • --data: The name of a dataset to evaluate: DVD | nah | random. Default: DVD
      • The data structure can be modified in the function set_eval_path(..) in ./configs/config.py.
      • random is for testing models with any video frames, which should be placed as [DATASET_ROOT]/random/[video_name]/*.[jpg|png].
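The path conventions in the notes above can be sketched as two small helpers. Both `result_dir` and `list_random_frames` are hypothetical names used for illustration and are not part of the repository. The exact format of `[epoch]` is not specified here, so the sketch treats it as an opaque string.

```python
from pathlib import Path


def result_dir(log_root, mode, epoch, data):
    """Compose the documented result directory for one test run.

    `epoch` is kept as a string because its exact formatting is not
    specified in the notes above.
    """
    return (Path(log_root) / "PVDNet_TOG2021" / mode / "result"
            / "quanti_quali" / f"{mode}_{epoch}" / data)


def list_random_frames(dataset_root):
    """Map each video under [DATASET_ROOT]/random to its sorted frame names."""
    videos = {}
    random_root = Path(dataset_root) / "random"
    if not random_root.is_dir():
        return videos
    for video_dir in sorted(p for p in random_root.iterdir() if p.is_dir()):
        frames = sorted(
            f.name for f in video_dir.iterdir()
            if f.suffix in {".jpg", ".png"}
        )
        videos[video_dir.name] = frames
    return videos


if __name__ == "__main__":
    # "EPOCH" is a placeholder, not a real epoch label.
    print(result_dir("./logs", "PVDNet_DVD", "EPOCH", "DVD"))
    print(list_random_frames("./datasets/video_deblur"))
```

Listing the frames before a run with `--data random` is a quick way to confirm that each video sits in its own folder.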

Wiki

Citation

If you find this code useful, please consider citing:

@article{Son_2021_TOG,
    author = {Son, Hyeongseok and Lee, Junyong and Lee, Jonghyeop and Cho, Sunghyun and Lee, Seungyong},
    title = {Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes},
    journal = {ACM Transactions on Graphics},
    year = {2021}
}

Contact

Open an issue for any inquiries. You may also contact [email protected] or [email protected]

Resources

All material related to our paper is available via the following links:

  • The main paper
  • arXiv
  • Supplementary Files
  • Checkpoint Files
  • Su et al. [2017]'s dataset (reference)
  • Nah et al. [2017]'s dataset (reference)

License

This software is being made available under the terms in the LICENSE file.

Any exemptions to these terms require a license from the Pohang University of Science and Technology.

About Coupe Project

Project ‘COUPE’ aims to develop software that evaluates and improves the quality of images and videos based on big visual data. To achieve this goal, we extract sharpness, color, and composition features from images and develop technologies that use these features for restoration and enhancement. In addition, personalization technology through user reference analysis is under study.

Please check out the other Coupe repositories in our Posgraph GitHub organization.

Useful Links
