Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

Related tags

Deep LearningCDFI
Overview

CDFI (Compression-Driven-Frame-Interpolation)

[Paper] (Coming soon...) | [arXiv]

Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Introduction

We propose a Compression-Driven network design for Frame Interpolation (CDFI), that leverages model compression to significantly reduce the model size (allows a better understanding of the current architecture) while making room for further improvements and achieving superior performance in the end. Concretely, we first compress AdaCoF and show that a 10X compressed AdaCoF performs similarly as its original counterpart; then we improve upon this compressed model with simple modifications. Note that typically it is prohibitive to implement the same improvements on the original heavy model.

  • We achieve a significant performance gain with only a quarter in size compared with the original AdaCoF

    Vimeo-90K Middlebury UCF101-DVF #Params
    PSNR, SSIM, LPIPS PSNR, SSIM, LPIPS PSNR, SSIM, LPIPS
    AdaCoF 34.38, 0.974, 0.019 35.74, 0.979, 0.019 35.20, 0.967, 0.019 21.8M
    Compressed AdaCoF 34.15, 0.973, 0.020 35.46, 0.978, 0.019 35.14, 0.967, 0.019 2.45M
    AdaCoF+ 34.58, 0.975, 0.018 36.12, 0.981, 0.017 35.19, 0.967, 0.019 22.9M
    Compressed AdaCoF+ 34.46, 0.975, 0.019 35.76, 0.979, 0.019 35.16, 0.967, 0.019 2.56M
    Our Final Model 35.19, 0.978, 0.010 37.17, 0.983, 0.008 35.24, 0.967, 0.015 4.98M
  • Our final model also performs favorably against other state-of-the-arts (details refer to our paper)

  • The proposed framework is generic and can be easily transferred to other DNN-based frame interpolation method

The above GIF is a demo of using our method to generate slow motion video, which increases the FPS from 5 to 160. We also provide a long video demonstration here (redirect to YouTube).

Environment

  • CUDA 11.0

  • python 3.8.3

  • torch 1.6.0

  • torchvision 0.7.0

  • cupy 7.7.0

  • scipy 1.5.2

  • numpy 1.19.1

  • Pillow 7.2.0

  • scikit-image 0.17.2

Test Pre-trained Models

Download repository:

$ git clone https://github.com/tding1/CDFI.git
$ cd CDFI/

Testing data

For user convenience, we already provide the Middlebury and UCF101-DVF test datasets in our repository, which can be found under directory test_data/.

Evaluation metrics

We use the built-in functions in skimage.metrics to compute the PSNR and SSIM, for which the higher the better. We also use LPIPS, a newly proposed metric that measures perceptual similarity, for which the smaller the better. For user convenience, we include the implementation of LPIPS in our repo under lpips_pytorch/, which is a slightly modified version of here (with an updated squeezenet backbone).

Test our pre-trained CDFI model

$ python test.py --gpu_id 0

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/cdfi_adacof/.

Test the compressed AdaCoF

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 5 --dilation 1

By default, it will load the compressed AdaCoF model checkpoints/compressed_adacof_F_5_D_1.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_5_D_1/.

Test the compressed AdaCoF+

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 11 --dilation 2

By default, it will load the compressed AdaCoF+ model checkpoints/compressed_adacof_F_11_D_2.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_11_D_2/.

Interpolate two frames

$ python interpolate_twoframe.py --gpu_id 0 --first_frame figs/0.png --second_frame figs/1.png --output_frame output.png

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth, and generate the intermediate frame output.png given two consecutive frames in a sequence.

Train Our Model

Training data

We use the Vimeo-90K triplet dataset for video frame interpolation task, which is relatively large (>32 GB).

$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip

Start training

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 8

It will generate an unique ID for each training, and all the intermediate results/records will be saved under model_weights/<training id>/. For a GPU device with memory around 10GB, the --batch_size can take a value as large as 3, otherwise CUDA may be out of memory. There are many other training options, e.g., --lr, --epochs, --loss and so on, can be found in train.py.

Apply CDFI to New Models

One nice thing about CDFI is that the framework can be easily applied to other (heavy) DNN models and potentially boost their performance. The key to CDFI is the optimization-based compression that compresses a model via fine-grained pruning. In particular, we use the efficient and easy-to-use sparsity-inducing optimizer OBPROXSG (see also paper), and summarize the compression procedure for any other model in the following.

  • Copy the OBPROXSG optimizer, which is already implemented as torch.optim.optimizer, to your working directory
  • Starting from a pre-trained model, finetune its weights by using the OBPROXSG optimizer, like using any standard PyTorch built-in optimizer such as SGD or Adam
    • It is not necessarily to use the full dataset for this finetuning process
  • The parameters for the OBPROXSG optimizer
    • lr: learning rate
    • lambda_: coefficient of the L1 regularization term
    • epochSize: number of batches in a epoch
    • Np: number of proximal steps, which is set to be 2 for pruning AdaCoF
    • No: number of orthant steps (key step to promote sparsity), for which we recommend using the default setting
    • eps: threshold for trimming zeros, which is set to be 0.0001 for pruning AdaCoF
  • After the optimization is done (either by reaching a maximum number of epochs or achieving a high sparsity), use the layer density as the compression ratio for that layer, as described in the paper
  • As an example, compare the architectures in models/adacof.py and model/compressed_adacof.py for compressing AdaCoF with the above procedure

Now it's ready to make further improvements/modifications on the compressed model, based on the understanding of its flaws/drawbacks.

Citation

Coming soon...

Acknowledgements

The code is largely based on HyeongminLEE/AdaCoF-pytorch and baowenbo/DAIN.

Owner
Tianyu Ding
Ph.D. in Applied Mathematics \\ Master in Computer Science
Tianyu Ding
RADIal is available now! Check the download section

Latest news: RADIal is available now! Check the download section. However, because we are currently working on the data anonymization, we provide for

valeo.ai 55 Jan 03, 2023
This app is a simple example of using Strealit to create a financial data web app.

Streamlit Demo: Finance Chart This app is a simple example of using Streamlit to create a financial data web app. This demo use streamlit, pandas and

91 Jan 02, 2023
[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

Xiefan Guo 122 Dec 11, 2022
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
The code for replicating the experiments from the LFI in SSMs with Unknown Dynamics paper.

Likelihood-Free Inference in State-Space Models with Unknown Dynamics This package contains the codes required to run the experiments in the paper. Th

Alex Aushev 0 Dec 27, 2021
PyTorch code to run synthetic experiments.

Code repository for Invariant Risk Minimization Source code for the paper: @article{InvariantRiskMinimization, title={Invariant Risk Minimization}

Facebook Research 345 Dec 12, 2022
Let's Git - Versionsverwaltung & Open Source Hausaufgabe

Let's Git - Versionsverwaltung & Open Source Hausaufgabe Herzlich Willkommen zu dieser Hausaufgabe für unseren MOOC: Let's Git! Wir hoffen, dass Du vi

1 Dec 13, 2021
Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform This repository is the implementation of "Variable-Rate Deep Image C

Myungseo Song 47 Dec 13, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 687 Jan 07, 2023
Transformers are Graph Neural Networks!

🚀 Gated Graph Transformers Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression. Associated article

Chaitanya Joshi 46 Jun 30, 2022
Permute Me Softly: Learning Soft Permutations for Graph Representations

Permute Me Softly: Learning Soft Permutations for Graph Representations

Giannis Nikolentzos 7 Jul 10, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

Julius Kunze 26 Oct 05, 2022
Learning Features with Parameter-Free Layers (ICLR 2022)

Learning Features with Parameter-Free Layers (ICLR 2022) Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo | Paper NAVER AI Lab, NAVER CLOVA Up

NAVER AI 65 Dec 07, 2022
This repository compare a selfie with images from identity documents and response if the selfie match.

aws-rekognition-facecompare This repository compare a selfie with images from identity documents and response if the selfie match. This code was made

1 Jan 27, 2022
a dnn ai project to classify which food people are eating on audio recordings

Deep Learning - EAT Challenge About This project is part of an AI challenge of the DeepLearning course 2021 at the University of Augsburg. The objecti

Marco Tröster 1 Oct 24, 2021
Relative Uncertainty Learning for Facial Expression Recognition

Relative Uncertainty Learning for Facial Expression Recognition The official implementation of the following paper at NeurIPS2021: Title: Relative Unc

35 Dec 28, 2022
Generalized Data Weighting via Class-level Gradient Manipulation

Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas

18 Nov 12, 2022
Watch faces morph into each other with StyleGAN 2, StyleGAN, and DCGAN!

FaceMorpher FaceMorpher is an innovative project to get a unique face morph (or interpolation for geeks) on a website. Yes, this means you can see fac

Anish 9 Jun 24, 2022
Mouse Brain in the Model Zoo

Deep Neural Mouse Brain Modeling This is the repository for the ongoing deep neural mouse modeling project, an attempt to characterize the representat

Colin Conwell 15 Aug 22, 2022
Deep Ensemble Learning with Jet-Like architecture

Ransomware analysis using DEL with jet-like architecture comprising two CNN wings, a sparse AE tail, a non-linear PCA to produce a diverse feature space, and an MLP nose

Ahsen Nazir 2 Feb 06, 2022