No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

Last update: Dec 30, 2022

Overview

This repository contains the implementation for the paper:

No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency (WACV 2022) Video

Create Environment

This code was trained and tested on Ubuntu 16.04 using Anaconda, Python 3.6.6, and PyTorch 1.8.0. To set up the environment, run:

    conda env create -f environment.yml

After installing the virtual environment, you should be able to run

    python -c "import torch; print(torch.__version__)"

in the terminal and see 1.8.0.

Datasets

In this work, we use 7 datasets for evaluation (LIVE, CSIQ, TID2013, KADID10K, CLIVE, KonIQ, LIVEFB).

To start training, please make sure to follow the correct folder structure for each of the aforementioned datasets, as provided below:

LIVE

live
    |--fastfading
    |    |  ...     
    |--blur
    |    |  ... 
    |--jp2k
    |    |  ...     
    |--jpeg
    |    |  ...     
    |--wn
    |    |  ...     
    |--refimgs
    |    |  ...     
    |--dmos.mat
    |--dmos_realigned.mat
    |--refnames_all.mat
    |--readme.txt

CSIQ

csiq
    |--dst_imgs_all
    |    |--1600.AWGN.1.png
    |    |  ... (you need to put all the distorted images here)
    |--src_imgs
    |    |--1600.png
    |    |  ...
    |--csiq.DMOS.xlsx
    |--csiq_label.txt

TID2013

tid2013
    |--distorted_images
    |--reference_images
    |--mos.txt
    |--mos_std.txt
    |--mos_with_names.txt
    |--readme

KADID10K

kadid10k
    |--distorted_images
    |    |--I01_01_01.png
    |    |  ...    
    |--reference_images
    |    |--I01.png
    |    |  ...    
    |--dmos.csv
    |--mv.sh.save
    |--mvv.sh

CLIVE

clive
    |--Data
    |    |--I01_01_01.png
    |    |  ...    
    |--Images
    |    |--I01.png
    |    |  ...    
    |--ChallengeDB_release
    |    |--README.txt
    |--dmos.csv
    |--mv.sh.save
    |--mvv.sh

KonIQ

fblive
   |--1024x768
   |    |  992920521.jpg 
   |    |  ... (all the images should be here)     
   |--koniq10k_scores_and_distributions.csv

LIVEFB

fblive
   |--FLIVE
   |    |  AVA__149.jpg    
   |    |  ... (all the images should be here)     
   |--labels_image.csv
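
Before launching a long training run, it can help to check that a dataset folder matches the layout listed above. The following is a minimal sketch, not part of the repository: the `EXPECTED_ENTRIES` table and the `missing_entries` helper are hypothetical names, and the table only copies a subset of the entries shown in the trees above. Whether the training code actually reads every listed file is not something this sketch verifies.

```python
import os
import tempfile

# Hypothetical table (not from the repository): dataset name -> entries that
# the folder trees above show directly under that dataset's root folder.
EXPECTED_ENTRIES = {
    "live": ["fastfading", "blur", "jp2k", "jpeg", "wn", "refimgs",
             "dmos.mat", "dmos_realigned.mat", "refnames_all.mat"],
    "csiq": ["dst_imgs_all", "src_imgs", "csiq_label.txt"],
    "tid2013": ["distorted_images", "reference_images", "mos_with_names.txt"],
    "kadid10k": ["distorted_images", "reference_images", "dmos.csv"],
}


def missing_entries(dataset, root):
    """Return the listed entries that do not exist under root, in table order."""
    return [name for name in EXPECTED_ENTRIES[dataset]
            if not os.path.exists(os.path.join(root, name))]


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics an incomplete TID2013 folder.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "distorted_images"))
        os.mkdir(os.path.join(root, "reference_images"))
        print(missing_entries("tid2013", root))  # only the label file is absent
```

An empty list means every listed entry was found. The check looks only at the top level and does not count the images inside each folder.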

Training

The training scripts are provided in run.sh. Please change the paths accordingly. Note that to achieve the same performance, the parameters should match the ones in the run.sh files.

Pretrained models

The pretrained models are provided here.

Acknowledgement

Parts of this code are borrowed from HyperIQA and DETR.

FAQs

What is the difference between self-consistency and ensembling? Will self-consistency increase the inference time?

Ensembling methods require several models (with different initializations) whose results are combined during training and testing. In our self-consistency model, we instead enforce a single network to produce consistent outputs during training when its input undergoes different transformations. At test time, our self-consistency model has the same inference time and number of parameters as the model without self-consistency. In other words, we do not add any new parameters to the network, and inference is unaffected.

What is the difference between self-consistency and augmentation?

In augmentation, we augment an input and send it to the network. Although the network becomes robust to the different augmentations, it never gets the chance to enforce the same output for different versions of an input at the same time. In our self-consistency approach, we force the network to produce similar outputs for an image and its transformed version (in our case, horizontal flipping), which leads to more robust performance. Note that we still use augmentation during training, so our model benefits from both augmentation and self-consistency. Also see Fig. 1 in the main paper, where we show that models trained with augmentation alone are sensitive to simple transformations.
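
The idea of penalizing disagreement between an image and its horizontally flipped copy can be sketched in a few lines. This is a framework-free illustration, not the repository's loss. The `toy_model`, the L1 form of the penalty, and the `weight` value of 0.5 are all assumptions made for the example. In real training the images would be tensors and the model a network.

```python
def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def toy_model(image):
    """Hypothetical stand-in for a quality predictor.

    It weights columns unevenly, so its score changes when the image is
    flipped, which is the behavior the consistency penalty discourages.
    """
    return sum((col + 1) * px for row in image for col, px in enumerate(row))


def self_consistency_loss(model, image):
    """Absolute gap between the scores of an image and its flipped copy."""
    return abs(model(image) - model(hflip(image)))


def training_loss(model, image, target, weight=0.5):
    """Regression error plus the weighted consistency penalty (assumed form)."""
    return abs(model(image) - target) + weight * self_consistency_loss(model, image)


if __name__ == "__main__":
    image = [[1, 0], [0, 0]]
    print(self_consistency_loss(toy_model, image))    # flip changes the score
    print(training_loss(toy_model, image, target=1))  # penalty added to the error
```

Both versions of the input pass through the same model within one training step, which plain augmentation does not do. At test time only `model(image)` is called, so the penalty adds no parameters and no inference cost.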

Why is the relative ranking loss applied only to the samples with the highest and lowest quality scores? Why not apply it to all the samples?

1) We did not see a significant improvement from applying our ranking loss to all the samples within each batch compared to using only the extreme cases. 2) Considering more samples leads to more gradient back-propagation and therefore more computation, which slows down training.
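
To illustrate what a ranking loss restricted to the extreme cases might look like, here is a minimal sketch. It is an assumed triplet-style hinge, not the exact loss from the paper or the repository: the function name `extreme_ranking_loss`, the use of absolute score differences, and the label-derived margin are all choices made for this example. Only the side anchored at the highest-quality sample is shown.

```python
def extreme_ranking_loss(preds, labels):
    """Triplet-style hinge on the batch extremes (an assumed formulation).

    Selects the samples with the highest, second-highest, and lowest labels.
    The predicted gap between the top two should be smaller than the predicted
    gap between the top and the bottom by at least the label-derived margin.
    """
    if len(preds) != len(labels) or len(preds) < 3:
        raise ValueError("need matching lists with at least three samples")
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    low, second, high = order[0], order[-2], order[-1]
    margin = labels[second] - labels[low]    # assumed margin choice
    near = abs(preds[high] - preds[second])  # gap between the top two
    far = abs(preds[high] - preds[low])      # gap between top and bottom
    return max(0.0, near - far + margin)


if __name__ == "__main__":
    labels = [10, 50, 90, 80]
    print(extreme_ranking_loss(labels, labels))            # predictions equal labels
    print(extreme_ranking_loss([50, 50, 50, 50], labels))  # collapsed predictions
```

Whatever the batch size, only three samples enter the loss, so gradients flow through three predictions rather than through every pair in the batch.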

Citation

If you find this work useful for your research, please cite our paper:

@InProceedings{golestaneh2021no,
  title={No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency},
  author={Golestaneh, S Alireza and Dadsetan, Saba and Kitani, Kris M},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3209--3218},
  year={2022}
}

If you have any questions about our work, please do not hesitate to contact [email protected]

Owner

Alireza Golestaneh
