Official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers

Last update: Dec 11, 2022

Overview

Visual Parser (ViP)

This is the official implementation of the paper Visual Parser: Representing Part-whole Hierarchies with Transformers.

Key Features & TLDR

PyTorch Implementation of the ViP network. Check it out at models/vip.py
A fast and neat implementation of the relative positional encoding proposed in HaloNet, BoTNet and AANet.
A transformer-friendly FLOPS & Param counter that supports FLOPS calculation for einsum and matmul operations.
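
As a rough illustration of the relative positional encoding mentioned above, the sketch below shows the offset-to-index mapping that relative schemes generally rely on: each (query, key) pair is mapped to a row of a table with one entry per possible offset. This is not taken from models/vip.py; the function names `relative_position_index` and `gather_relative_values` are hypothetical, and the 1D layout is an assumption made to keep the example small.

```python
from typing import List, Sequence


def relative_position_index(length: int) -> List[List[int]]:
    """index[i][j] = (j - i) + (length - 1), so offsets map to 0 .. 2*length - 2."""
    return [[j - i + length - 1 for j in range(length)] for i in range(length)]


def gather_relative_values(table: Sequence[float], length: int) -> List[List[float]]:
    """Look up one table entry per (query i, key j) pair (hypothetical helper)."""
    if len(table) != 2 * length - 1:
        raise ValueError("table needs one entry per offset in [-(L-1), L-1]")
    index = relative_position_index(length)
    return [[table[k] for k in row] for row in index]


# Three positions give five possible offsets: -2, -1, 0, +1, +2.
rel_index = relative_position_index(3)
rel_values = gather_relative_values([-2.0, -1.0, 0.0, 1.0, 2.0], 3)
print(rel_index)
print(rel_values)
```

Pairs with the same offset share the same table entry, which is why the table only needs 2L - 1 rows instead of L x L.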

Prerequisite

Please refer to get_started.md.

Results and Models

All models listed below are evaluated with an input size of 224x224.

Model	Top-1 Acc (%)	#Params	FLOPS	Download
ViP-Tiny	79.0	12.8M	1.7G	Google Drive
ViP-Small	82.1	32.1M	4.5G	Google Drive
ViP-Medium	83.3	49.6M	8.0G	Coming Soon
ViP-Base	83.6	87.8M	15.0G	Coming Soon

To load the pretrained checkpoint, e.g. ViP-Tiny, simply run:

# first download the checkpoint and name it as vip_t_dict.pth
from models.vip import vip_tiny
model = vip_tiny(pretrained="vip_t_dict.pth")

Evaluation

To evaluate a pre-trained ViP on ImageNet val, run:

python3 main.py <data-root> --model <model-name> -b <batch-size> --eval_checkpoint <path-to-checkpoint>
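
For readers unfamiliar with the metric in the results table, here is a minimal sketch of how top-1 accuracy is computed from per-class scores. It is pure Python and independent of the repository; whether main.py computes the metric exactly this way is an assumption, and `top1_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent for a batch of per-class scores."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("empty batch")
    correct = 0
    for scores, label in zip(logits, labels):
        # The prediction is the index of the highest score (argmax).
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Toy batch: 4 samples, 3 classes (values are made up for illustration).
toy_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 0.9], [0.3, 0.2, 0.1]]
toy_labels = [1, 0, 2, 1]
toy_accuracy = top1_accuracy(toy_logits, toy_labels)
print(toy_accuracy)  # 3 of 4 predictions match, so 75.0
```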

Training from scratch

To train a ViP on ImageNet from scratch, run:

bash ./distributed_train.sh <job-name> <config-path> <num-gpus>

For example, to train ViP with 8 GPUs on a single node, run:

ViP-Tiny:

bash ./distributed_train.sh vip-t-001 configs/vip_t_bs1024.yaml 8

ViP-Small:

bash ./distributed_train.sh vip-s-001 configs/vip_s_bs1024.yaml 8

ViP-Medium:

bash ./distributed_train.sh vip-m-001 configs/vip_m_bs1024.yaml 8

ViP-Base:

bash ./distributed_train.sh vip-b-001 configs/vip_b_bs1024.yaml 8

Profiling the model

To measure the throughput, run:

python3 test_throughput.py <model-name>

For example, to measure the test speed of ViP-Tiny on your device, run:

python3 test_throughput.py vip-tiny
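
Throughput is usually reported as images per second over a timed loop that follows a few warm-up iterations. The sketch below illustrates that measurement pattern with a placeholder workload; it does not reproduce test_throughput.py, and `measure_throughput`, `dummy_step` and the default iteration counts are assumptions.

```python
import time
from typing import Callable


def measure_throughput(step: Callable[[], None], batch_size: int,
                       warmup: int = 5, iters: int = 20) -> float:
    """Return samples per second, where each call to `step` processes one batch."""
    for _ in range(warmup):
        step()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def dummy_step() -> None:
    # Placeholder for a model forward pass on one batch.
    sum(i * i for i in range(10_000))


images_per_second = measure_throughput(dummy_step, batch_size=64)
print(f"{images_per_second:.1f} images/s")
```

When timing a real model on a GPU, the device should be synchronized (for example with torch.cuda.synchronize()) before each timer read, because kernel launches are asynchronous.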

To measure the FLOPS and number of parameters, run:

python3 test_flops.py <model-name>
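
To show what counting einsum and matmul operations involves, the sketch below counts multiply-accumulate operations (MACs) for both. It is not the repository's counter: `matmul_macs` and `einsum_macs` are hypothetical names, and counting one MAC as one operation is an assumed convention (some tools report twice this number).

```python
from math import prod
from typing import Dict, Sequence


def matmul_macs(a_shape: Sequence[int], b_shape: Sequence[int]) -> int:
    """MACs for (..., m, k) @ (..., k, n) with identical batch dimensions."""
    *batch_a, m, k = a_shape
    *batch_b, k2, n = b_shape
    if k != k2 or batch_a != batch_b:
        raise ValueError("incompatible shapes for matmul")
    return prod(batch_a) * m * k * n


def einsum_macs(equation: str, *shapes: Sequence[int]) -> int:
    """MACs for an explicit einsum such as 'bhqd,bhkd->bhqk' (no ellipsis)."""
    inputs, _ = equation.replace(" ", "").split("->")
    sizes: Dict[str, int] = {}
    for subscripts, shape in zip(inputs.split(","), shapes):
        if len(subscripts) != len(shape):
            raise ValueError("subscripts do not match operand rank")
        for index, size in zip(subscripts, shape):
            if sizes.setdefault(index, size) != size:
                raise ValueError(f"conflicting sizes for index {index!r}")
    # A naive contraction visits every combination of distinct indices once.
    return prod(sizes.values())


# Attention scores Q @ K^T: batch 1, 4 heads, 49 tokens, head dimension 32.
q_shape, k_shape = (1, 4, 49, 32), (1, 4, 49, 32)
attn_einsum_macs = einsum_macs("bhqd,bhkd->bhqk", q_shape, k_shape)
attn_matmul_macs = matmul_macs(q_shape, (1, 4, 32, 49))
print(attn_einsum_macs, attn_matmul_macs)
```

Both calls describe the same contraction, so they should report the same count.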

Citing ViP

@article{vip,
  title={Visual Parser: Representing Part-whole Hierarchies with Transformers},
  author={Sun, Shuyang and Yue, Xiaoyu and Bai, Song and Torr, Philip},
  journal={arXiv preprint arXiv:2107.05790},
  year={2021}
}

Contact

If you have any questions, don't hesitate to contact Shuyang (Kevin) Sun. You can easily reach him by sending an email to [email protected].

Owner

Shuyang Sun
