Pyramid Pooling Transformer for Scene Understanding

Related tags

Deep LearningP2T
Overview

Pyramid Pooling Transformer for Scene Understanding

Requirements:

  • torch 1.6+
  • torchvision 0.7.0
  • timm==0.3.2
  • Validated on torch 1.6.0, torchvision 0.7.0

Models Pretrained on ImageNet1K

Variants Input Size [email protected] [email protected] #Params (M) Pretrained Models
P2T-Tiny 224 x 224 78.1 94.1 11.1 Google Drive
P2T-Small 224 x 224 82.1 95.9 23.0 Google Drive
P2T-Base 224 x 224 83.0 96.2 36.2 Google Drive

Pretrained Models for Downstream tasks

To be updated.

Something Else

Note: we have prepared a stronger version of P2T. Since P2T is still in peer review, we will release the stronger P2T after the acceptance.

You might also like...
 Neural Scene Graphs for Dynamic Scene (CVPR 2021)
Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object compositions and views.

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

Code for
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.
This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

Compact Bilinear Pooling for PyTorch

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

Comments
  • How to load ImageNet1K pretrained weight to semantic segmentation model?

    How to load ImageNet1K pretrained weight to semantic segmentation model?

    Hello, thanks for open source!

    I use mmseg, and load weight from image classification result, it warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ...... And the experimental results are terrible as the experiments initialize weight with random.

    So I load weight from ADE20K result, it work and warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias And the result is similar to the result you offer.

    Which weight should I load? ImageNet1K or ADE20K? Or should I modify the keys of weight in ImageNet1K to adapt the key in segmentation?

    opened by asd123pwj 8
  • Questions about your ablation studies

    Questions about your ablation studies

    Hello,

    I have some questions about your ablation studies of pyramid pooling. Could you detail about your baseline version in Table 9? First, you say that you replace P-MHSA with an MHSA with a single pooling operation, what is the detail about single pooling operation? Ex: Pooling Ratios? Second, do you compared your method with original MHSA?

    opened by pp00704831 3
  • P2T replaces PVT trunk bug

    P2T replaces PVT trunk bug

    When I replaced the PVT trunk with P2T in my code, I encountered an error :
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    opened by liu-tianxiang 2
  • P2T on ImageNet-22K?

    P2T on ImageNet-22K?

    Hi @yuhuan-wu , thank you for share the code of this excellent work! Have you trained P2T on ImageNet-22K dataset or any further plan to do it? If so, could you please share the pretrained model on ImageNet-22k?

    Thank you.

    opened by fyaft2012 1
Owner
Yu-Huan Wu
Ph.D. student at Nankai University
Yu-Huan Wu
UMich 500-Level Mobile Robotics Course

MOBILE ROBOTICS: METHODS & ALGORITHMS - WINTER 2022 University of Michigan - NA 568/EECS 568/ROB 530 For slides, lecture notes, and example codes, see

393 Dec 29, 2022
Bottom-up Human Pose Estimation

Introduction This is the official code of Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. This paper has been accepted to CVPR2

108 Dec 01, 2022
Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

Seulki Park 70 Jan 03, 2023
Meandering In Networks of Entities to Reach Verisimilar Answers

MINERVA Meandering In Networks of Entities to Reach Verisimilar Answers Code and models for the paper Go for a Walk and Arrive at the Answer - Reasoni

Shehzaad Dhuliawala 271 Dec 13, 2022
toroidal - a lightweight transformer library for PyTorch

toroidal - a lightweight transformer library for PyTorch Toroidal transformers are of smaller size and lower weight than the more common E-I types. Th

MathInf GmbH 64 Jan 07, 2023
Official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning (ICML 2021) published at International Conference on Machine Learning

About This repository the official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning. The config files contain the s

Dynamic Vision and Learning Group 41 Dec 10, 2022
Fuzzy Overclustering (FOC)

Fuzzy Overclustering (FOC) In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in

2 Nov 08, 2022
Pytorch implementation of the Variational Recurrent Neural Network (VRNN).

VariationalRecurrentNeuralNetwork Pytorch implementation of the Variational RNN (VRNN), from A Recurrent Latent Variable Model for Sequential Data. Th

emmanuel 251 Dec 17, 2022
Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

CARLA-Roach This is the official code release of the paper End-to-End Urban Driving by Imitating a Reinforcement Learning Coach by Zhejun Zhang, Alexa

Zhejun Zhang 118 Dec 28, 2022
Unofficial PyTorch code for BasicVSR

Dependencies and Installation The code is based on BasicSR, Please install the BasicSR framework first. Pytorch=1.51 Training cd ./code CUDA_VISIBLE_

Long 59 Dec 06, 2022
the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

NeurAI 4 Jul 27, 2022
Lecture materials for Cornell CS5785 Applied Machine Learning (Fall 2021)

Applied Machine Learning (Cornell CS5785, Fall 2021) This repo contains executable course notes and slides for the Applied ML course at Cornell and Co

Volodymyr Kuleshov 103 Dec 31, 2022
RRxIO - Robust Radar Visual/Thermal Inertial Odometry: Robust and accurate state estimation even in challenging visual conditions.

RRxIO - Robust Radar Visual/Thermal Inertial Odometry RRxIO offers robust and accurate state estimation even in challenging visual conditions. RRxIO c

Christopher Doer 64 Dec 29, 2022
PyTorch implementation of normalizing flow models

PyTorch implementation of normalizing flow models

Vincent Stimper 242 Jan 02, 2023
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection This repository contains implementation of the

Visual Understanding Lab @ Samsung AI Center Moscow 190 Dec 30, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
Toontown House CT Edition

Toontown House: Classic Toontown House Classic source that should just work. ❓ W

Open Source Toontown Servers 5 Jan 09, 2022
Title: Graduate-Admissions-Predictor

The purpose of this project is create a predictive model capable of identifying the probability of a person securing an admit based on their personal profile parameters. Simplified visualisations hav

Akarsh Singh 1 Jan 26, 2022
Plenoxels: Radiance Fields without Neural Networks, Code release WIP

Plenoxels: Radiance Fields without Neural Networks Alex Yu*, Sara Fridovich-Keil*, Matthew Tancik, Qinhong Chen, Benjamin Recht, Angjoo Kanazawa UC Be

Alex Yu 2.3k Dec 30, 2022
A Real-Time-Strategy game for Deep Learning research

Description DeepRTS is a high-performance Real-TIme strategy game for Reinforcement Learning research. It is written in C++ for performance, but provi

Centre for Artificial Intelligence Research (CAIR) 156 Dec 19, 2022