We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Last update: Nov 08, 2022

Related tags

Overview

ConTNet

Introduction

ConTNet (Convlution-Tranformer Network) is proposed mainly in response to the following two issues: (1) ConvNets lack a large receptive field, limiting the performance of ConvNets on downstream tasks. (2) Transformer-based model is not robust enough and requires special training settings or hundreds of millions of images as the pretrain dataset, thereby limiting their adoption. ConTNet combines convolution and transformer alternately, which is very robust and can be optimized like ResNet unlike the recently-proposed transformer-based models (e.g., ViT, DeiT) that are sensitive to hyper-parameters and need many tricks when trained from scratch on a midsize dataset (e.g., ImageNet).

Main Results on ImageNet

name	resolution	[email protected]	#params(M)	FLOPs(G)
Res-18	224x224	71.5	11.7	1.8
ConT-S	224x224	74.9	10.1	1.5
Res-50	224x224	77.1	25.6	4.0
ConT-M	224x224	77.6	19.2	3.1
Res-101	224x224	78.2	44.5	7.6
ConT-B	224x224	77.9	39.6	6.4
DeiT-Ti^*	224x224	72.2	5.7	1.3
ConT-Ti^*	224x224	74.9	5.8	0.8
Res-18^*	224x224	73.2	11.7	1.8
ConT-S^*	224x224	76.5	10.1	1.5
Res-50^*	224x224	78.6	25.6	4.0
DeiT-S^*	224x224	79.8	22.1	4.6
ConT-M^*	224x224	80.2	19.2	3.1
Res-101^*	224x224	80.0	44.5	7.6
DeiT-B^*	224x224	81.8	86.6	17.6
ConT-B^*	224x224	81.8	39.6	6.4

Note: ^* indicates training with strong augmentations.

Main Results on Downstream Tasks

Object detection results on COCO.

method	backbone	#params(M)	FLOPs(G)	AP	APs	APm	APl
RetinaNet	Res-50 ConTNet-M	32.0 27.0	235.6 217.2	36.5 37.9	20.4 23.0	40.3 40.6	48.1 50.4
FCOS	Res-50 ConTNet-M	32.2 27.2	242.9 228.4	38.7 40.8	22.9 25.1	42.5 44.6	50.1 53.0
faster rcnn	Res-50 ConTNet-M	41.5 36.6	241.0 225.6	37.4 40.0	21.2 25.4	41.0 43.0	48.1 52.0

Instance segmentation results on Cityscapes based on Mask-RCNN.

backbone	AP^bb	AP_s^bb	AP_m^bb	AP_l^bb	AP^mk	AP_s^mk	AP_m^mk	AP_l^mk
Res-50 ConT-M	38.2 40.5	21.9 25.1	40.9 44.4	49.5 52.7	34.7 38.1	18.3 20.9	37.4 41.0	47.2 50.3

Semantic segmentation results on cityscapes.

model	mIOU
PSP-Res50	77.12
PSP-ConTM	78.28

Bib Citing

@article{yan2021contnet,
    title={ConTNet: Why not use convolution and transformer at the same time?},
    author={Haotian Yan and Zhe Li and Weijian Li and Changhu Wang and Ming Wu and Chuang Zhang},
    year={2021},
    journal={arXiv preprint arXiv:2104.13497}
}

We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Related tags

Overview

ConTNet

Introduction

Main Results on ImageNet

Main Results on Downstream Tasks

Bib Citing

Owner

QR2Pass-project - A proof of concept for an alternative (passwordless) authentication system to a web server

Contenido del curso Bases de datos del DCC PUC versión 2021-2

Unofficial Implementation of MLP-Mixer in TensorFlow

My freqtrade strategies

Physics-informed Neural Operator for Learning Partial Differential Equation

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Deep Distributed Control of Port-Hamiltonian Systems

DeepMind Alchemy task environment: a meta-reinforcement learning benchmark

Learning a mapping from images to psychological similarity spaces with neural networks.

BARTScore: Evaluating Generated Text as Text Generation

Activity tragle - Google is tracking everything, we just look at it

Dual Attention Network for Scene Segmentation (CVPR2019)

Visualize Camera's Pose Using Extrinsic Parameter by Plotting Pyramid Model on 3D Space

Vertex AI: Serverless framework for MLOPs (ESP / ENG)

Dense Gaussian Processes for Few-Shot Segmentation

PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

A novel pipeline framework for multi-hop complex KGQA task. About the paper title: Improving Multi-hop Embedded Knowledge Graph Question Answering by Introducing Relational Chain Reasoning

The VeriNet toolkit for verification of neural networks

EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale

"NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search".