Receptive Field Block Net for Accurate and Fast Object Detection, ECCV 2018

Last update: Dec 21, 2022

Overview

Receptive Field Block Net for Accurate and Fast Object Detection

By Songtao Liu, Di Huang, Yunhong Wang

Updates (2021/07/23): YOLOX is here! A stronger YOLO with ONNX, TensorRT, ncnn, and OpenVINO support.

Updates: We propose a new method that reaches 42.4 mAP at 45 FPS on COCO; code is available here.

Introduction

Inspired by the structure of Receptive Fields (RFs) in human visual systems, we propose a novel RF Block (RFB) module, which takes the relationship between the size and eccentricity of RFs into account, to enhance the discriminability and robustness of features. We further assemble the RFB module to the top of SSD with a lightweight CNN model, constructing the RFB Net detector. You can use the code to train/evaluate the RFB Net for object detection. For more details, please refer to our ECCV paper.

VOC2007 Test

System	mAP	FPS (Titan X Maxwell)
Faster R-CNN (VGG16)	73.2	7
YOLOv2 (Darknet-19)	78.6	40
R-FCN (ResNet-101)	80.5	9
SSD300* (VGG16)	77.2	46
SSD512* (VGG16)	79.8	19
RFBNet300 (VGG16)	80.7	83
RFBNet512 (VGG16)	82.2	38

COCO

System	test-dev mAP	Time (Titan X Maxwell)
Faster R-CNN++ (ResNet-101)	34.9	3.36s
YOLOv2 (Darknet-19)	21.6	25ms
SSD300* (VGG16)	25.1	22ms
SSD512* (VGG16)	28.8	53ms
RetinaNet500 (ResNet-101-FPN)	34.4	90ms
RFBNet300 (VGG16)	30.3	15ms
RFBNet512 (VGG16)	33.8	30ms
RFBNet512-E (VGG16)	34.4	33ms

MobileNet

System	COCO minival mAP	#parameters
SSD MobileNet	19.3	6.8M
RFB MobileNet	20.7	7.4M

Citing RFB Net

Please cite our paper in your publications if it helps your research:

@InProceedings{Liu_2018_ECCV,
author = {Liu, Songtao and Huang, Di and Wang, Yunhong},
title = {Receptive Field Block Net for Accurate and Fast Object Detection},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Contents

- Installation
- Datasets
- Training
- Evaluation
- Models

Installation

Install PyTorch-0.4.0 by selecting your environment on the website and running the appropriate command.
Clone this repository. It is mainly based on ssd.pytorch and Chainer-ssd; many thanks to their authors.
- Note: We currently support only PyTorch-0.4.0 and Python 3+.
Compile the NMS and COCO tools:

./make.sh

Note: Check your GPU architecture support in utils/build.py, line 131. The default is:

'nvcc': ['-arch=sm_52',
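
If you are unsure which value to use there, the flag follows NVIDIA's naming scheme: compute capability major.minor maps to sm_<major><minor>, so the default sm_52 corresponds to capability 5.2 (Maxwell, for example the Titan X used in the benchmarks above). A minimal sketch of that mapping follows. The helper name arch_flag is hypothetical and not part of this repository. With PyTorch installed and a GPU visible, torch.cuda.get_device_capability() returns the (major, minor) pair to pass in.

```python
def arch_flag(major: int, minor: int) -> str:
    """Build the nvcc -arch flag for a given CUDA compute capability."""
    if major < 0 or not 0 <= minor <= 9:
        raise ValueError(f"invalid compute capability: {major}.{minor}")
    return f"-arch=sm_{major}{minor}"


if __name__ == "__main__":
    # Capability 5.2 (Maxwell) reproduces the default shown above.
    print(arch_flag(5, 2))
    # Capability 6.1 (Pascal, e.g. GTX 1080 Ti).
    print(arch_flag(6, 1))
```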

Then install OpenCV and download a dataset by following the instructions below.

conda install opencv

Note: For training, we currently support VOC and COCO.

Datasets

To make things easy, we provide simple VOC and COCO dataset loaders that inherit from torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.
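
To illustrate what inheriting from torch.utils.data.Dataset involves, here is a minimal sketch of a map-style detection dataset. It is not the loader shipped in this repository: the class name ToyDetectionDataset and the in-memory sample format are assumptions made for illustration. The only contract is __len__ and __getitem__, which is what lets torch.utils.data.DataLoader index and batch the samples. The import falls back to object so the sketch also runs without PyTorch installed.

```python
try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch
    Dataset = object


class ToyDetectionDataset(Dataset):
    """Hypothetical map-style dataset: each item is (image_id, boxes).

    boxes is a list of [xmin, ymin, xmax, ymax, class_index] entries.
    """

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image_id, boxes = self.samples[index]
        return image_id, boxes


if __name__ == "__main__":
    # Made-up toy annotations, not real VOC or COCO labels.
    data = ToyDetectionDataset([
        ("img_a", [[10, 20, 110, 220, 0]]),
        ("img_b", [[5, 5, 50, 60, 3], [30, 40, 90, 120, 1]]),
    ])
    print(len(data), data[0][0])
```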

VOC Dataset

Download VOC2007 trainval & test

# Optionally specify a directory to download the dataset into; the default is ~/data/
sh data/scripts/VOC2007.sh # <directory>

Download VOC2012 trainval

# Optionally specify a directory to download the dataset into; the default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

COCO Dataset

Download the MS COCO dataset from the official website and place it at /path/to/coco (the default is ~/data/COCO). Follow the instructions to prepare the minival2014 and valminusminival2014 annotations. All label files (.json) should be under the COCO/annotations/ folder. The dataset directory should have this basic structure:

$COCO/
$COCO/cache/
$COCO/annotations/
$COCO/images/
$COCO/images/test2015/
$COCO/images/train2014/
$COCO/images/val2014/
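
Before training, it can help to confirm that this layout is in place. The sketch below checks for the folders listed above using only the standard library. The function name missing_coco_dirs is hypothetical, and the root path is whatever you chose for $COCO.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to the COCO root.
EXPECTED_DIRS = [
    "cache",
    "annotations",
    "images/test2015",
    "images/train2014",
    "images/val2014",
]


def missing_coco_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root).expanduser()
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs("~/data/COCO")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("COCO layout looks complete.")
```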

UPDATE: COCO has since released new train2017 and val2017 sets, which are just new splits of the same images.

Training

First download the fc-reduced VGG-16 PyTorch base network weights from https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth or from our BaiduYun Driver.
The pre-trained MobileNet base network is ported from MobileNet-Caffe and achieves slightly better accuracy than the original one reported in the paper. The weight file is available at https://drive.google.com/open?id=13aZSApybBDjzfGIdqN1INBlPsddxCK14 or on BaiduYun Driver.
By default, we assume you have downloaded the file into the RFBNet/weights directory:

mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth

To train RFBNet with the training script, specify the parameters listed in train_RFB.py as flags or change them manually in the file.

python train_RFB.py -d VOC -v RFB_vgg -s 300

Note:
- -d: choose datasets, VOC or COCO.
- -v: choose backbone version, RFB_vgg, RFB_E_vgg or RFB_mobile (spelled as in the example command).
- -s: image size, 300 or 512.
- You can resume training from a checkpoint by specifying its path as one of the training parameters (again, see train_RFB.py for options).
- To reproduce the results in the paper, the VOC model should be trained for about 240 epochs, while the COCO model needs 130 epochs.
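
The three flags above can be modelled with argparse. The sketch below illustrates how such options might be declared and validated; it is not a copy of the parser in train_RFB.py. The defaults and the dest names (dataset, version, size) are assumptions, and the version choices follow the spelling used in the example command.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the -d, -v and -s flags described above."""
    parser = argparse.ArgumentParser(description="RFB Net training options (sketch)")
    parser.add_argument("-d", dest="dataset", default="VOC",
                        choices=["VOC", "COCO"], help="dataset to train on")
    parser.add_argument("-v", dest="version", default="RFB_vgg",
                        choices=["RFB_vgg", "RFB_E_vgg", "RFB_mobile"],
                        help="backbone version")
    parser.add_argument("-s", dest="size", type=int, default=300,
                        choices=[300, 512], help="input image size")
    return parser


if __name__ == "__main__":
    # Same flags as the training command shown earlier.
    args = build_parser().parse_args(["-d", "VOC", "-v", "RFB_vgg", "-s", "300"])
    print(args.dataset, args.version, args.size)
```

Declaring choices makes argparse reject a misspelled dataset or an unsupported image size before any training starts.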

Evaluation

To evaluate a trained network:

python test_RFB.py -d VOC -v RFB_vgg -s 300 --trained_model /path/to/model/weights

By default, it will directly output the mAP results on VOC2007 test or COCO minival2014. For VOC2012 test and COCO test-dev results, you can manually change the datasets in the test_RFB.py file, then save the detection results and submit them to the evaluation server.
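
For readers unfamiliar with the metric: mAP is the mean over classes of average precision (AP), and the PASCAL VOC2007 protocol defines AP as the 11-point interpolated average of precision at recalls 0.0, 0.1, ..., 1.0. The sketch below computes that quantity from a precision/recall curve in plain Python. It assumes the VOC2007 11-point metric applies; it is not taken from test_RFB.py, and the function name is hypothetical.

```python
def voc07_average_precision(recalls, precisions):
    """11-point interpolated AP as defined for PASCAL VOC2007.

    recalls and precisions are parallel lists describing the
    precision/recall curve of one class, one entry per detection
    threshold.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have equal length")
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0
        # Interpolated precision: best precision at recall >= threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        ap += max(candidates) if candidates else 0.0
    return ap / 11.0


if __name__ == "__main__":
    # Toy curve (made-up numbers, not results from the paper).
    recalls = [0.2, 0.4, 0.6, 0.8]
    precisions = [1.0, 0.8, 0.6, 0.5]
    print(round(voc07_average_precision(recalls, precisions), 4))
```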
