Unofficial PyTorch implementation of "RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving" (ECCV 2020)

Overview

RTM3D-PyTorch

python-image pytorch-image

The PyTorch Implementation of the paper: RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 2020)


Demonstration

demo

Features

  • Realtime 3D object detection based on a monocular RGB image
  • Support distributed data parallel training
  • Tensorboard
  • ResNet-based Keypoint Feature Pyramid Network (KFPN) (Using by setting --arch fpn_resnet_18)
  • Use images from both left and right cameras (Control by setting the use_left_cam_prob argument)
  • Release pre-trained models

Some modifications from the paper

  • Formula (3):

    • A negative value can't be an input of the log operator, so please don't normalize dim as mentioned in the paper because the normalized dim values maybe less than 0. Hence I've directly regressed to absolute dimension values in meters.
    • Use L1 loss for depth estimation (applying the sigmoid activation to the depth output first).
  • Formula (5): I haven't taken the absolute values of the ground-truth, I have used the relative values instead. The code is here

  • Formula (7): argmin instead of argmax

  • Generate heatmap for the center and vertexes of objects as the CenterNet paper. If you want to use the strategy from RTM3D paper, you can pass the dynamic-sigma argument to the train.py script.

2. Getting Started

2.1. Requirement

pip install -U -r requirements.txt

2.2. Data Preparation

Download the 3D KITTI detection dataset from here.

The downloaded data includes:

  • Training labels of object data set (5 MB)
  • Camera calibration matrices of object data set (16 MB)
  • Left color images of object data set (12 GB)
  • Right color images of object data set (12 GB)

Please make sure that you construct the source code & dataset directories structure as below.

2.3. RTM3D architecture

architecture

The model takes only the RGB images as the input and outputs the main center heatmap, vertexes heatmap, and vertexes coordinate as the base module to estimate 3D bounding box.

2.4. How to run

2.4.1. Visualize the dataset

cd src/data_process
  • To visualize camera images with 3D boxes, let's execute:
python kitti_dataset.py

Then Press n to see the next sample >>> Press Esc to quit...

2.4.2. Inference

Download the trained model from here (will be released), then put it to ${ROOT}/checkpoints/ and execute:

python test.py --gpu_idx 0 --arch resnet_18 --pretrained_path ../checkpoints/rtm3d_resnet_18.pth

2.4.3. Evaluation

python evaluate.py --gpu_idx 0 --arch resnet_18 --pretrained_path <PATH>

2.4.4. Training

2.4.4.1. Single machine, single gpu
python train.py --gpu_idx 0 --arch <ARCH> --batch_size <N> --num_workers <N>...
2.4.4.2. Multi-processing Distributed Data Parallel Training

We should always use the nccl backend for multi-processing distributed training since it currently provides the best distributed training performance.

  • Single machine (node), multiple GPUs
python train.py --dist-url 'tcp://127.0.0.1:29500' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0
  • Two machines (two nodes), multiple GPUs

First machine

python train.py --dist-url 'tcp://IP_OF_NODE1:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 0

Second machine

python train.py --dist-url 'tcp://IP_OF_NODE2:FREEPORT' --dist-backend 'nccl' --multiprocessing-distributed --world-size 2 --rank 1

To reproduce the results, you can run the bash shell script

./train.sh

Tensorboard

  • To track the training progress, go to the logs/ folder and
cd logs/<saved_fn>/tensorboard/
tensorboard --logdir=./

Contact

If you think this work is useful, please give me a star!
If you find any errors or have any suggestions, please contact me (Email: [email protected]).
Thank you!

Citation

@article{RTM3D,
  author = {Peixuan Li,  Huaici Zhao, Pengfei Liu, Feidao Cao},
  title = {RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving},
  year = {2020},
  conference = {ECCV 2020},
}
@misc{RTM3D-PyTorch,
  author =       {Nguyen Mau Dung},
  title =        {{RTM3D-PyTorch: PyTorch Implementation of the RTM3D paper}},
  howpublished = {\url{https://github.com/maudzung/RTM3D-PyTorch}},
  year =         {2020}
}

References

[1] CenterNet: Objects as Points paper, PyTorch Implementation

Folder structure

${ROOT}
└── checkpoints/    
    ├── rtm3d_resnet_18.pth
    ├── rtm3d_fpn_resnet_18.pth
└── dataset/    
    └── kitti/
        ├──ImageSets/
        │   ├── test.txt
        │   ├── train.txt
        │   └── val.txt
        ├── training/
        │   ├── image_2/ (left color camera)
        │   ├── image_3/ (right color camera)
        │   ├── calib/
        │   ├── label_2/
        └── testing/  
        │   ├── image_2/ (left color camera)
        │   ├── image_3/ (right color camera)
        │   ├── calib/
        └── classes_names.txt
└── src/
    ├── config/
    │   ├── train_config.py
    │   └── kitti_config.py
    ├── data_process/
    │   ├── kitti_dataloader.py
    │   ├── kitti_dataset.py
    │   └── kitti_data_utils.py
    ├── models/
    │   ├── fpn_resnet.py
    │   ├── resnet.py
    │   ├── model_utils.py
    └── utils/
    │   ├── evaluation_utils.py
    │   ├── logger.py
    │   ├── misc.py
    │   ├── torch_utils.py
    │   ├── train_utils.py
    ├── evaluate.py
    ├── test.py
    ├── train.py
    └── train.sh
├── README.md 
└── requirements.txt

Usage

usage: train.py [-h] [--seed SEED] [--saved_fn FN] [--root-dir PATH]
                [--arch ARCH] [--pretrained_path PATH] [--head_conv HEAD_CONV]
                [--hflip_prob HFLIP_PROB]
                [--use_left_cam_prob USE_LEFT_CAM_PROB] [--dynamic-sigma]
                [--no-val] [--num_samples NUM_SAMPLES]
                [--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE]
                [--print_freq N] [--tensorboard_freq N] [--checkpoint_freq N]
                [--start_epoch N] [--num_epochs N] [--lr_type LR_TYPE]
                [--lr LR] [--minimum_lr MIN_LR] [--momentum M] [-wd WD]
                [--optimizer_type OPTIMIZER] [--steps [STEPS [STEPS ...]]]
                [--world-size N] [--rank N] [--dist-url DIST_URL]
                [--dist-backend DIST_BACKEND] [--gpu_idx GPU_IDX] [--no_cuda]
                [--multiprocessing-distributed] [--evaluate]
                [--resume_path PATH] [--K K]

The Implementation of RTM3D using PyTorch

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           re-produce the results with seed random
  --saved_fn FN         The name using for saving logs, models,...
  --root-dir PATH       The ROOT working directory
  --arch ARCH           The name of the model architecture
  --pretrained_path PATH
                        the path of the pretrained checkpoint
  --head_conv HEAD_CONV
                        conv layer channels for output head0 for no conv
                        layer-1 for default setting: 64 for resnets and 256
                        for dla.
  --hflip_prob HFLIP_PROB
                        The probability of horizontal flip
  --use_left_cam_prob USE_LEFT_CAM_PROB
                        The probability of using the left camera
  --dynamic-sigma       If true, compute sigma based on Amax, Amin then
                        generate heamapIf false, compute radius as CenterNet
                        did
  --no-val              If true, dont evaluate the model on the val set
  --num_samples NUM_SAMPLES
                        Take a subset of the dataset to run and debug
  --num_workers NUM_WORKERS
                        Number of threads for loading data
  --batch_size BATCH_SIZE
                        mini-batch size (default: 16), this is the totalbatch
                        size of all GPUs on the current node when usingData
                        Parallel or Distributed Data Parallel
  --print_freq N        print frequency (default: 50)
  --tensorboard_freq N  frequency of saving tensorboard (default: 50)
  --checkpoint_freq N   frequency of saving checkpoints (default: 5)
  --start_epoch N       the starting epoch
  --num_epochs N        number of total epochs to run
  --lr_type LR_TYPE     the type of learning rate scheduler (cosin or
                        multi_step)
  --lr LR               initial learning rate
  --minimum_lr MIN_LR   minimum learning rate during training
  --momentum M          momentum
  -wd WD, --weight_decay WD
                        weight decay (default: 1e-6)
  --optimizer_type OPTIMIZER
                        the type of optimizer, it can be sgd or adam
  --steps [STEPS [STEPS ...]]
                        number of burn in step
  --world-size N        number of nodes for distributed training
  --rank N              node rank for distributed training
  --dist-url DIST_URL   url used to set up distributed training
  --dist-backend DIST_BACKEND
                        distributed backend
  --gpu_idx GPU_IDX     GPU index to use.
  --no_cuda             If true, cuda is not used.
  --multiprocessing-distributed
                        Use multi-processing distributed training to launch N
                        processes per node, which has N GPUs. This is the
                        fastest way to use PyTorch for either single node or
                        multi node data parallel training
  --evaluate            only evaluate the model, not training
  --resume_path PATH    the path of the resumed checkpoint
  --K K                 the number of top K
Owner
Nguyen Mau Dzung
M.Sc. in HCI & Robotics | Self-driving Car Engineer | Senior AI Engineer | Interested in 3D Computer Vision
Nguyen Mau Dzung
A mini-course offered to Undergrad chemistry students

The best way to use this material is by forking it by click the Fork button at the top, right corner. Then you will get your own copy to play with! Th

Raghu 19 Dec 19, 2022
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)

Pytorch Code for VideoLT [Website][Paper] Updates [10/29/2021] Features uploaded to Google Drive, for access please send us an e-mail: zhangxing18 at

Skye 26 Sep 18, 2022
An open-source outlier detection package by Getcontact Data Team

pyfbad The pyfbad library supports anomaly detection projects. An end-to-end anomaly detection application can be written using the source codes of th

Teknasyon Tech 41 Dec 27, 2022
This repository contains code released by Google Research.

This repository contains code released by Google Research.

Google Research 26.6k Dec 31, 2022
You can draw the corresponding bounding box into the image and save it according to the result file (txt format) run by the tracker.

You can draw the corresponding bounding box into the image and save it according to the result file (txt format) run by the tracker.

Huiyiqianli 42 Dec 06, 2022
Head and Neck Tumour Segmentation and Prediction of Patient Survival Project

Head-and-Neck-Tumour-Segmentation-and-Prediction-of-Patient-Survival Welcome to the Head and Neck Tumour Segmentation and Prediction of Patient Surviv

5 Oct 20, 2022
Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

PGT Code for paper PGT: A Progressive Method for Training Models on Long Videos. Install Run pip install -r requirements.txt. Run python setup.py buil

Bo Pang 27 Mar 30, 2022
PyTorch code for SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised DA

PyTorch Code for SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation Viraj Prabhu, Shivam Khare, Deeks

Viraj Prabhu 46 Dec 24, 2022
Pytorch implementation for "Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter".

Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter This is a pytorch-based implementation for paper Implicit Feature Alignme

wangtianwei 61 Nov 12, 2022
alfred-py: A deep learning utility library for **human**

Alfred Alfred is command line tool for deep-learning usage. if you want split an video into image frames or combine frames into a single video, then a

JinTian 800 Jan 03, 2023
Txt2Xml tool will help you convert from txt COCO format to VOC xml format in Object Detection Problem.

TXT 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Txt2Xml too

Nguyễn Trường Lâu 4 Nov 24, 2022
Image process framework based on plugin like imagej, it is esay to glue with scipy.ndimage, scikit-image, opencv, simpleitk, mayavi...and any libraries based on numpy

Introduction ImagePy is an open source image processing framework written in Python. Its UI interface, image data structure and table data structure a

ImagePy 1.2k Dec 29, 2022
Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

1 Jan 23, 2022
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。

已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的

182 Jan 02, 2023
Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

polyaxon 3.2k Dec 23, 2022
Technical Analysis library in pandas for backtesting algotrading and quantitative analysis

bta-lib - A pandas based Technical Analysis Library bta-lib is pandas based technical analysis library and part of the backtrader family. Links Main P

DRo 393 Dec 20, 2022
PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, CVPR 2019.

PointRCNN PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud Code release for the paper PointRCNN:3D Object Proposal Generation a

Shaoshuai Shi 1.5k Dec 27, 2022
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis Andreas Bl

CompVis Heidelberg 36 Dec 25, 2022