Rotated Box Is Back: An Accurate Box Proposal Network for Scene Text Detection

Last update: Dec 21, 2022

Overview

This repository contains the supplementary code for the paper accepted at ICDAR 2021.

We highly recommend using the Docker image, because the model contains custom operations that depend on the framework and CUDA versions.
We provide trained models for ICDAR 2017 and ICDAR 2013 in final_checkpoint_ch8, and for ICDAR 2015 in final_checkpoint_ch4.
This code is mainly focused on inference. Training the model requires a training-class GPU such as the V100; please see our paper for details.

REQUIREMENT

Nvidia-docker
Tensorflow 1.14
Minimum GPU requirement: NVIDIA GTX 1080 Ti

INSTALLATION

Build the Docker image and create the container

docker build --tag rbimage ./dockerfile
docker run --runtime=nvidia --name rbcontainer -v /rotated-box-is-back-path:/rotated-box-is-back -i -t rbimage /bin/bash

Build the custom operations inside the container

cd /rotated-box-is-back/nms 
cmake ./
make
./shell.sh

SAMPLE IMAGE INFERENCE

cd /rotated-box-is-back/
python viz.py --test_data_path=./sample --checkpoint_path=./final_checkpoint_ch8 --output_dir=./sample_result  --thres 0.6 --min_size=1600 --max_size=2000

ICDAR 2017 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2017 test set folder.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic17  --thres 0.6 --min_size=1600 --max_size=2000

ICDAR 2015 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2015 test set folder.
To convert the results to the evaluation format, run the post-processing script on the result text files, as in the second command below.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch4 --output_dir=./ic15  --thres 0.7 --min_size=1100 --max_size=2000
python text_postprocessing.py -i=./ic15/ -o=./ic15_format/ -e True

ICDAR 2013 INFERENCE

Replace icdar_testset_path with the path to your ICDAR 2013 test set folder.
To convert the results to the evaluation format, run the post-processing script on the result text files, as in the second command below.

python viz.py --test_data_path=icdar_testset_path --checkpoint_path=./final_checkpoint_ch8 --output_dir=./ic13  --thres 0.55 --min_size=700 --max_size=900
python text_postprocessing.py -i=./ic13/ -o=./ic13_format/ -e True -m rec
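
The three inference commands above differ only in checkpoint, threshold, and image size limits. As a convenience, those settings can be gathered in one place. The following is a minimal sketch, not part of the repository: the names DATASET_SETTINGS and build_viz_command are hypothetical, and the flag values are copied from the commands in this README. It only assembles the argument list and does not run viz.py.

```python
import shlex

# Settings copied from the inference commands in this README.
# The dictionary name and its keys are illustrative, not part of the repository.
DATASET_SETTINGS = {
    "ic17": {"checkpoint": "./final_checkpoint_ch8", "thres": 0.6, "min_size": 1600, "max_size": 2000},
    "ic15": {"checkpoint": "./final_checkpoint_ch4", "thres": 0.7, "min_size": 1100, "max_size": 2000},
    "ic13": {"checkpoint": "./final_checkpoint_ch8", "thres": 0.55, "min_size": 700, "max_size": 900},
}


def build_viz_command(dataset, test_data_path):
    """Return the viz.py argument list for one dataset (hypothetical helper)."""
    s = DATASET_SETTINGS[dataset]
    return [
        "python", "viz.py",
        "--test_data_path={}".format(test_data_path),
        "--checkpoint_path={}".format(s["checkpoint"]),
        "--output_dir=./{}".format(dataset),
        "--thres", str(s["thres"]),
        "--min_size={}".format(s["min_size"]),
        "--max_size={}".format(s["max_size"]),
    ]


if __name__ == "__main__":
    # Print one shell-ready command per dataset; the test set path is a placeholder.
    for name in DATASET_SETTINGS:
        args = build_viz_command(name, "icdar_testset_path")
        print(" ".join(shlex.quote(a) for a in args))
```

The returned list can be passed to subprocess.run inside the container once the placeholder path is replaced with a real test set folder.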

EVALUATION TABLE

Dataset    Precision (P)    Recall (R)    F-measure (F)
IC13       95.9             89.1          92.4
IC15       89.7             84.2          86.9
IC17       83.4             68.2          75.0
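
As a quick consistency check on the table, the sketch below recomputes F from the P and R columns. It assumes F is the harmonic mean of precision and recall, the usual F-measure definition; the README itself does not state the formula. The function name f_measure is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) taken from the evaluation table.
REPORTED = {
    "IC13": (95.9, 89.1, 92.4),
    "IC15": (89.7, 84.2, 86.9),
    "IC17": (83.4, 68.2, 75.0),
}

for name, (p, r, reported_f) in REPORTED.items():
    # Print the recomputed value next to the reported one for comparison.
    print(name, round(f_measure(p, r), 1), reported_f)
```

Under that assumption, each recomputed value rounds to the F reported in the table.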

TRAINING

The model can be trained with the following command:

python train_refine_estimator.py --input_size=1024 --batch_size=2 --checkpoint_path=./finetuning --training_data_path=your-image-path --training_gt_path=your-gt-path  --learning_rate=0.00001 --max_epochs=500  --save_summary_steps=1000 --warmup_path=./final_checkpoint_ch8

ACKNOWLEDGEMENT

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 1711125972, Audio-Visual Perception for Autonomous Rescue Drones).

CITATION

If you find this work helpful for your research, please cite:

Lee J., Lee J., Yang C., Lee Y., Lee J. (2021) Rotated Box Is Back: An Accurate Box Proposal Network for Scene Text Detection. In: Lladós J., Lopresti D., Uchida S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_4

Owner

NCSOFT
