A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Last update: Dec 12, 2022


Overview

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

A PyTorch implementation of TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes (ECCV 2018) by Megvii

Paper link: arXiv:1807.01544
Github: princewang1994/TextSnake.pytorch
Blog: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Paper

Comparison of different representations for text instances. (a) Axis-aligned rectangle. (b) Rotated rectangle. (c) Quadrangle. (d) TextSnake. Obviously, the proposed TextSnake representation is able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text with perspective distortion, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle with giving accurate predictions in such cases.

TextSnake elements:

center point
tangent line
text region
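In the TextSnake paper, a text instance is described as a sequence of ordered, overlapping disks whose centers lie on the text center line; each disk has its own radius and orientation, and the text region is the union of the disks. The sketch below illustrates that idea only. The `Disk` class and the `render_text_region` / `tangent_direction` helpers are hypothetical names, not part of this repository, and the pixel convention (x = column, y = row) is an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Disk:
    """One TextSnake disk (hypothetical helper class).

    x, y  : center point on the text center line (x = column, y = row)
    r     : disk radius in pixels
    theta : orientation of the tangent line at the center, in radians
    """
    x: float
    y: float
    r: float
    theta: float


def render_text_region(disks, height, width):
    """Rasterize a text instance as the union of its disks (illustrative sketch)."""
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    mask = np.zeros((height, width), dtype=bool)
    for d in disks:
        # A pixel belongs to the text region if it lies inside any disk.
        mask |= (xs - d.x) ** 2 + (ys - d.y) ** 2 <= d.r ** 2
    return mask


def tangent_direction(disk):
    """Unit vector along the tangent line at the disk center."""
    return np.array([np.cos(disk.theta), np.sin(disk.theta)])


# A straight, horizontal word modeled by three overlapping disks.
word = [Disk(10, 10, 5, 0.0), Disk(20, 10, 5, 0.0), Disk(30, 10, 5, 0.0)]
region = render_text_region(word, height=20, width=40)
```

For curved text, the same structure applies; only the disk centers, radii and orientations change along the center line.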

Description

This code has the following features:

includes complete training and inference code
pure Python, no extra compilation required
compatible with the latest PyTorch version (written with PyTorch 0.4.0)
supports the Total-Text and SynthText datasets

Getting Started

This repo includes the training code and an inference demo for TextSnake; both training and inference can be run with a few commands.

Prerequisites

To run this repo successfully, the following environment is highly recommended:

Linux (Ubuntu 16.04)
Python 3.6
Anaconda3
NVIDIA GPU (8 GB or more of GPU memory for training, 2 GB for inference)

(I haven't tested it on other Python versions.)

Clone this repository:

$ git clone https://github.com/princewang1994/TextSnake.pytorch.git

Python packages can be installed with pip:

$ cd $TEXTSNAKE_ROOT
$ pip install -r requirements.txt

Data preparation

Total-Text: follow dataset/total_text/README.md
SynthText: follow dataset/synth-text/README.md

Pretraining with SynthText

$ CUDA_VISIBLE_DEVICES=$GPUID python train.py synthtext_pretrain --dataset synth-text --viz --max_epoch 1 --batch_size 8

Training

Train a model with a given experiment name $EXPNAME.

Training from scratch:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz

Training from the SynthText-pretrained model (improves performance significantly):

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python train.py $EXPNAME --viz --batch_size 8 --resume save/synthtext_pretrain/textsnake_vgg_0.pth

Options:

exp_name: experiment name, used to identify different training processes
--viz: visualization toggle; output pictures are saved to ./vis by default

Other options can be shown by running python train.py -h

Running tests

Running the following command generates a demo on the Total-Text dataset (300 pictures); the results are saved to ./vis by default:

$ EXPNAME=example
$ CUDA_VISIBLE_DEVICES=$GPUID python eval_textsnake.py $EXPNAME --checkepoch 190

Options:

exp_name: experiment name, used to identify different training processes

Other options can be shown by running python train.py -h

Evaluation

The Total-Text metric is implemented in dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py. First modify input_dir in Deteval.py, then run the following commands to compute DetEval at the two threshold settings:

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.8 --tp 0.4

$ python dataset/total_text/Evaluation_Protocol/Python_scripts/Deteval.py $EXPNAME --tr 0.7 --tp 0.6

Each command prints a metric report.
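The --tr and --tp flags are the area-recall and area-precision thresholds of the DetEval protocol: a ground-truth region G and a detection D are treated as a one-to-one match when |G ∩ D| / |G| reaches tr and |G ∩ D| / |D| reaches tp. The sketch below shows only that pairwise check on boolean masks. It is not the logic of Deteval.py: the helper names are hypothetical, whether the script compares with `>=` or `>` is an assumption, and DetEval's one-to-many and many-to-one matching rules are omitted.

```python
import numpy as np


def area_recall_precision(gt_mask, det_mask):
    """Area recall |G ∩ D| / |G| and area precision |G ∩ D| / |D| (hypothetical helper)."""
    gt_area = gt_mask.sum()
    det_area = det_mask.sum()
    if gt_area == 0 or det_area == 0:
        raise ValueError("masks must be non-empty")
    inter = np.logical_and(gt_mask, det_mask).sum()
    return float(inter / gt_area), float(inter / det_area)


def is_one_to_one_match(gt_mask, det_mask, tr=0.7, tp=0.6):
    """Pairwise DetEval-style check; assumes inclusive (>=) thresholds."""
    recall, precision = area_recall_precision(gt_mask, det_mask)
    return recall >= tr and precision >= tp


# A 10x10 ground-truth box and a detection covering 9 of its 10 columns.
gt = np.zeros((20, 20), dtype=bool)
gt[0:10, 0:10] = True
det = np.zeros((20, 20), dtype=bool)
det[0:10, 0:9] = True
print(area_recall_precision(gt, det))  # recall 0.9, precision 1.0
```

A detection that covers the whole image would reach full area recall but fail the precision threshold, which is why the two settings (tr=0.7 / tp=0.6 and tr=0.8 / tp=0.4) can rank post-processing variants differently.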

Pretrained Models

SynthText pretrained model: synthtext_fixlr/textsnake_vgg_0.pth (extract code: xmoh)
Total-Text pretrained model: finetune_larger_tcl/textsnake_vgg_180.pth (extract code: dms6)
Google Drive: TextSnake_pretrain

Download the models from the links above and place each .pth file at the corresponding path (save/XXX/textsnake_vgg_XX.pth).

Performance

DetEval reporting

The following table reports DetEval metrics with VGG as the backbone (reproducible with the pretrained model from the Pretrained Models section):

Setting	tr=0.7 / tp=0.6 (P | R | F1)	tr=0.8 / tp=0.4 (P | R | F1)	FPS (single 1080 Ti)
expand / no merge	0.652 | 0.549 | 0.596	0.874 | 0.711 | 0.784	12.07
expand / merge	0.698 | 0.578 | 0.633	0.859 | 0.660 | 0.746	8.38
no expand / no merge	0.753 | 0.693 | 0.722	0.695 | 0.628 | 0.660	9.94
no expand / merge	0.747 | 0.677 | 0.710	0.691 | 0.602 | 0.643	11.05
reported in paper	-	0.827 | 0.745 | 0.784	-

* expand denotes expanding the radius by 0.3 times during post-processing

* merge denotes merging overlapped instances during post-processing
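The F1 column in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it for two rows copied from the table; `f1_score` is just an illustrative helper, not a function from this repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the tr=0.8 / tp=0.4 column of the table.
rows = {
    "expand / no merge": (0.874, 0.711),
    "reported in paper": (0.827, 0.745),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")  # both print 0.784
```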

Pure Inference

You can also run prediction on your own dataset without annotations:

Download the pretrained model and place the .pth file at save/pretrained/textsnake_vgg_180.pth
Run the pure inference script as follows:

$ EXPNAME=pretrained
$ CUDA_VISIBLE_DEVICES=$GPUID python demo.py $EXPNAME --checkepoch 180 --img_root /path/to/image

Predicted results are saved in output/$EXPNAME and visualizations in vis/${EXPNAME}_deploy.
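The commands in this README rely on a fixed directory layout: checkpoints at save/<exp_name>/textsnake_vgg_<epoch>.pth (selected with --checkepoch), demo predictions in output/<exp_name>, and visualizations in vis/<exp_name>_deploy. The helpers below are a hypothetical sketch that only mirrors those documented paths, for example to check that a checkpoint is in place before running demo.py; the repository's own code may build these paths differently.

```python
from pathlib import Path


def checkpoint_path(exp_name: str, epoch: int, root: str = ".") -> Path:
    """Documented layout: save/<exp_name>/textsnake_vgg_<epoch>.pth (hypothetical helper)."""
    return Path(root) / "save" / exp_name / f"textsnake_vgg_{epoch}.pth"


def demo_output_dirs(exp_name: str, root: str = ".") -> tuple[Path, Path]:
    """Documented demo outputs: output/<exp_name> and vis/<exp_name>_deploy."""
    return Path(root) / "output" / exp_name, Path(root) / "vis" / f"{exp_name}_deploy"


# Same values as the pure inference example: EXPNAME=pretrained, --checkepoch 180.
ckpt = checkpoint_path("pretrained", 180)
if not ckpt.is_file():
    print(f"missing checkpoint, download it first: {ckpt}")
```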

Qualitative results

left: prediction/ground truth
middle: text region(TR)
right: text center line(TCL)

What is coming

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgement

This project is written by Prince Wang; part of the code is adapted from songdejia/EAST.
Thanks to techkang for the great help!


Owner

Prince Wang
