textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

Overview

An End-to-End TextSpotter with Explicit Alignment and Attention

This is initially described in our CVPR 2018 paper.

Getting Started

Installation

  • Clone the code
git clone https://github.com/tonghe90/textspotter
cd textspotter
  • Install caffe. You can follow this this tutorial. If you have build problem about std::allocater, please refer to this #3
# make sure you set WITH_PYTHON_LAYER := 1
# change Makefile.config according to your library path
cp Makefile.config.example Makefile.config
make clean
make -j8
make pycaffe

Training

we provide part of the training code. But you can not run this directly. 
We have give the comment in the [train.pt](https://github.com/tonghe90/textspotter/models/train.pt).
You have to write your own layer, IOUloss layer. We cannot publish this for some IP reason. 
To be noticed: 
[L6902](https://github.com/tonghe90/textspotter/models/train.pt#L6902) 
[L6947](https://github.com/tonghe90/textspotter/models/train.pt#L6907)

Testing

  • install editdistance and pyclipper: pip install editdistance and pip install pyclipper

  • After Caffe is set up, you need to download a trained model (about 40M) from Google Drive. This model is trained with VGG800k and finetuned on ICDAR2015.

  • Run python test.py --img=./imgs/img_105.jpg

  • hyperparameters:

cfg.py --mean_val ==> mean value during the testing.
       --max_len ==> maximum length of the text string (here we take 25, meaning a word can contain 25 characters at most.)
       --recog_th ==> the threshold during the recognition process. The score for a word is the average mean of every character.
       --word_score ==> the threshold for those words that contain number or symbols for they are not contained in the dictionary.

test.py --weight ==> weights file of caffemodel
        --prototxt-iou ==> the prototxt file for detection.
        --prototxt-lstm ==> the prototxt file for recognition.
        --img ==> the folder or img file for testing. The format can be added in ./pylayer/tool is_image function.
        --scales-ms ==> multiscales input for input during the testing process.
        --thresholds-ms ==> corresponding thresholds of text region for multiscale inputs.
        --nms ==> nms threshold for testing
        --save-dir ==> the dir for save results in format of ICDAR2015 submition.
One thing should be noted: the recognition results are achieved by comparing direct output with words in dictionary, which has about 90K lexicons. 
These lexicons don't contain any number and symbol. You can delete dictionary reference part and directly output recognition results.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{tong2018,
  title={An End-to-End TextSpotter with Explicit Alignment and Attention},
  author={T. He and Z. Tian and W. Huang and C. Shen and Y. Qiao and C. Sun},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
  year={2018}
}

License

This code is for NON-COMMERCIAL purposes only. For commerical purposes, please contact Chunhua Shen [email protected]. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3. Please refer to http://www.gnu.org/licenses/ for more details.

Owner
Tong He
Tong He
It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

Khant Htet Aung 4 Jul 11, 2022
An interactive document scanner built in Python using OpenCV

The scanner takes a poorly scanned image, finds the corners of the document, applies the perspective transformation to get a top-down view of the document, sharpens the image, and applies an adaptive

Kushal Shingote 1 Feb 12, 2022
An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss This is an unofficial implementation of AutoVC based on the official one. The reposi

Chien-yu Huang 27 Jun 16, 2022
Qrcode Attendence System with Opencv and Pyzbar

Setup process Creates a virtual environment (Scripts that ensure executed Python code uses the Python interpreter and site packages installed inside t

Ganesh 5 Aug 01, 2022
TextBoxes re-implement using tensorflow

TextBoxes-TensorFlow TextBoxes re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modified ba

Gu Xiaodong 44 Dec 29, 2022
A python program to block out your face

Readme This is a small program I threw together in about 6 hours to block out your face. It probably doesn't work very well, so be warned. By default,

1 Oct 17, 2021
Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Mathematical formulae extractor The goal of this project is to create a learning based system that takes an image of a math formula and returns corres

6 May 22, 2022
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assigments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 07, 2023
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

Harsh Vardhan Singh 1 Oct 29, 2021
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Controlling Volume by Hand Gestures

This program allows the user to control the volume of their device with specific hand gestures involving their thumb and index finger!

Riddhi Bajaj 1 Nov 11, 2021
Balabobapy - Using artificial intelligence algorithms to continue the text

Balabobapy - Using artificial intelligence algorithms to continue the text

qxtony 1 Feb 04, 2022
Rest API Written In Python To Classify NSFW Images.

✨ NSFW Classifier API ✨ Rest API Written In Python To Classify NSFW Images. Fastest Solution If you don't want to selfhost it, there's already an inst

Akshay Rajput 23 Dec 30, 2022
This repo contains several opencv projects done while learning opencv in python.

opencv-projects-python This repo contains both several opencv projects done while learning opencv by python and opencv learning resources [Basic conce

Fatin Shadab 2 Nov 03, 2022
一键翻译各类图片内文字

一键翻译各类图片内文字 针对群内、各个图站上大量不太可能会有人去翻译的图片设计,让我这种日语小白能够勉强看懂图片 主要支持日语,不过也能识别汉语和小写英文 支持简单的涂白和嵌字

574 Dec 28, 2022
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021
Repositório para registro de estudo da biblioteca opencv (Python)

OpenCV (Python) Objetivo do Repositório: Registrar avanços no estudo da biblioteca opencv. O repositório estará aberto a qualquer pessoa e há tambem u

1 Jun 14, 2022
Python rubik's cube solver

This program makes a 3D representation of a rubiks cube and solves it step by step.

Pablo QB 4 May 29, 2022
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

81 Dec 01, 2022