Handwritten Text Recognition (HTR) using TensorFlow 2.x

Overview

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR datasets. This Neural Network model recognizes the text contained in the images of segmented texts lines.

Data partitioning (train, validation, test) was performed following the methodology of each dataset. The project implemented the HTRModel abstraction model (inspired by CTCModel) as a way to facilitate the development of HTR systems.

Notes:

  1. All references are commented in the code.
  2. This project doesn't offer post-processing, such as Statistical Language Model.
  3. Check out the presentation in the doc folder.
  4. For more information and demo run step by step, check out the tutorial on Google Colab/Drive.

Datasets supported

a. Bentham

b. IAM

c. Rimes

d. Saint Gall

e. Washington

Requirements

  • Python 3.x
  • OpenCV 4.x
  • editdistance
  • TensorFlow 2.x

Command line arguments

  • --source: dataset/model name (bentham, iam, rimes, saintgall, washington)
  • --arch: network to be used (puigcerver, bluche, flor)
  • --transform: transform dataset to the HDF5 file
  • --cv2: visualize sample from transformed dataset
  • --kaldi_assets: save all assets for use with kaldi
  • --image: predict a single image with the source parameter
  • --train: train model using the source argument
  • --test: evaluate and predict model using the source argument
  • --norm_accentuation: discard accentuation marks in the evaluation
  • --norm_punctuation: discard punctuation marks in the evaluation
  • --epochs: number of epochs
  • --batch_size: number of the size of each batch

Tutorial (Google Colab/Drive)

A Jupyter Notebook is available to demo run, check out the tutorial on Google Colab/Drive.

Sample

Bentham sample with default parameters in the tutorial file.

  1. Preprocessed image (network input)
  2. TE_L: Ground Truth Text (label)
  3. TE_P: Predicted text (network output)

Citation

If this project helped in any way in your research work, feel free to cite the following papers.

HTR-Flor++: A Handwritten Text Recognition System Based on a Pipeline of Optical and Language Models (here)

This work aimed to propose a different pipeline for Handwritten Text Recognition (HTR) systems in post-processing, using two steps to correct the output text. The first step aimed to correct the text at the character level (using N-gram model). The second step had the objective of correcting the text at the word level (using a word frequency dictionary). The experiment was validated in the IAM dataset and compared to the best works proposed within this data scenario.

@inproceedings{10.1145/3395027.3419603,
    author      = {Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. and Lima, Estanislau B.},
    title       = {{HTR-Flor++:} A Handwritten Text Recognition System Based on a Pipeline of Optical and Language Models},
    booktitle   = {Proceedings of the ACM Symposium on Document Engineering 2020},
    year        = {2020},
    publisher   = {Association for Computing Machinery},
    address     = {New York, NY, USA},
    location    = {Virtual Event, CA, USA},
    series      = {DocEng '20},
    isbn        = {9781450380003},
    url         = {https://doi.org/10.1145/3395027.3419603},
    doi         = {10.1145/3395027.3419603},
}

Towards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems (here)

This work aimed a deep study within the research field of Natural Language Processing (NLP), and to bring its approaches to the research field of Handwritten Text Recognition (HTR). Thus, for the experiment and validation, we used 5 datasets (Bentham, IAM, RIMES, Saint Gall and Washington), 3 optical models (Bluche, Puigcerver, Flor), and 8 techniques for text correction in post-processing, including approaches statistics and neural networks, such as encoder-decoder models (seq2seq and Transformers).

@article{10.3390/app10217711,
    author  = {Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H.},
    title   = {Towards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems},
    journal = {Applied Sciences},
    pages   = {1-29},
    month   = {10},
    year    = {2020},
    volume  = {10},
    number  = {21},
    url     = {https://doi.org/10.3390/app10217711},
    doi     = {10.3390/app10217711},
}

HDSR-Flor: A Robust End-to-End System to Solve the Handwritten Digit String Recognition Problem in Real Complex Scenarios (here)

This work aimed to propose the optical model for Handwritten Digit String Recognition (HDSR) and compare it with the state-of-the-art models. The International Conference on Frontiers of Handwriting Recognition (ICFHR) 2014 competition on HDSR were used as baselines toevaluate the effectiveness of our proposal, whose metrics, datasets and recognition methods were adopted for fair comparison. Furthermore, we also use a private dataset (Brazilian Bank Check - Courtesy Amount Recognition), and 11 different approaches from the state-of-the-art in HDSR, as well as 2 optical models from the state-of-the-art in Handwritten Text Recognition (HTR).

@article{10.1109/ACCESS.2020.3039003,
    author  = {Neto, Arthur F. S. and Bezerra, Byron L. D. and Lima, Estanislau B. and Toselli, Alejandro H.},
    title   = {{HDSR-Flor:} A Robust End-to-End System to Solve the Handwritten Digit String Recognition Problem in Real Complex Scenarios},
    journal = {IEEE Access},
    pages   = {208543-208553},
    month   = {11},
    year    = {2020},
    volume  = {8},
    isbn    = {2169-3536},
    url     = {https://doi.org/10.1109/ACCESS.2020.3039003},
    doi     = {10.1109/ACCESS.2020.3039003},
}

HTR-Flor: A Deep Learning System for Offline Handwritten Text Recognition (here)

This work aimed to propose the optical model for Handwritten Text Recognition (HTR) and compare it with the state-of-the-art models. The performance comparison was validated in 5 different datasets (Bentham, IAM, RIMES, Saint Gall and Washington). In addition, it was considered one of the best papers in the 33rd SIBGRAPI (2020).

@inproceedings{10.1109/SIBGRAPI51738.2020.00016,
    author      = {Neto, Arthur F. S. and Bezerra, Byron L. D. and Toselli, Alejandro H. and Lima, Estanislau B.},
    title       = {{HTR-Flor:} A Deep Learning System for Offline Handwritten Text Recognition},
    booktitle   = {2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)},
    pages       = {54-61},
    month       = {11},
    year        = {2020},
    location    = {Recife/Porto de Galinhas, PE, Brazil},
    series      = {SIBGRAPI' 33},
    publisher   = {IEEE Computer Society},
    address     = {Los Alamitos, CA, USA},
    url         = {https://doi.org/10.1109/SIBGRAPI51738.2020.00016},
    doi         = {10.1109/SIBGRAPI51738.2020.00016},
}
You might also like...
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

Text recognition (optical character recognition) with deep learning methods.
Text recognition (optical character recognition) with deep learning methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle

text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Comments
  • Bump tensorflow from 2.9.1 to 2.9.3

    Bump tensorflow from 2.9.1 to 2.9.3

    Bumps tensorflow from 2.9.1 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump tensorflow from 2.3.0 to 2.3.1

    Bump tensorflow from 2.3.0 to 2.3.1

    Bumps tensorflow from 2.3.0 to 2.3.1.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.3.1

    Release 2.3.1

    Bug Fixes and Other Changes

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.3.1

    Bug Fixes and Other Changes

    Release 2.2.1

    ... (truncated)

    Commits
    • fcc4b96 Merge pull request #43446 from tensorflow-jenkins/version-numbers-2.3.1-16251
    • 4cf2230 Update version numbers to 2.3.1
    • eee8224 Merge pull request #43441 from tensorflow-jenkins/relnotes-2.3.1-24672
    • 0d41b1d Update RELEASE.md
    • d99bd63 Insert release notes place-fill
    • d71d3ce Merge pull request #43414 from tensorflow/mihaimaruseac-patch-1-1
    • 9c91596 Fix missing import
    • f9f12f6 Merge pull request #43391 from tensorflow/mihaimaruseac-patch-4
    • 3ed271b Solve leftover from merge conflict
    • 9cf3773 Merge pull request #43358 from tensorflow/mm-patch-r2.3
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(v0.0.6)
An organized collection of tutorials and projects created for aspriring computer vision students.

A repository created with the purpose of teaching students in BME lab 308A- Hanoi University of Science and Technology

Givralnguyen 5 Nov 24, 2021
Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022
Handwritten Text Recognition (HTR) using TensorFlow 2.x

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR data

Arthur Flôr 160 Dec 21, 2022
A simple QR-Code Reader in Python

A simple QR-Code Reader written in Python, that copies the content of a QR-Code directly into the copy clipboard.

Eric 1 Oct 28, 2021
pulse2percept: A Python-based simulation framework for bionic vision

pulse2percept: A Python-based simulation framework for bionic vision Retinal degenerative diseases such as retinitis pigmentosa and macular degenerati

67 Dec 29, 2022
Binarize document images

Binarization Binarization for document images Examples Introduction This tool performs document image binarization (i.e. transform colour/grayscale to

QURATOR-SPK 48 Jan 02, 2023
This repository contains the code for the paper "SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks"

SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks (CVPR 2021 Oral) This repository contains the official PyTorch implementation

Shunsuke Saito 235 Dec 18, 2022
Image augmentation for machine learning experiments.

imgaug This python library helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much lar

Alexander Jung 13.2k Jan 02, 2023
Maze generator and solver with python

Procedural-Maze-Generator-Algorithms Check out my youtube channel : Auctux Ressources Thanks to Jamis Buck Book : Mazes for programmers Requirements P

Joseph 19 Dec 07, 2022
This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe libraries.

CVZone This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe librar

CVZone 648 Dec 30, 2022
question‘s area recognition using image processing and regular expression

======================================== Paper-Question-recognition ======================================== question‘s area recognition using image p

Yuta Mizuki 7 Dec 27, 2021
PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application in real time.

PyNeuro PyNeuro is designed to connect NeuroSky's MindWave EEG device to Python and provide Callback functionality to provide data to your application

Zach Wang 45 Dec 30, 2022
Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract

Responsive Doc. scanner using U^2-Net, Textcleaner and Tesseract Toolset U^2-Net is used for background removal Textcleaner is used for image cleaning

3 Jul 13, 2022
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

2.4k Jan 08, 2023
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
Msos searcher - A half-hearted attempt at finding a magic square of squares

MSOS searcher A half-hearted attempt at finding (or rather searching) a MSOS (Magic Square of Squares) in the spirit of the Parker Square. Running I r

Niels Mündler 1 Jan 02, 2022
Framework for the Complete Gaze Tracking Pipeline

Framework for the Complete Gaze Tracking Pipeline The figure below shows a general representation of the camera-to-screen gaze tracking pipeline [1].

Pascal 20 Jan 06, 2023
Text layer for bio-image annotation.

napari-text-layer Napari text layer for bio-image annotation. Installation You can install using pip: pip install napari-text-layer Keybindings and m

6 Sep 29, 2022