Form Segmentation

Let's explore how we can extract text from any form or scanned page.

Objectives

The goal is to find an algorithm that extracts as much information as possible from a given page (JPG format), so that the result can be fed to another system (business logic, a neural network, a classifier, etc.). The overall process may not be perfect, but it should find enough information to identify the type of document and the identities involved.

Parse any form or scanned page and extract all text data (printed and handwritten), with no prior knowledge of the document's layout or structure.
Fully automatic extraction (no human interaction, so the process can scale out).
Reasonably fast, or able to speed up with more machines or CPUs.
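The scale-out objective can be illustrated with a small sketch. It assumes each page can be processed independently; `extract_page` is a hypothetical placeholder for the real per-page work, and `extract_all` is an illustrative name, not part of this project.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def extract_page(path):
    """Hypothetical per-page worker: a real one would load the image and run OCR."""
    return {"path": path, "text": ""}


def extract_all(paths, worker=extract_page, executor_cls=ProcessPoolExecutor,
                max_workers=None):
    """Run `worker` over every page in parallel; results keep the input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

With the default process pool, the worker must be defined at module level so it can be pickled, and on platforms that spawn new processes the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` is an option when the worker spends most of its time waiting on I/O.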

Challenges

There are many challenges to overcome, but the main problem is identifying which parts of the form contain text.
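One simple way to approach this, shown here only as a sketch and not as the method this project uses, is a horizontal projection profile: count the ink pixels in each row of a binarized page and treat runs of non-empty rows as candidate text bands. The function name `find_text_bands` and its parameters are illustrative assumptions, and the input is assumed to be an already deskewed image stored as a 2-D list with 1 for ink and 0 for paper.

```python
def find_text_bands(binary, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, whose rows contain ink.

    binary     : 2-D list of 0/1 values (1 = ink).
    min_ink    : minimum ink pixels for a row to count as non-empty.
    min_height : bands shorter than this are discarded as noise.
    """
    bands = []
    start = None  # first row of the band currently being tracked
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(binary) - start >= min_height:
        bands.append((start, len(binary)))
    return bands


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],  # isolated one-row speck, dropped by min_height=2
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
print(find_text_bands(page))  # [(1, 3), (6, 8)]
```

This only finds horizontal bands. Ruled lines and box borders on a form also count as ink, so on real forms it would likely need to be combined with border and shape removal.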

Some other challenges:

Black border removal
ICR (Intelligent Character Recognition): recognize hand-drawn characters and convert them into text
Scanned pages: detect edges and apply a perspective transform to obtain a top-down view of the document
Noise removal (blur, Otsu thresholding, adaptive thresholding with OpenCV)
Shape detection and extraction
OCR (not a real issue, since Tesseract 4 works well for printed text)
Handwriting recognition
Minimizing errors
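To make the Otsu thresholding mentioned in the list concrete, here is a minimal pure-Python sketch of what the method computes: the gray level that maximizes the between-class variance of the "dark" and "light" pixel groups. In OpenCV the same idea is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag. The helper names `otsu_threshold` and `binarize` are illustrative, and the input is assumed to be 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit gray values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map a 2-D list of gray values to 1 (dark ink) or 0 (light paper)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

A single global threshold like this tends to struggle with uneven lighting, which is where a locally adaptive threshold is usually the better fit.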

Owner

Philip Doxakis
