Document Image Dewarping

Overview

Document image dewarping using text-lines and line Segments

Abstract

Conventional text-line based document dewarping methods have problems when handling complex layout and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take large portion of the input instead. Hence, for the robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that all the transformed line segments are still straight (line to line mapping), and many of them are horizontally or vertically aligned in the well-rectified images, we encode this properties into the cost function in addition to the text-line based cost. By minimizing the function, we can obtain transformation parameters for camera pose, page curve (extrinsic parameters) and camera focal length (intrinsic parameter), which are used for document rectification. Considering that there are many outliers in line segment directions and missed text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, we remove text components and line segments that are not well horizontal/vertical aligned, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to the variety of page layouts. Moreover, the proposed method can extend to general curves surfaces as well as document.

Algorithm

Two line semgent properties

Straightness property

The straightness property describes the line segments extracted in curved document image, lines on the curved document surface become still straight in the well-rectified domain (Although the lines extracted in the well-rectified image can be curved in the curved document surface). It means that line-to-line mapping. Since the straightness property is always satisfied with all plane to plane mapping, it is not a significant constraint in rectification considering only camera view (such as homography). However we consider page curve as well as camera view in rectification process, then this property becomes an efficient constraint that prevents lines from being curved.

Alignment property

Based on the observation that the majority of line segments are horizontally or vertically aligned in the rectified images.

Outlier removal

The direct optimization of equation may yield poorly rectified results, due to outliers. We treat two outlier types that are missed text-lines and line segments having arbitrary direction (non horizontal/vertical). For the outlier removal, we design an iterative method. At each step, we refine the features (text components and line segments) by removing outlier (that are not well aligned) and minimize the cost function with updated inliers.

Experimental results

CBDAR 2007 dataset

We evaluate our method on the CBDAR 2007 dewarpint contest dataset [http://staffhome.ecm.uwa.edu.au/~00082689/downloads.html], that is consisted of binarized text images.

Input image Kim [2] Proposed

Our document image dataset

In order to consist of non conventional document images (i.e., not text-abundant cases), we collected 100 images having various layouts (e.g., three column documents, documents containing large tables and/or figures, presentation slides, and so on).

Input image Kim [2] Proposed

Our curved image dataset

In order to consist of general curved surface images (such as bottles), we collected 74 images.

Input image Kim [2] Proposed

Executable program

Executable program can be downloaded by below links:

http://ispl.synology.me:8480/sharing/uA2DTRA8U

Reference

[1] Taeho Kil, Wonkyo Seo, Hyung Il Koo and Nam Ik Cho, "Robust Document Image Dewarping Using Text-Line and Line Segments", ICDAR 2017.

[2] Beom Su Kim, Hyung Il Koo, and Nam Ik Cho, "Document Dewarping via Text-line based Optimization", Pattern Recognition 2015.

Owner
Taeho Kil
My Research: Visual-Linguistic Representation, Computer Vision, Image Processing, Deep Learning
Taeho Kil
Introduction to image processing, most used and popular functions of OpenCV

👀 OpenCV 101 Introduction to image processing, most used and popular functions of OpenCV go here.

Vusal Ismayilov 3 Jul 02, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022
pulse2percept: A Python-based simulation framework for bionic vision

pulse2percept: A Python-based simulation framework for bionic vision Retinal degenerative diseases such as retinitis pigmentosa and macular degenerati

67 Dec 29, 2022
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022
Solution for Problem 1 by team codesquad for AIDL 2020. Uses ML Kit for OCR and OpenCV for image processing

CodeSquad PS1 Solution for Problem Statement 1 for AIDL 2020 conducted by @unifynd technologies. Problem Given images of bills/invoices, the task was

Burhanuddin Udaipurwala 111 Nov 27, 2022
Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

Abdulazeez Jimoh 1 Jan 01, 2022
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well

ocrserver Simple OCR server, as a small working sample for gosseract. Try now here https://ocr-example.herokuapp.com/, and deploy your own now. Deploy

Hiromu OCHIAI 541 Dec 28, 2022
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
Hand gesture detection project with aweome UI implementation.

an awesome hand gesture detection project for you to be creative! Imagination is the limit to do with this project.

AR Ashraf 39 Sep 26, 2022
📷 Face Recognition using Haar-Cascade Classifier, OpenCV, and Python

Face-Recognition-System Face Recognition using Haar-Cascade Classifier, OpenCV and Python. This project is based on face detection and face recognitio

1 Jan 10, 2022
The open source extract transaction infomation by using OCR.

Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có

Nguyen Xuan Hung 18 Jun 02, 2022
This project is basically to draw lines with your hand, using python, opencv, mediapipe.

Paint Opencv 📷 This project is basically to draw lines with your hand, using python, opencv, mediapipe. Screenshoots 📱 Tools ⚙️ Python Opencv Mediap

Williams Ismael Bobadilla Torres 3 Nov 17, 2021
Visual Attention based OCR

Attention-OCR Authours: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 02, 2023
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

SynthText Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Ved

Ankush Gupta 1.8k Dec 28, 2022