A post-processing tool for scanned sheets of paper.

Last update: Dec 07, 2022

unpaper

Originally written by Jens Gulden — see AUTHORS for more information. Licensed under GNU GPL v2 — see COPYING for more information.

Overview

unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. Its main purpose is to make scanned book pages more readable on screen after conversion to PDF. unpaper can also be useful for improving the quality of scanned pages before performing optical character recognition (OCR).

unpaper tries to clean scanned images by removing dark edges that appeared through scanning or copying on areas outside the actual page content (e.g. dark areas between the left-hand side and the right-hand side of a double-sided book-page scan).

The program also tries to detect misaligned centering and rotation of pages and will automatically straighten each page by rotating it to the correct angle. This process is called "deskewing".

Note that the automatic processing will sometimes fail. It is always a good idea to check the results of unpaper manually and adjust the parameter settings to suit the input. Each processing step can also be disabled individually for each sheet.

See the further documentation for notes on the supported file formats.

Dependencies

The only hard dependency of unpaper is ffmpeg, which is used for file input and output.
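
A quick way to see whether the ffmpeg development files are visible before running ./configure is to ask pkg-config. The sketch below is only an illustration: the helper name missing_pc_modules is made up here, and the module names (libavformat, libavcodec, libavutil) are the ones ffmpeg installs .pc files under, assumed rather than taken from unpaper's build scripts.

```shell
#!/bin/sh
# Print each pkg-config module from the argument list that is not installed.
# If pkg-config itself is unavailable, every module is reported as missing.
missing_pc_modules() {
    for mod in "$@"; do
        if ! command -v pkg-config >/dev/null 2>&1 ||
           ! pkg-config --exists "$mod" 2>/dev/null; then
            printf '%s\n' "$mod"
        fi
    done
}

# ASSUMPTION: these are the module names ffmpeg ships .pc files under; the
# exact set that unpaper's configure script checks is not stated in this README.
missing=$(missing_pc_modules libavformat libavcodec libavutil)
if [ -n "$missing" ]; then
    printf 'Missing ffmpeg development libraries:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, the libraries were found; otherwise install your distribution's ffmpeg development package and run the check again.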

Building instructions

unpaper uses GNU Autotools for its build system, so you should be able to execute the same commands used for other software packages:

./configure
make
sudo make install

There are, though, some recommendations about the way you build the code. Since the tasks are calculation-intensive, it is important to build with optimizations turned on:

./configure CFLAGS="-O2 -march=native -pipe"

Even better, if your compiler supports it, enable link-time optimization (LTO), which has been shown to improve execution time noticeably:

./configure CFLAGS="-O2 -march=native -pipe -flto"

Further optimizations such as -ftracer and -ftree-vectorize are thought to work, but their effect has not been evaluated so your mileage may vary.
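
Whether the compiler supports -flto can be probed before configuring. The following is a minimal sketch, not part of unpaper's build system: the helper names supports_flag and pick_cflags are hypothetical, and the probe assumes the compiler is reachable as $CC or cc. It compiles a trivial program with the flag and only appends -flto to the recommended CFLAGS when that succeeds.

```shell
#!/bin/sh
# Return 0 if compiler $1 accepts flag $2 when building a trivial program.
supports_flag() {
    cc_bin=$1
    flag=$2
    tmpdir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$tmpdir/probe.c"
    # -Werror turns "unknown option" warnings into failures.
    "$cc_bin" -Werror "$flag" -o "$tmpdir/probe" "$tmpdir/probe.c" >/dev/null 2>&1
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Print the CFLAGS to pass to ./configure, adding -flto only when supported.
pick_cflags() {
    cflags="-O2 -march=native -pipe"
    if supports_flag "${CC:-cc}" -flto; then
        cflags="$cflags -flto"
    fi
    printf '%s\n' "$cflags"
}

# Usage, from the unpaper source directory:
#   ./configure CFLAGS="$(pick_cflags)"
```

The same supports_flag helper could be used to try -ftracer or -ftree-vectorize, though it only shows that the compiler accepts a flag, not that the flag makes unpaper faster.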

Further Information

You can find more information on the basic concepts and the image processing in the available documentation.
