Some bits of javascript to transcribe scanned pages using PageXML

Overview

nashi (nasḫī)

Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's more: download now and get a complete webapp written in Python/Flask that handles import and export of your scanned pages to and from LAREX for semi-automatic layout analysis, does the line segmentation for you (via kraken) and saves your precious PageXML in a database. All you've got to do is follow the instructions below and help me implement all the missing features... OCR training and recognition is currently not included because of our webhost's limited capacity.

Instructions for nashi.html

  • Put nashi.html in the same folder as your PageXML files (which must contain line segmentation data) and the page images, or in some folder above it. Serve the folder with a web server of your choice or simply use the file:// protocol (currently only supported in Firefox).
  • In the browser, open the interface as .../path/to/nashi.html?pagexml=Test.xml&direction=rtl where Test.xml (or subfolder/Test.xml) is one of the PageXML files and rtl (or ltr) indicates the main direction of your text.
  • Install the "Andron Scriptor Web" font to use the additional range of characters.
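As a rough sketch of the URL scheme described above, the following Python helper assembles the address for one PageXML file. The function name nashi_url and the base address http://localhost:8000/ are illustrative assumptions (for instance, a folder served locally with "python -m http.server 8000"); only the pagexml and direction parameters come from the instructions.

```python
from urllib.parse import urlencode


def nashi_url(base, pagexml, direction="ltr"):
    """Build the nashi.html URL for one PageXML file (hypothetical helper)."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # safe="/" keeps relative paths such as subfolder/Test.xml readable.
    query = urlencode({"pagexml": pagexml, "direction": direction}, safe="/")
    return f"{base}?{query}"


# Assumed base address; adjust it to wherever nashi.html is served.
print(nashi_url("http://localhost:8000/nashi.html", "Test.xml", "rtl"))
```

Building the query string with urlencode also takes care of file names that contain spaces or other characters that need escaping.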

The interface

  • Lines without existing text are marked red, lines containing OCR data blue and lines already transcribed are coloured green.
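The colour scheme can be illustrated with a small Python sketch that classifies the TextLine elements of a PageXML document. How nashi itself tells OCR output from a finished transcription is not stated here, so the rule below is an assumption made for illustration only: a TextEquiv with index="0" counts as a transcription, and any other non-empty TextEquiv counts as OCR data. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def line_colour(text_line):
    """Return 'red', 'blue' or 'green' for one TextLine element."""
    has_ocr = has_transcription = False
    for child in text_line:
        if local_name(child.tag) != "TextEquiv":
            continue
        text = "".join(u.text or "" for u in child if local_name(u.tag) == "Unicode")
        if not text.strip():
            continue
        # ASSUMPTION: index="0" marks a manual transcription, anything else OCR.
        if child.get("index") == "0":
            has_transcription = True
        else:
            has_ocr = True
    if has_transcription:
        return "green"
    return "blue" if has_ocr else "red"


def line_colours(pagexml_text):
    """Map every TextLine id in a PageXML string to its colour."""
    root = ET.fromstring(pagexml_text)
    return {el.get("id"): line_colour(el)
            for el in root.iter() if local_name(el.tag) == "TextLine"}


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="p.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"><Coords points="0,0 10,0 10,5 0,5"/></TextLine>
      <TextLine id="l2"><Coords points="0,6 10,6 10,11 0,11"/>
        <TextEquiv index="1"><Unicode>ocr text</Unicode></TextEquiv></TextLine>
      <TextLine id="l3"><Coords points="0,12 10,12 10,17 0,17"/>
        <TextEquiv index="0"><Unicode>checked text</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(line_colours(SAMPLE))
```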

Keyboard shortcuts in the text input area

  • Tab/Shift+Tab switches to the next/previous input.
  • Shift+Enter saves the edits for the current line.
  • Shift+Insert shows an additional range of characters to select as an alternative to the character next to the cursor. Input one of them using the corresponding number while holding Insert.
  • Shift+ArrowDown opens a new comment field (Shift+ArrowUp switches back to the transcription line).

Global keyboard shortcuts

  • Ctrl+Space zooms in to line width.
  • Ctrl+Shift+Space toggles zoom mode (always zoom in to line width).
  • Shift+PageUp/PageDown loads the next/previous page if the filenames of your PageXML files contain the page number.
  • Ctrl+Shift+ArrowLeft/ArrowRight changes orientation and input direction to ltr/rtl.
  • Ctrl+S downloads the PageXML file.
  • Ctrl+E enters or exits polygon edit mode.
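Page navigation relies on a number in the file name. One way such a lookup could work is sketched below in Python; it is not taken from nashi's source. The helper neighbour_page is hypothetical and assumes that the last run of digits in the name is the page number and that zero padding should be preserved.

```python
import re


def neighbour_page(filename, step):
    """Return the file name of the page `step` pages away, or None.

    ASSUMPTION: the last run of digits in the name is the page number.
    """
    matches = list(re.finditer(r"\d+", filename))
    if not matches:
        return None  # no number in the name, so there is nothing to step through
    last = matches[-1]
    number = int(last.group()) + step
    if number < 0:
        return None
    padded = str(number).zfill(len(last.group()))  # keep the zero padding
    return filename[:last.start()] + padded + filename[last.end():]


print(neighbour_page("0009.xml", 1))
print(neighbour_page("Test.xml", 1))
```

A name such as Test.xml contains no number, which is why page switching only works for numbered files.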

Edit mode

  • Click on a line area to activate its point handles. Points can be moved around; new points can be created by drawing the borders between existing points.
  • If points or lines are active, they can be deleted using the "Delete" key.
  • Hold the Shift key and draw to select multiple points.
  • New text lines can be created by clicking inside an existing text region and drawing a rectangle. New lines are always added at the end of the region.
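In PageXML terms, a new rectangular line is a TextLine element whose Coords carry the four corner points, placed after the existing lines of its region. The Python sketch below shows that idea with the standard library; add_rectangle_line and the sample ids are made up for illustration, and the code is not nashi's own implementation.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def add_rectangle_line(region, line_id, x0, y0, x1, y1):
    """Add a rectangular TextLine after the last line of a TextRegion."""
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    points = f"{left},{top} {right},{top} {right},{bottom} {left},{bottom}"
    # Reuse the region's namespace so the new elements match the document.
    ns = region.tag[: region.tag.index("}") + 1] if region.tag.startswith("{") else ""
    line = ET.Element(f"{ns}TextLine", {"id": line_id})
    ET.SubElement(line, f"{ns}Coords", {"points": points})
    # Region-level TextEquiv/TextStyle come after the lines in the PAGE schema,
    # so insert behind the last child that is neither of them.
    insert_at = 0
    for i, child in enumerate(region):
        if local_name(child.tag) not in ("TextEquiv", "TextStyle"):
            insert_at = i + 1
    region.insert(insert_at, line)
    return line


NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
region = ET.fromstring(
    f'<TextRegion xmlns="{NS}" id="r1">'
    '<Coords points="0,0 200,0 200,100 0,100"/>'
    '<TextLine id="l1"><Coords points="10,5 190,5 190,15 10,15"/></TextLine>'
    '<TextEquiv><Unicode>region text</Unicode></TextEquiv>'
    '</TextRegion>'
)
# The rectangle may be drawn from any corner; the points are normalised.
new_line = add_rectangle_line(region, "l2", 110, 40, 10, 20)
print([local_name(child.tag) for child in region])
print(new_line[0].get("points"))
```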

Instructions for the server

  • Install redis. The app uses celery as a task queue for line segmentation jobs (and probably OCR jobs in the future).
  • Install LAREX for semi-automatic layout analysis.
  • Install the server from this repository or from PyPI:
pip install nashi
  • Create a config.py file. For more options see the file default_settings.py. If you want the app to send emails to users, change the mail settings there. Here is just a minimal example:
BOOKS_DIR = "/home/username/books/"
LAREX_DIR = "/home/username/larex_books/"
  • Set an environment variable containing your database url. If you don't, nashi will create a sqlite database called "test.db" in your working directory.
export DATABASE_URL="mysql+pymysql://user:password@host/mydb?charset=utf8"
  • Create the database tables (and users, if needed) from a Python prompt. Login is disabled in the default config file.
from nashi import user_datastore
from nashi.database import db_session, init_db
init_db()
user_datastore.create_user(email="user@example.com", password="secret")
db_session.commit()
  • Run the celery worker:
export NASHI_SETTINGS=/home/user/path/to/config.py
celery -A nashi.celery worker --loglevel=info
  • Run the app. Don't forget to export your DATABASE_URL again if you're using a new terminal:
export FLASK_APP=nashi
export NASHI_SETTINGS=/home/user/path/to/config.py
flask run
  • Open localhost:5000, log in, update your books list via "Edit, Refresh".
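Because the setup depends on two environment variables, a small check before starting the worker or the app can save some head-scratching. The Python sketch below is not part of nashi; the function names are hypothetical. The fallback URL sqlite:///test.db is the SQLAlchemy-style spelling of "a sqlite database called test.db in your working directory", an assumption suggested by the mysql+pymysql URL format used above.

```python
import os


def resolve_database_url(environ=None):
    """Return DATABASE_URL, or the sqlite fallback described in the steps above."""
    environ = os.environ if environ is None else environ
    # ASSUMPTION: SQLAlchemy-style URL for test.db in the working directory.
    return environ.get("DATABASE_URL") or "sqlite:///test.db"


def preflight(environ=None):
    """List things worth checking before running celery or flask."""
    environ = os.environ if environ is None else environ
    notes = []
    if not environ.get("DATABASE_URL"):
        notes.append("DATABASE_URL is not set; the sqlite database test.db will be used")
    settings = environ.get("NASHI_SETTINGS")
    if not settings:
        notes.append("NASHI_SETTINGS is not set")
    elif not os.path.isfile(settings):
        notes.append(f"NASHI_SETTINGS points to a missing file: {settings}")
    return notes


print(resolve_database_url())
for note in preflight():
    print("note:", note)
```

Run it in the same terminal you intend to use for celery or flask, since exported variables do not carry over to a new terminal.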

Planned features

  • Sorting of lines
  • Reading order
  • Creation and correction of regions
  • API for external OCR service
  • Advanced text editing capabilities
  • Help, examples, and documentation
  • Artificial general intelligence that writes the code for me
Owner
Andreas Büttner