Page to PAGE Layout Analysis Tool

Last update: Nov 24, 2022

Overview

P2PaLA

Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.

💥 Try our new DEMO for online baseline detection. ❗ ❗

If you find this toolkit useful in your research, please cite:

@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}

See the paper on arXiv for more details.

Requirements

Linux (OSX may work, but is untested).
Python (2.7 or 3.6; a conda virtual environment is recommended).
Numpy.
PyTorch (1.0). PyTorch 0.3.1 compatible on this branch.
OpenCV (3.4.5.20).
NVIDIA GPU + CUDA cuDNN (CPU mode and CUDA without cuDNN work, but are not recommended for training).
tensorboard-pytorch (v0.9) [Optional], installed with pip install tensorboardX. A different conda env is recommended to keep TensorFlow separated from PyTorch.
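
As a quick sanity check, the sketch below reports which of the Python packages listed above cannot be found in the active environment. It assumes the usual import names (numpy, torch, cv2, tensorboardX); the helper missing_modules is a hypothetical name and is not part of P2PaLA. It only checks that a package can be located, not its version.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names are assumptions based on how these packages are usually imported.
    required = ["numpy", "torch", "cv2"]
    optional = ["tensorboardX"]

    print("Missing required modules:", missing_modules(required) or "none")
    print("Missing optional modules:", missing_modules(optional) or "none")
```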

Install

python setup.py install

To install only the Python dependencies, use the requirements file: conda env create --file conda_requirements.yml

Usage

Input data must follow the folder structure data_tag/page, where images go in the data_tag folder and XML files in page. For example:

mkdir -p data/{train,val,test,prod}/page;
tree data;

data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
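
Before training, it can help to confirm that every image has a matching XML file in its page folder, as in the tree above. The following is a minimal sketch, not part of P2PaLA: the function name unpaired_images and the set of image extensions are assumptions made for illustration, and it pairs files by base name as the example tree suggests.

```python
from pathlib import Path

# Assumption: image extensions to look for; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def unpaired_images(data_dir):
    """Return sorted names of images in data_dir with no same-named XML in data_dir/page."""
    data_dir = Path(data_dir)
    xml_stems = {p.stem for p in (data_dir / "page").glob("*.xml")}
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.stem not in xml_stems
    )


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        split_dir = Path("data") / split
        if split_dir.is_dir():
            print(split, "images without XML:", unpaired_images(split_dir))
```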

Run the tool.

python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"

❗ Pre-trained models available here

Use TensorBoard to visualize train status:

tensorboard --logdir ./work/runs

The resulting PAGE-XML files should be in "./work/results/test/".
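
To inspect the PAGE-XML output without a graphical editor, the sketch below collects the Baseline polylines from one file using only the Python standard library. It assumes the standard PAGE layout, where each Baseline element carries a points attribute of the form "x1,y1 x2,y2 ..." with integer coordinates; the function name read_baselines and the example path are hypothetical, and tag names are matched without their namespace so the schema version does not matter.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_baselines(xml_source):
    """Return a list of baselines, each a list of (x, y) integer tuples."""
    root = ET.parse(xml_source).getroot()
    baselines = []
    for elem in root.iter():
        if local_name(elem.tag) == "Baseline":
            points = elem.get("points", "")
            baselines.append(
                [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
            )
    return baselines


if __name__ == "__main__":
    import sys

    # Usage: python read_baselines.py ./work/results/test/some_page.xml
    # (the file name is a placeholder for one of your own result files)
    if len(sys.argv) > 1:
        for line in read_baselines(sys.argv[1]):
            print(line)
```

If the result files contain only regions and no baselines, the function returns an empty list.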

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-xml files.

For details about the arguments and the config file, see docs or run python P2PaLA.py -h.
For more detailed examples, see egs:
- Bozen dataset
- cBAD complex competition dataset
- OHG dataset

License

GNU General Public License v3.0. See LICENSE for the full text.

Acknowledgments

Code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix.

Owner

Lorenzo Quirós Díaz
