CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

Last update: Dec 20, 2022

Overview

CUTIE

TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor" by Xiaohui Zhao.

CUTIE is a 2-D key information extraction algorithm for receipt-style documents; it can equally be viewed as 2-D named entity recognition (NER) or 2-D slot filling. Before training or running inference with CUTIE, run any OCR algorithm over your scanned document images to detect and recognize the text, then feed the resulting structured text into the CUTIE network. Refer to the CUTIE paper for details of the procedure.
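As a rough illustration of what "structured text" can mean here, the sketch below places OCR words into a fixed rows x cols grid according to the centre of each bounding box. This is a simplified assumption for illustration, not the repository's actual preprocessing: the function name boxes_to_grid, the tuple layout, and the collision handling are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical OCR word record: (text, x_min, y_min, x_max, y_max)
Box = Tuple[str, float, float, float, float]


def boxes_to_grid(boxes: List[Box], rows: int, cols: int) -> List[List[str]]:
    """Place each OCR word into a rows x cols grid by its box centre."""
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    if not boxes:
        return grid

    # Extent of all text on the page, used to normalise coordinates.
    x0 = min(b[1] for b in boxes)
    y0 = min(b[2] for b in boxes)
    x1 = max(b[3] for b in boxes)
    y1 = max(b[4] for b in boxes)
    width = max(x1 - x0, 1e-6)
    height = max(y1 - y0, 1e-6)

    for text, bx0, by0, bx1, by1 in boxes:
        cx = (bx0 + bx1) / 2
        cy = (by0 + by1) / 2
        col = min(int((cx - x0) / width * cols), cols - 1)
        row = min(int((cy - y0) / height * rows), rows - 1)
        # Simplification: words that land in the same cell are joined.
        grid[row][col] = (grid[row][col] + " " + text).strip()
    return grid


ocr_words = [
    ("TAXI", 10, 10, 50, 20),
    ("Total", 10, 80, 40, 90),
    ("23.50", 70, 80, 100, 90),
]
grid = boxes_to_grid(ocr_words, rows=4, cols=4)
for row in grid:
    print(row)
```

A grid that is too small forces unrelated words into the same cell, while one that is too large leaves most cells empty, which is one reason the rows/cols choice matters.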

Results

Results were evaluated on 4,484 receipt documents, including taxi receipts, meal and entertainment receipts, and hotel receipts, with 9 key information classes. Scores are reported as AP / softAP.

Method	#Params	Taxi	Hotel
CloudScan	-	82.0 / -	60.0 / -
BERT	110M	88.1 / -	71.7 / -
CUTIE	14M	94.0 / 97.3	74.6 / 87.0

Installation & Usage

pip install -r requirements.txt

Generate your own dictionary with main_build_dict.py / main_data_tokenizer.py
Train your model with main_train_json.py
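To give a feel for the dictionary step, here is a minimal sketch of building a token-to-id vocabulary from OCR text and encoding new text with it. It does not reproduce main_build_dict.py or main_data_tokenizer.py; the whitespace tokenizer, the special tokens [PAD] and [UNK], and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(texts: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent lowercase token to an integer id."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())

    # Ids 0 and 1 are reserved for padding and unknown tokens (assumed convention).
    vocab = {"[PAD]": 0, "[UNK]": 1}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for token, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text to ids, falling back to [UNK] for unseen tokens."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in text.lower().split()]


vocab = build_vocab(["Total 23.50", "Taxi fare total"])
print(vocab)
print(encode("Total tip", vocab))
```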

CUTIE performs best when the grid rows/cols are configured to suit your documents. For more insight, refer to the statistics in others/TrainingStatistic.xlsx.

Others

For an example of the input format, refer to the issue discussion.

Apply any OCR tool that helps you detect and recognize words in the scanned document image.
Label the OCR results with their key information classes and save them as .json files in the invoice_data folder (thanks to @4kssoft).
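The two steps above could be scripted along the following lines. The JSON layout shown here (text_boxes, fields, field_name, value_id, value_text, global_attributes) is an assumption for illustration only; check the sample files in invoice_data and the issue discussion for the format the training script actually expects. The helper make_label_record is hypothetical.

```python
import json
from typing import Dict, List, Sequence, Tuple

# Hypothetical OCR word: (text, [x_min, y_min, x_max, y_max])
Word = Tuple[str, Sequence[int]]


def make_label_record(file_id: str, words: List[Word],
                      labels: Dict[str, List[int]]) -> dict:
    """Combine OCR words and per-class word ids into one labelled record."""
    text_boxes = [
        {"id": i, "bbox": list(bbox), "text": text}
        for i, (text, bbox) in enumerate(words, start=1)
    ]
    by_id = {box["id"]: box["text"] for box in text_boxes}

    fields = []
    for field_name, ids in labels.items():
        missing = [i for i in ids if i not in by_id]
        if missing:
            raise ValueError(f"{field_name}: unknown word ids {missing}")
        fields.append({
            "field_name": field_name,
            "value_id": ids,
            "value_text": [by_id[i] for i in ids],
        })

    return {
        "text_boxes": text_boxes,
        "fields": fields,
        "global_attributes": {"file_id": file_id},
    }


words = [("Total", (10, 80, 40, 90)), ("23.50", (70, 80, 100, 90))]
record = make_label_record("receipt_0001.jpg", words, {"TOTAL": [2]})
print(json.dumps(record, indent=2))
```

Validating the word ids before writing the file catches labelling mistakes early, before they surface as confusing errors during training.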

Owner

Zhao,Xiaohui
