About Library for extract infomation from thai personal identity card.

Overview

ThaiPersonalCardExtract

Downloads PyPI Status license Instragram

Library for extract infomation from thai personal identity card. imprement from easyocr and tesseract

New Feature v1.3.2 🎁

  • Increase performance.
  • Support Thai Government Lottery สกัดข้อมูลจากลอตเตอร์รี่ ใช้ได้ดีกับรูปภาพที่ได้จากเครื่องแสกน (16 Aug. 2021)
  • Refactor Output Structure.
  • Support Thai Driving License (Beta) สามารถสกัดข้อมูลจากภาพถ่ายใบขับขี่ได้บางรูปแบบ เนื่องจาก กรมทางขนส่งทางบก มีรูปแบบบัตรหลากหลายรูปแบบ และแต่ละรูปแบบมีตำแหน่งข้อมูลที่แตกต่างกัน จึงทำให้ประสิทธิภาพต่ำ

Examples

Example image file.

Real image file Real image file Real image file

wrapPerpective image crop.

wrapPerpective image crop wrapPerpective image crop

keypoint of image detected.

keypoint of image detected

Resutls of library extract region of interest

Identification Number

FullNameTH

NameEN

LastNameEN

BirthdayTH

BirthdayEN

Religion

Address

DateOfIssueTH

DateOfIssueEN

DateOfExpiryTH

DateOfExpiryEN

Recommend

  • Image quality lowest should be 600x350
  • Images with minimal reflections should be used. for good results
  • Identity Card should be size in the image about 75%, if the image doesn't cropped that to be left only Identity Card area.
  • For faster, please resize image and usage CUDA GPU.

Installation

Install using pip for stable release,

pip install thai-personal-card-extract

For latest development release,

pip install git+git://github.com/ggafiled/ThaiPersonalCardExtrac.git

Note 1: for Windows, please install tesseract first by following the official instruction here https://medium.com/@navapat.tpb/734dae2fb4d3 On medium website, be sure to setup already.

Note 2: for Linux os, please install tesseract by following the official instruction https://github.com/tesseract-ocr/tesseract

Usage

# With build-in Config Options. 

import ThaiPersonalCardExtract as card
reader = card.PersonalCard(
    lang=card.THAI,
    provider=card.DEFAULT,
    tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract",
    save_extract_result=True,
    path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract")
result = reader.extractInfo('examples/card.jpg')
print(result)


# With free-style ตัวอย่างการเรียกใช้งานคลาส PersonalCard เพื่อสกัดข้อมูลบัตรประจำตัวประชาชน 

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # for windows need to pass tesseract_cmd parameter to setup your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)


# With free-style ตัวอย่างการเรียกใช้งานคลาส DrivingLicense เพื่อสกัดข้อมูลใบอนุญาตขับขี่

from ThaiPersonalCardExtract import DrivingLicense
reader = DrivingLicense(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # for windows need to pass tesseract_cmd parameter to setup your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)


# With free-style ตัวอย่างการเรียกใช้งานคลาส ThaiGovernmentLottery เพื่อสกัดข้อมูลลอตเตอร์รี่

from ThaiPersonalCardExtract import ThaiGovernmentLottery
reader = ThaiGovernmentLottery(save_extract_result=True, path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract/thai_government_lottery") # for windows need to pass tesseract_cmd parameter to setup your tesseract command path.
result = reader.extractInfo("../examples/card7.jpg")
print(result)

Output will be in list format, each item represents result of library can extract, respectively. type of namedtuple ผลลัพธ์ที่ได้จะเป็นประเภท namedtuple สามารถศึกษาเพิ่มเติมเพื่อใช้งานได้จากที่นี่ คลิก

#Output of PersonalCard
    Card(Identification_Number='9999999999999', FullNameTH='นาย อายุมฺมุราเสะ', PrefixTH='นาย', NameTH='อายุมฺมุราเสะ', LastNameTH='อายุมฺมุราเสะ', PrefixEN='.Mr.Shoyo', NameEN='', LastNameEN='Hinatao', BirthdayTH='21 มี.ย. 2539', BirthdayEN='21 Jun..1996', Religion='พุทธ', Address='ท8ปฺ` 99/1 มิซีโฮะ เขตฮานามิกาวา อำเภอชิบ', DateOfIssueTH='11 ส.ค. 2554', DateOfIssueEN='11 Ang. 2021', DateOfExpiryTH='11 ส.ค. 2574', DateOfExpiryEN='11 Aug. 2031,')

#Output of DrivingLicense
    Card(License_Number='98765432', IssueDateTH='ผังทาทม', ExpiryDateTH='', IssueDateEN='14 August 2664', ExpiryDateEN='14 August 2574', NameTH='า? โนบกะ โนบี', NameEN='MRONOREAUMANE', BirthDayTH='', BirthDayEN='wa hs OKRA', Identity_Number='', Province='นคาราชศีมา')

#Output of ThaiGovernmentLottery
    Lottery(LotteryNumber='424603', LessonNumber='08', SetNumber='23', Year='2564') #type namedtuple 
    
 สามารถเข้าถึงตัวแปรได้ตามรูปแบบนี้
 print(result.LotteryNumber)
 print(result.LessonNumber)
 print(result.SetNumber)
 print(result.Year)

For set lang attribute to tha

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # for windows need to pass tesseract_cmd parameter to setup your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)

Output will be in list format, each item represents result of library can extract, respectively.

{
   "Identification_Number": "9999999999999",
   "FullNameTH": "นาย อายุมฺมุราเสะ",
   "PrefixTH": "นาย",
   "NameTH": "อายุมฺมุราเสะ",
   "LastNameTH": "อายุมฺมุราเสะ",
   "BirthdayTH": "21 มี.ย. 2539",
   "Religion": "พุทธ",
   "Address": "ท๒ 99/1 มิชีโฮะ เขตฮานามิกาวา อำเภอชิบ;",
   "DateOfIssueTH": "11 ส.ค. 2554",
   "DateOfExpiryTH": "11 ส.ค. 2574"
}

And you can set ocr provider following below default #used both easyocr and tesseract **Recommend Or easyocr Or tesseract

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", provider="default", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # for windows need to pass tesseract_cmd parameter to setup your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)

Config Options

you can set options to Instance by below keyword

Parameter name Value Type Example
lang String Expected Results Language bash mix #get all area both tha and eng Or bash tha Or bash eng *Default is 'mix' สำหรับ DrivingLicense, PersonalCard
provider String OCR Provider have bash default #used both easyocr and tesseract **Recommend Or bash easyocr Or bash tesseract *Default is 'default' สำหรับ DrivingLicense, PersonalCard
template_threshold Double Rate to cals similarity of template *Default is 0.7
sift_rate Int Feature Keypoint rate *Default is 25,000
tesseract_cmd String Path of your tesseract command **For windows only.
save_extract_result Boolean Set True if you want to save extracted image *Default is False
path_to_save String Path that you given it save extracted image, relative with save_extract_result=True

Donate Me

promptpay

Mr.Nattapol Krobklang

You might also like...
extract gene TSS/TES site form gencode/ensembl/gencode database GTF file and export bed format file.

GetTsite python Package extract gene TSS/TES site form gencode/ensembl/gencode database GTF file and export bed format file. Install $ pip install Get

A functional standard library for Python.

Toolz A set of utility functions for iterators, functions, and dictionaries. See the PyToolz documentation at https://toolz.readthedocs.io LICENSE New

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

Retrying library for Python

Tenacity Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

Retrying Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

isort is a Python utility / library to sort imports alphabetically, and automatically separated into sections and by type.
isort is a Python utility / library to sort imports alphabetically, and automatically separated into sections and by type.

isort is a Python utility / library to sort imports alphabetically, and automatically separated into sections and by type. It provides a command line utility, Python library and plugins for various editors to quickly sort all your imports.

A Python library for reading, writing and visualizing the OMEGA Format
A Python library for reading, writing and visualizing the OMEGA Format

A Python library for reading, writing and visualizing the OMEGA Format, targeted towards storing reference and perception data in the automotive context on an object list basis with a focus on an urban use case.

RapidFuzz is a fast string matching library for Python and C++

RapidFuzz is a fast string matching library for Python and C++, which is using the string similarity calculations from FuzzyWuzzy

pydsinternals - A Python native library containing necessary classes, functions and structures to interact with Windows Active Directory.
pydsinternals - A Python native library containing necessary classes, functions and structures to interact with Windows Active Directory.

pydsinternals - Directory Services Internals Library A Python native library containing necessary classes, functions and structures to interact with W

Comments
  • 'PersonalCard' object has no attribute 'extractInfo'

    'PersonalCard' object has no attribute 'extractInfo'


    AttributeError Traceback (most recent call last) /var/folders/cc/d15mjmhn5c5fscqfwq3fql5h0000gp/T/ipykernel_6920/1182933257.py in ----> 1 result = reader.extractInfo('ThaiPersonalCardExtract/examples/extract/image_scan.jpg') 2 print(result)

    AttributeError: 'PersonalCard' object has no attribute 'extractInfo'

    opened by suwika 0
Releases(v1.3.4)
  • v1.3.4(Sep 2, 2021)

    New Feature v1.3.4 🎁

    • Support Thai identity card laser code extract. (02 Sep. 2021)
    • Fix bug dataset folder not import thai_government_lottery resource. (23 Aug. 2021) #1
    • Increase performance.
    • Support Thai Government Lottery สกัดข้อมูลจากลอตเตอร์รี่ ใช้ได้ดีกับรูปภาพที่ได้จากเครื่องแสกน (16 Aug. 2021)
    • Refactor Output Structure.
    • Support Thai Driving License (Beta) สามารถสกัดข้อมูลจากภาพถ่ายใบขับขี่ได้บางรูปแบบ เนื่องจาก กรมทางขนส่งทางบก มีรูปแบบบัตรหลากหลายรูปแบบ และแต่ละรูปแบบมีตำแหน่งข้อมูลที่แตกต่างกัน จึงทำให้ประสิทธิภาพต่ำ
    Source code(tar.gz)
    Source code(zip)
  • v1.3.1(Aug 16, 2021)

    New Feature v1.3.1 🎁 Increase performance. Support Thai Driving License (Beta) สามารถสกัดข้อมูลจากภาพถ่ายใบขับขี่ได้บางรูปแบบ เนื่องจาก กรมทางขนส่งทางบก มีรูปแบบบัตรหลากหลายรูปแบบ และแต่ละรูปแบบมีตำแหน่งข้อมูลที่แตกต่างกัน จึงทำให้ประสิทธิภาพต่ำ Support Thai Government Lottery (16 Aug. 2021)

    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Aug 14, 2021)

    Increase performance. Support Thai Driving License (Beta) สามารถสกัดข้อมูลจากภาพถ่ายใบขับขี่ได้บางรูปแบบ เนื่องจาก กรมทางขนส่งทางบก มีรูปแบบบัตรหลากหลายรูปแบบ และแต่ละรูปแบบมีตำแหน่งข้อมูลที่แตกต่างกัน จึงทำให้ประสิทธิภาพต่ำ ปรับเปลี่ยนรูปแบบไฟล์ระบบ

    Source code(tar.gz)
    Source code(zip)
  • v1.2.1(Aug 13, 2021)

    New Feature 🎁

    • More arae extract.
    • lang : attribute : get only area of given language.
    • provider : attribute : set ocr provider now support easyocr and tesseract.
    Source code(tar.gz)
    Source code(zip)
  • v1.0-beta(Aug 11, 2021)

Owner
ggafiled
นานๆที อัพ ครับบบ.
ggafiled
Script for generating Hearthstone card spoilers & checklists

This is a script for generating text spoilers and set checklists for Hearthstone. Installation & Running Python 3.6 or higher is required. Copy/clone

John T. Wodder II 1 Oct 11, 2022
osqueryIR is an artifact collection tool for Linux systems.

osqueryIR osqueryIR is an artifact collection tool for Linux systems. It provides the following capabilities: Execute osquery SQL queries Collect file

AbdulRhman Alfaifi 7 Nov 02, 2022
一款不需要买代理来减少扫网站目录被封概率的扫描器,适用于中小规格字典。

PoorScanner使用说明书 -工具在不同环境下可能不怎么稳定,如果有什么问题恳请大家反馈。说明书有什么错误的地方也大家欢迎指正。 更新记录 2021.8.23 修复了云函数主程序 gitee上传文件接口写错了的BUG(之前把自己的上传地址写死进去了,没从配置文件里读) 更新了说明书 PoorS

14 Aug 02, 2022
iOS Snapchat parser for chats and cached files

ParseSnapchat iOS Snapchat parser for chats and cached files Tested on Windows and Linux install required libraries: pip install -r requirements.txt c

11 Dec 05, 2022
This code renames subtitle file names to your video files names, so you don't need to rename them manually.

Rename Subtitle This code renames your subtitle file names to your video file names so you don't need to do it manually Note: It only works for series

Mostafa Kazemi 4 Sep 12, 2021
Handy Tool to check the availability of onion site and to extract the title of submitted onion links.

This tool helps is to quickly investigate a huge set of onion sites based by checking its availability which helps to filter out the inactive sites and collect the site title that might helps us to c

Balaji 13 Nov 25, 2022
🍰 ConnectMP - An easy and efficient way to share data between Processes in Python.

ConnectMP - Taking Multi-Process Data Sharing to the moon 🚀 Contribute · Community · Documentation 🎫 Introduction : 🍤 ConnectMP is the easiest and

Aiden Ellis 1 Dec 24, 2021
A BlackJack simulator in Python to simulate thousands or millions of hands using different strategies.

BlackJack Simulator (in Python) A BlackJack simulator to play any number of hands using different strategies The Rules To keep the code relatively sim

Hamid 4 Jun 24, 2022
Random Number Generator

Application for generating a random number.

Michael J Bailey 1 Oct 12, 2021
Cardano Stakepools: Check for scheduled blocks in current epoch.

ReLeaderLogs For Cardano Stakepool Operators: Lightweight Scheduled Blocks Checker for Current Epoch. No cardano-node Required, data is taken from blo

SNAKE (Cardano Stakepool) 2 Oct 19, 2021
Edit SRT files to delay subtitle time-stamps.

subtitle-delay A program written in Python that directly edits SRT file to delay the subtitles. Features: Will throw an error if delaying with negativ

8 Jul 17, 2022
An online streamlit development platform

streamlit-playground An online streamlit development platform Run, Experiment and Play with streamlit Components Develop full-fledged apps online All

Akshansh Kumar 3 Nov 06, 2021
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

Seth Morton 712 Dec 23, 2022
Python tool to check a web applications compliance with OWASP HTTP response headers best practices

Check Your Head A quick and easy way to check a web applications response headers!

Zak 6 Nov 09, 2021
Nmap script to guess* a GitLab version.

gitlab-version-nse Nmap script to guess* a GitLab version. Usage https://github.com/righel/gitlab-version-nse cd gitlab-version-nse nmap target --s

Luciano Righetti 120 Dec 05, 2022
A fixture that allows runtime xfail

pytest-runtime-xfail pytest plugin, providing a runtime_xfail fixture, which is callable as runtime_xfail(), to allow runtime decisions to mark a test

Brian Okken 4 Apr 06, 2022
Just some scripts to export vector tiles to geojson.

Vector tiles to GeoJSON Nowadays modern web maps are usually based on vector tiles. The great thing about vector tiles is, that they are not just imag

Lilith Wittmann 77 Jul 26, 2022
Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Rusty Russell 22 Dec 11, 2022
Monte Carlo simulation of 3G rules

mc3g Monte Carlo simulation of 3G rules This project contains the Python code to do simulations of events according to the 3G rule (in German: "Geimpf

Jan Christoph Terasa 4 Nov 01, 2021
Utility to play with ADCS, allows to request tickets and collect information about related objects.

certi Utility to play with ADCS, allows to request tickets and collect information about related objects. Basically, it's the impacket copy of Certify

Eloy 185 Dec 29, 2022