a micro OCR network with 0.07mb params.

Overview

MicroOCR

a micro OCR network with 0.07mb params.

    Layer (type)               Output Shape         Param #

        Conv2d-1            [-1, 64, 8, 32]           3,136
   BatchNorm2d-2            [-1, 64, 8, 32]             128
          GELU-3            [-1, 64, 8, 32]               0
     ConvBNACT-4            [-1, 64, 8, 32]               0
        Conv2d-5            [-1, 64, 8, 32]             640
   BatchNorm2d-6            [-1, 64, 8, 32]             128
          GELU-7            [-1, 64, 8, 32]               0
     ConvBNACT-8            [-1, 64, 8, 32]               0
        Conv2d-9            [-1, 64, 8, 32]           4,160
  BatchNorm2d-10            [-1, 64, 8, 32]             128
         GELU-11            [-1, 64, 8, 32]               0
    ConvBNACT-12            [-1, 64, 8, 32]               0
   MicroBlock-13            [-1, 64, 8, 32]               0
       Conv2d-14            [-1, 64, 8, 32]             640
  BatchNorm2d-15            [-1, 64, 8, 32]             128
         GELU-16            [-1, 64, 8, 32]               0
    ConvBNACT-17            [-1, 64, 8, 32]               0
       Conv2d-18            [-1, 64, 8, 32]           4,160
  BatchNorm2d-19            [-1, 64, 8, 32]             128
         GELU-20            [-1, 64, 8, 32]               0
    ConvBNACT-21            [-1, 64, 8, 32]               0
   MicroBlock-22            [-1, 64, 8, 32]               0
      Flatten-23              [-1, 64, 256]               0
AdaptiveAvgPool1d-24           [-1, 64, 30]               0
       Linear-25               [-1, 30, 60]           3,900

Total params: 17,276
Trainable params: 17,276
Non-trainable params: 0
Input size (MB): 0.05
Forward/backward pass size (MB): 2.90
Params size (MB): 0.07
Estimated Total Size (MB): 3.02

Script Description

MicroOCR
├── README.md                                   # Descriptions about MicroNet
├── collatefn.py                                # collatefn
├── ctc_label_converter.py                      # accuracy metric for MicroNet
├── dataset.py                                  # Data preprocessing for training and evaluation
├── demo.py                                     # demo
├── gen_image.py                                # generate image for train and eval
├── infer_tool.py                               # inference tool
├── keys.py                                     # character
├── loss.py                                     # Ctcloss definition
├── metric.py                                   # accuracy metric for MicroNet
├── model.py                                    # MicroNet
├── train.py                                    # train the model

Generate data for train and eval

python gen_image.py

Training

python train.py

Inference

python demo.py
Owner
william
william
Official implementation of Character Region Awareness for Text Detection (CRAFT)

CRAFT: Character-Region Awareness For Text detection Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Clova AI Research 2.5k Jan 03, 2023
Text language identification using Wikipedia data

Text language identification using Wikipedia data The aim of this project is to provide high-quality language detection over all the web's languages.

Vsevolod Dyomkin 28 Jul 09, 2022
TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

Minghui Liao 930 Jan 04, 2023
An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss This is an unofficial implementation of AutoVC based on the official one. The reposi

Chien-yu Huang 27 Jun 16, 2022
document image degradation

ocrodeg The ocrodeg package is a small Python library implementing document image degradation for data augmentation for handwriting recognition and OC

NVIDIA Research Projects 134 Nov 18, 2022
Text modding tools for FF7R (Final Fantasy VII Remake)

FF7R_text_mod_tools Subtitle modding tools for FF7R (Final Fantasy VII Remake) There are 3 tools I made. make_dualsub_mod.exe: Merges (or swaps) subti

10 Dec 19, 2022
Drowsiness Detection and Alert System

A countless number of people drive on the highway day and night. Taxi drivers, bus drivers, truck drivers, and people traveling long-distance suffer from lack of sleep.

Astitva Veer Garg 4 Aug 01, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
Detect the mathematical formula from the given picture and the same formula is extracted and converted into the latex code

Mathematical formulae extractor The goal of this project is to create a learning based system that takes an image of a math formula and returns corres

6 May 22, 2022
ocroseg - This is a deep learning model for page layout analysis / segmentation.

ocroseg This is a deep learning model for page layout analysis / segmentation. There are many different ways in which you can train and run it, but by

NVIDIA Research Projects 71 Dec 06, 2022
This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

Vectorizing color range This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vect

Development Seed 9 Jul 27, 2022
Simple app for visual editing of Page XML files

Name nw-page-editor - Simple app for visual editing of Page XML files. Version: 2021.02.22 Description nw-page-editor is an application for viewing/ed

Mauricio Villegas 27 Jun 20, 2022
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

OpenCV 65.7k Jan 03, 2023
WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Code based on our WACV 2022 Accepted Paper: https://arxiv.org/pdf/

Andres 13 Dec 17, 2022
Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera.

Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip location is mapped to RGB images to control the mouse cursor.

Ravi Sharma 71 Dec 20, 2022
Image Recognition Model Generator

Takes a user-inputted query and generates a machine learning image recognition model that determines if an inputted image is or isn't their query

Christopher Oka 1 Jan 13, 2022
An unofficial package help developers to implement ZATCA (Fatoora) QR code easily which required for e-invoicing

ZATCA (Fatoora) QR-Code Implementation An unofficial package help developers to implement ZATCA (Fatoora) QR code easily which required for e-invoicin

TheAwiteb 28 Nov 03, 2022
Zoom , GoogleMeets에서 Vtuber 데뷔하기

EasyVtuber Facial landmark와 GAN을 이용한 Character Face Generation Google Meets, Zoom 등에서 자신만의 웹툰, 만화 캐릭터로 대화해보세요! 악세사리는 어느정도 추가해도 잘 작동해요! 안타깝게도 RTX 2070

Gunwoo Han 140 Dec 23, 2022
TedEval: A Fair Evaluation Metric for Scene Text Detectors

TedEval: A Fair Evaluation Metric for Scene Text Detectors Official Python 3 implementation of TedEval | paper | slides Chae Young Lee, Youngmin Baek,

Clova AI Research 167 Nov 20, 2022
[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Grounded Situation Recognition with Transformers Paper | Model Checkpoint This is the official PyTorch implementation of Grounded Situation Recognitio

Junhyeong Cho 18 Jul 19, 2022