Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Overview

Scene-Text-Detection-with-SPCNET

Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.08605] with tensorflow.

参考代码

网络实现主要借鉴Keras版本的Mask-RCNN,训练数据接口参考了argman/EAST.论文作者在知乎的文章介绍SPCNet.

训练

1、训练数据准备

训练数据放在data/下,训练数据准备在data/icdar.py:

data

icdar2017

Annotaions //image_1.txt
JPEGImages //image_1.jpg
train.txt //存储训练图片的名称,例如:image_1

2、参数修改

修改./train.py中的学习率、batch、模型存储路径等参数,如果需要调整网络参数,在nets/config.py中修改。

3、执行训练

python train.py

代码运行环境:Python2.7 tensorflow-gpu1.13 单张1080Ti

测试

修改demo.py中的模型文件夹路径、测试图片路径,然后执行python demo.py

测试结果:论文中还有一些地方我也不确定,因此目前没有在公开数据集测试。值得注意的是,按照原文中的训练说明,最好在多卡上训练,请加大你的batch size.

值得注意的地方

1、global text segmentation(gts)的训练

计算gts训练时损失函数时,我采用的方法是将feature pyramid的各个level产生的gts分别与全局mask gt计算softmax loss,然后取平均作为Loss_gts。因为没找到与原文关于这一块的描述,因此可能是其他的计算方法:每个level准备不同的mask_gt、将多个level的gts预测融合计算loss等等。感兴趣的可以去问问作者或者自己试试。

2、实现Rescore 时gts的选取

计算predict box对应的pyramid level,然后选取对应的gts计算。还有一种思路是:融合P2,P3,P4,P5的gts,然后计算box rescore.

3、Bounding Box的生成

MASK RCNN中是先对输出的box进行阈值过滤以及NMS,然后将剩余的回归之后的box对应的rois送入mask branch计算mask,目的是减少计算量同时获得更准确的mask。SPCNet为了减小FP与FN,对Inference流程做了修改:先对模型输出的box与mask进行Rescore,然后经过threshold filter,再对剩下的mask求Bounding Box,然后利用Poly NMS减少重叠,输出剩下的。

在目前代码(nets/models.py utils.py)里:是先对模型输出的box与mask进行Rescore,然后经过threshold filter与NMS,再对剩下的mask求Bounding Box,然后直接输出。

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
Memory tests solver with using OpenCV

Human Benchmark project This project is OpenCV based programs which are puzzle solvers for 7 different games for https://humanbenchmark.com/. made as

Bahadır Araz 24 Dec 27, 2022
Here use convulation with sobel filter from scratch in opencv python .

Here use convulation with sobel filter from scratch in opencv python .

Tamzid hasan 2 Nov 11, 2021
Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

Code for the AAAI18 paper PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. Contributions

758 Dec 22, 2022
Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

An Image is Worth 16x16 Words, What is a Video Worth? paper Official PyTorch Implementation Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor DAMO Academy, Al

213 Nov 12, 2022
Generate text images for training deep learning ocr model

New version release:https://github.com/oh-my-ocr/text_renderer Text Renderer Generate text images for training deep learning OCR model (e.g. CRNN). Su

Qing 1.2k Jan 04, 2023
Autonomous Driving project for Euro Truck Simulator 2

hope-autonomous-driving Autonomous Driving project for Euro Truck Simulator 2 Video: How is it working ? In this video, the program processes the imag

Umut Görkem Kocabaş 36 Nov 06, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Shape Detection - It's a shape detection project with OpenCV and Python.

Shape Detection It's a shape detection project with OpenCV and Python. Setup pip install opencv-python for doing AI things. pip install simpleaudio fo

1 Nov 26, 2022
BNF Globalization Code (CVPR 2016)

Boundary Neural Fields Globalization This is the code for Boundary Neural Fields globalization method. The technical report of the method can be found

25 Apr 15, 2022
Characterizing possible failure modes in physics-informed neural networks.

Characterizing possible failure modes in physics-informed neural networks This repository contains the PyTorch source code for the experiments in the

Aditi Krishnapriyan 55 Jan 02, 2023
The virtual calculator will be above the live streaming from your camera

The virtual calculator is above the live streaming from my camera usb , the program first detect my hand and in each frame calculate the distance between two finger ,if the distance is lower than the

gasbaoui mohammed al amine 5 Jul 01, 2022
This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
一键翻译各类图片内文字

一键翻译各类图片内文字 针对群内、各个图站上大量不太可能会有人去翻译的图片设计,让我这种日语小白能够勉强看懂图片 主要支持日语,不过也能识别汉语和小写英文 支持简单的涂白和嵌字

574 Dec 28, 2022
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
Tesseract Open Source OCR Engine (main repository)

Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM

48.4k Jan 09, 2023
Automatically resolve RidderMaster based on TensorFlow & OpenCV

AutoRiddleMaster Automatically resolve RidderMaster based on TensorFlow & OpenCV 基于 TensorFlow 和 OpenCV 实现的全自动化解御迷士小马谜题 Demo How to use Deploy the ser

神龙章轩 5 Nov 19, 2021
Document Layout Analysis

Eynollah Document Layout Analysis Introduction This tool performs document layout analysis (segmentation) from image data and returns the results as P

QURATOR-SPK 198 Dec 29, 2022