Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

Overview

SwinTextSpotter

This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022). The paper is available at this link.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

  • Python=3.8
  • PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
  • OpenCV for visualization

Steps

  1. Install the repository (we recommend to use Anaconda for installation.)
conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
  1. dataset path
datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......

Downloaded images

Downloaded label[Google Drive] [BaiduYun] PW: 46vd

Downloader lexicion[Google Drive] and place it to corresponding dataset.

You can also prepare your custom dataset following the example scripts. [example scripts]

Totaltext

To evaluate on Total Text, CTW1500, ICDAR2015, first download the zipped annotations with

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
wget -O gt_icdar2015.zip https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing
wget -O gt_vintext.zip https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing
  1. Pretrain SWINTS (e.g., with Swin-Transformer backbone)
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml
  1. Fine-tune model on the mixed real dataset
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml
  1. Fine-tune model
python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml
  1. Evaluate SWINTS (e.g., with Swin-Transformer backbone)
python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth
  1. Visualize the detection and recognition results (e.g., with ResNet50 backbone)
python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth

Example results:

Acknowlegement

Adelaidet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and YuLiang liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal={arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial purpose usage, please contact Dr. Lianwen Jin: [email protected]

Copyright 2019, Deep Learning and Vision Computing Lab, South China China University of Technology. http://www.dlvc-lab.net

Owner
mxin262
mxin262
Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SemCo The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

42 Nov 14, 2022
Python module providing a framework to trace individual edges in an image using Gaussian process regression.

Edge Tracing using Gaussian Process Regression Repository storing python module which implements a framework to trace individual edges in an image usi

Jamie Burke 7 Dec 27, 2022
A fuzzing framework for SMT solvers

yinyang A fuzzing framework for SMT solvers. Given a set of seed SMT formulas, yinyang generates mutant formulas to stress-test SMT solvers. yinyang c

Project Yin-Yang for SMT Solver Testing 145 Jan 04, 2023
GAN Image Generator and Characterwise Image Recognizer with python

MODEL SUMMARY 모델의 구조는 크게 6단계로 나뉩니다. STEP 0: Input Image Predict 할 이미지를 모델에 입력합니다. STEP 1: Make Black and White Image STEP 1 은 입력받은 이미지의 글자를 흑색으로, 배경을

Juwan HAN 1 Feb 09, 2022
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 04, 2023
Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral] Learning to Disambiguate Strongly In

Zicong Fan 40 Dec 22, 2022
Custom Implementation of Non-Deep Networks

ParNet Custom Implementation of Non-deep Networks arXiv:2110.07641 Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun Official Repository https

Pritama Kumar Nayak 20 May 27, 2022
Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources

marge This repository releases the code for Generating Query Focused Summaries from Query-Free Resources. Please cite the following paper [bib] if you

Yumo Xu 28 Nov 10, 2022
Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"

Adversarial Neuron Pruning Purifies Backdoored Deep Models Code for NeurIPS 2021 "Adversarial Neuron Pruning Purifies Backdoored Deep Models" by Dongx

Dongxian Wu 31 Dec 11, 2022
This is the winning solution of the Endocv-2021 grand challange.

Endocv2021-winner [Paper] This is the winning solution of the Endocv-2021 grand challange. Dependencies pytorch # tested with 1.7 and 1.8 torchvision

Vajira Thambawita 14 Dec 03, 2022
Playable Video Generation

Playable Video Generation Playable Video Generation Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci Paper: ArX

Willi Menapace 136 Dec 31, 2022
[ICCV 2021] Code release for "Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks"

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks By Yikai Wang, Yi Yang, Fuchun Sun, Anbang Yao. This is the pytorc

Yikai Wang 26 Nov 20, 2022
[CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo Lukas Koestler1*    Nan Yang1,2*,†    Niclas Zeller2,3    Daniel Cremers1

TUM Computer Vision Group 744 Jan 04, 2023
Code for ICCV2021 paper SPEC: Seeing People in the Wild with an Estimated Camera

SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021] SPEC: Seeing People in the Wild with an Estimated Camera, Muhammed Kocabas, Chun-

Muhammed Kocabas 187 Dec 26, 2022
Implement some metaheuristics and cost functions

Metaheuristics This repot implement some metaheuristics and cost functions. Metaheuristics JAYA Implement Jaya optimizer without constraints. Cost fun

Adri1G 1 Mar 23, 2022
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
QuanTaichi evaluation suite

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 120 Jan 04, 2023
Ejemplo Algoritmo Viterbi - Example of a Viterbi algorithm applied to a hidden Markov model on DNA sequence

Ejemplo Algoritmo Viterbi Ejemplo de un algoritmo Viterbi aplicado a modelo ocul

Mateo Velásquez Molina 1 Jan 10, 2022
A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022
Python scripts form performing stereo depth estimation using the HITNET model in ONNX.

ONNX-HITNET-Stereo-Depth-estimation Python scripts form performing stereo depth estimation using the HITNET model in ONNX. Stereo depth estimation on

Ibai Gorordo 30 Nov 08, 2022