Yaspeller Dictionary (Auto)builder

Usage

# this sample command generates `./yaspeller_report.json`
# yaspeller --report json --ignore-digits --ignore-text "'.*" --ignore-latin --only-errors --file-extensions ".md" --lang ru

python -m venv env
source env/bin/activate
pip install 
python src/dictionary.py yaspeller_report.json

Why

Yaspeller is nice, but typical documentation contains too many anglicisms. Normally you just want to ignore them, but the only way to do that is to supply an array of regular expressions for the words to ignore.

This tool generates an array of dictionary entries that covers every inflected form (lexeme) of each word across all grammatical cases, for example:

[
    "[бБ]аг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[дД]ифф(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[кК]оммит(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[пП]атчинг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[рР]убист(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[сС]амоорганизованн(ого|ом|ому|ую|ые|ый|ым|ыми|ых)",
    "[тТ]икет(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "коммитить"
]

It builds them from yaspeller errors, which look like this in text format:

Spelling check:
✗ www.ruby-lang.org/ru/community/ruby-core/index.md 130 ms
-----
Typos: 9
1. патчингом (36:27)
2. коммитить (68:32, suggest: комитет)
3. багах (75:15, suggest: богах, баках, бегах)
4. баги (89:24, suggest: багги)
5. баг (96:25)
6. тикет (107:14, suggest: этикет)
7. дифф (115:18)
8. коммиту (147:24, suggest: комету, комнату)
9. коммита (148:58, suggest: комета)
-----
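
The two ends of this process can be sketched roughly as follows. This is a minimal illustration, not the code in `src/dictionary.py`: the function names `extract_typos` and `build_pattern` are hypothetical, the parser reads the text report shown above (the tool itself is invoked on the JSON report), and the list of inflected forms is assumed to be given, since how the real tool derives them is not shown here.

```python
import os
import re

# Matches numbered typo lines such as "3. багах (75:15, suggest: богах)".
TYPO_LINE = re.compile(r"^\d+\.\s+(\S+)\s+\(\d+:\d+")


def extract_typos(report_text):
    """Return the unique misspelled words from a text report, in order."""
    words = []
    for line in report_text.splitlines():
        match = TYPO_LINE.match(line.strip())
        if match and match.group(1) not in words:
            words.append(match.group(1))
    return words


def build_pattern(forms):
    """Collapse the inflected forms of one word into a single regexp.

    Assumption: `forms` holds all forms of one word; the shared prefix
    is treated as the stem and the remainders as alternative endings.
    """
    forms = sorted({form.lower() for form in forms})
    if len(forms) == 1:
        return forms[0]
    stem = os.path.commonprefix(forms)
    if not stem:
        # No shared prefix: fall back to a plain alternation.
        return "(" + "|".join(forms) + ")"
    endings = sorted(form[len(stem):] for form in forms if form != stem)
    # The ending group is optional only if the bare stem is itself a form.
    optional = "?" if stem in forms else ""
    head = "[" + stem[0] + stem[0].upper() + "]"
    return head + stem[1:] + "(" + "|".join(endings) + ")" + optional
```

For instance, passing the ten forms of "баг" listed in the array above to `build_pattern` yields the same entry as the first line of that array, and `re.fullmatch` can be used to check that a reported typo such as "багах" is covered by the resulting pattern.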

Live example

Initially created for spellchecking the www.ruby-lang.org translations.

🤕 spelling exceptions builder for lazy people

Owner

Vlad Bokov
