DocuMiner
A production-ready pipeline for text mining and subject indexing
Want to Contribute?
More code and documentation coming soon.
Authors
Open Source Club
A production-ready pipeline for text mining and subject indexing
More code and documentation coming soon.
Open Source Club
Microsoft's Cascadia Code font customized to my liking. Also includes some simple batch patch and bake scripts to batch patch glyphs and bake font features into fonts!
deasciify-highlighted is a Python script for deasciifying text to Turkish and copying clipboard.
Text Based Adventure Jam Author: Devin McIntyre Our goal is two-fold: Create a text based adventure game engine that can parse a standard file format
WeKws Production First and Production Ready End-to-End Keyword Spotting Toolkit. The goal of this toolkit it to... Small footprint keyword spotting (K
AskMe This script is completely terminal based. No user interface is added. You can get the command line options by using the --help argument. Make su
split Word file by chapter we use the mircosoft word api to code this tool api url:https://docs.microsoft.com/zh-cn/dotnet/api/ if this tool is good f
Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS
Meaningful Word Generator Generates meaningful words from dictionary with given no. of letters and words. This might be useful for generating short li
texthooks A collection of pre-commit hooks for handling text files. In particular, hooks for handling unicode characters which may be undesirable in a
BaseCrack Decoder For Base Encoding Schemes BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes. This tool ca
flomo-word-cloud 从flomo导出的笔记中生成词云 如何使用? 将本项目克隆到你的电脑上,使用如下的命令,安装所需python库 pip install -r requirements.txt 在项目里新建一个file文件夹,把所有从flomo导出的html文件放入其中 运行main
phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,
Split large XML files into smaller ones for easy upload. Works for WordPress Posts Import and other XML files.
Find frequency of letters appearing in 5-letter words in the English language In
Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate
colormate Python script text formatting package What is colormate? colormate is a python library that lets you add text formatting to your scripts, it
🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
afasi Fuzz a language by mixing up only few words. Status Beta. Note: The default branch is default. Use Examples Version General Help Translate Help
Hspell, the free Hebrew spellchecker and morphology engine.
Unicode Slugify Unicode Slugify is a slugifier that generates unicode slugs. It was originally used in the Firefox Add-ons web site to generate slugs