Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

Overview

ht-getter

Searches a document for hash tags. Supports multiple natural languages. Works in various contexts.

This package uses a non-regex approach and supports both halfwidth and fullwidth alphanumeric characters as well as various writing systems.

Installation

pip install ht-getter

Function

def get_hash_tags(source, mode = "strings")

Arguments:

source -> The source text to be searched. Must be passed as a str type.

mode -> Specifies the mode of the results. The default value is “strings”

mode = “strings” -> The results are returned as a list of strings

mode = “indices” -> The results are returned as a list of lists of the start and end indices of the hash tags.

Code Sample

from ht_getter.getter import get_hash_tags

source_text = '''This simple package helps find you find #hash_tags in various types of #documents#. It also works with other languages like #日本語 or #한국어.
It supports #fullwidth #alpha-numeric characters. You can get a #list of the #hash_tags or a list of their #indices in the #####source_text."
'''

hash_tags = get_hash_tags(source_text)
hash_tag_indices = get_hash_tags(source_text, mode = "indices")

print(hash_tags)
print(hash_tag_indices)

Things to Keep in Mind:

  1. This package can be used in various contexts. (Social media, news articles, etc.)
  2. This package looks for substrings that have the structure of a hash tag but does not check that the substring is a valid hash tag on any platform.
You might also like...
Soccerdata - Efficiently scrape soccer data from various sources
Soccerdata - Efficiently scrape soccer data from various sources

SoccerData is a collection of wrappers over soccer data from Club Elo, ESPN, FBr

Python Tool to Easily Generate Multiple Documents

Python Tool to Easily Generate Multiple Documents Running the script doesn't require internet Max Generation is set to 10k to avoid lagging/crashing R

Build documentation in multiple repos into one site.

mkdocs-multirepo-plugin Build documentation in multiple repos into one site. Setup Install plugin using pip: pip install git+https://github.com/jdoiro

Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python
Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python

Deep AutoViML Pipeline for orchest.io Quickstart Build Deep Learning models with a single line of code: deep_autoviml Deep AutoViML helps you build te

Type hints support for the Sphinx autodoc extension

sphinx-autodoc-typehints This extension allows you to use Python 3 annotations for documenting acceptable argument types and return value types of fun

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database.

AWS Tags As A Database is a Python library using AWS Tags as a Key-Value database. This database is completely free* 💸

Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

Find target hash collisions for Apple's NeuralHash perceptual hash function.💣
Find target hash collisions for Apple's NeuralHash perceptual hash function.💣

neural-hash-collider Find target hash collisions for Apple's NeuralHash perceptual hash function. For example, starting from a picture of this cat, we

That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with
That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with

Call for translators! We're looking for translators to help translate this spec for everyone! Read this documentation in the following languages 한국어 中

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

A Pytorch implementation of
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Splitter ⠀⠀ A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019). Abstract Recent inte

Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema
document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

document organizer with tags and full-text-search, in a simple and clean sqlite3 schema

Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.

sshuttle: where transparent proxy meets VPN meets ssh As far as I know, sshuttle is the only program that solves the following common case: Your clien

A Regex based linter tool that works for any language and works exclusively with custom linting rules.

renag Documentation Available Here Short for Regex (re) Nag (like "one who complains"). Now also PEGs (Parsing Expression Grammars) compatible with py

Code for CVPR 2021 oral paper
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming soon!

ToxiChat Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Install depen

Releases(v0.0.1)
Owner
Rairye
NLP (mostly tokenization, normalization, and topics concerning CJK languages) Most projects currently available in JS and Python.
Rairye
Rust Markdown Parsing Benchmarks

Rust Markdown Parsing Benchmarks This repo tries to assess Rust markdown parsing

Ed Page 1 Aug 24, 2022
A simple flask application to collect annotations for the Turing Change Point Dataset, a benchmark dataset for change point detection algorithms

AnnotateChange Welcome to the repository of the "AnnotateChange" application. This application was created to collect annotations of time series data

The Alan Turing Institute 16 Jul 21, 2022
pytorch_example

pytorch_examples machine learning site map 정리자료 Resnet https://wolfy.tistory.com/243 convolution 연산 정리 https://gaussian37.github.io/dl-concept-covolut

injae hwang 1 Nov 24, 2021
Beautiful static documentation generator for OpenAPI/Swagger 2.0

Spectacle The gentleman at REST Spectacle generates beautiful static HTML5 documentation from OpenAPI/Swagger 2.0 API specifications. The goal of Spec

Sourcey 1.3k Dec 13, 2022
Fast syllable estimation library based on pattern matching.

Syllables: A fast syllable estimator for Python Syllables is a fast, simple syllable estimator for Python. It's intended for use in places where speed

ProseGrinder 26 Dec 14, 2022
SqlAlchemy Flask-Restful Swagger Json:API OpenAPI

SAFRS: Python OpenAPI & JSON:API Framework Overview Installation JSON:API Interface Resource Objects Relationships Methods Custom Methods Class Method

Thomas Pollet 361 Nov 16, 2022
The purpose of this project is to share knowledge on how awesome Streamlit is and can be

Awesome Streamlit The fastest way to build Awesome Tools and Apps! Powered by Python! The purpose of this project is to share knowledge on how Awesome

Marc Skov Madsen 1.5k Jan 07, 2023
A Sublime Text plugin to select a default syntax dialect

Default Syntax Chooser This Sublime Text 4 plugin provides the set_default_syntax_dialect command. This command manipulates a syntax file (e.g.: SQL.s

3 Jan 14, 2022
Course Materials for Math 340

UBC Math 340 Materials This repository aims to be the one repository for which you can find everything you about Math 340. Lecture Notes Lecture Notes

2 Nov 25, 2021
Workbench to integrate pyoptools with freecad, that means basically optics ray tracing capabilities for FreeCAD.

freecad-pyoptools Workbench to integrate pyoptools with freecad, that means basically optics ray tracing capabilities for FreeCAD. Requirements It req

Combustión Ingenieros SAS 12 Nov 16, 2022
Collection of Summer 2022 tech internships!

Collection of Summer 2022 tech internships!

Pitt Computer Science Club (CSC) 15.6k Jan 03, 2023
🍭 epub generator for lightnovel.us 轻之国度 epub 生成器

lightnovel_epub 本工具用于基于轻之国度网页生成epub小说。 注意:本工具仅作学习交流使用,作者不对内容和使用情况付任何责任! 原理 直接抓取 HTML,然后将其中的图片下载至本地,随后打包成 EPUB。

gyro永不抽风 188 Dec 30, 2022
DeltaPy - Tabular Data Augmentation (by @firmai)

DeltaPy⁠⁠ — Tabular Data Augmentation & Feature Engineering Finance Quant Machine Learning ML-Quant.com - Automated Research Repository Introduction T

Derek Snow 470 Dec 28, 2022
Simple yet powerful CAD (Computer Aided Design) library, written with Python.

Py-MADCAD it's time to throw parametric softwares out ! Simple yet powerful CAD (Computer Aided Design) library, written with Python. Installation

jimy byerley 124 Jan 06, 2023
In this Github repository I will share my freqtrade files with you. I want to help people with this repository who don't know Freqtrade so much yet.

My Freqtrade stuff In this Github repository I will share my freqtrade files with you. I want to help people with this repository who don't know Freqt

Simon Kebekus 104 Dec 31, 2022
PythonCoding Tutorials - Small functions that would summarize what is needed for python coding

PythonCoding_Tutorials Small functions that would summarize what is needed for p

Hosna Hamdieh 2 Jan 03, 2022
A fast time mocking alternative to freezegun that wraps libfaketime.

python-libfaketime: fast date/time mocking python-libfaketime is a wrapper of libfaketime for python. Some brief details: Linux and OS X, Pythons 3.5

Simon Weber 68 Jun 10, 2022
Visualizacao-dados-dell - Repositório com as atividades desenvolvidas no curso de Visualização de Dados

📚 Descrição Neste curso da Dell trabalhamos com a visualização de dados. 🖥️ Aulas 1.1 - Explorando conjuntos de dados 1.2 - Fundamentos de visualiza

Claudia dos Anjos 1 Dec 28, 2021
The tutorial is a collection of many other resources and my own notes

Why we need CTC? --- looking back on history 1.1. About CRNN 1.2. from Cross Entropy Loss to CTC Loss Details about CTC 2.1. intuition: forward algor

手写AI 7 Sep 19, 2022
Tutorial for STARKs with supporting code in python

stark-anatomy STARK tutorial with supporting code in python Outline: introduction overview of STARKs basic tools -- algebra and polynomials FRI low de

121 Jan 03, 2023