🎐 A Python library for approximate and phonetic matching of strings.

Last update: Dec 21, 2022

Overview

jellyfish

Jellyfish is a Python library for approximate and phonetic matching of strings.

Written by James Turk and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 supports only Python 3. If you need Python 2, please use 0.6.x.

Included Algorithms

String comparison:

Levenshtein Distance
Damerau-Levenshtein Distance
Jaro Distance
Jaro-Winkler Distance
Match Rating Approach Comparison
Hamming Distance
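
To give a feel for what the first entry in this list measures, here is a minimal pure-Python sketch of Levenshtein distance using the classic two-row dynamic programming table. It is an illustration only, not Jellyfish's own implementation, and the function name `levenshtein` is a hypothetical helper chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (illustrative sketch)."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein("jellyfish", "smellyfish"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.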

Phonetic encoding:

American Soundex
Metaphone
NYSIIS (New York State Identification and Intelligence System)
Match Rating Codex
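
As a rough illustration of how a phonetic encoding works, the sketch below follows the commonly described American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent letters with the same digit, and pad to four characters. It is not Jellyfish's implementation. The helper name `soundex_sketch` is hypothetical, and the sketch assumes plain ASCII input (other characters are dropped).

```python
# Letter groups that share a Soundex digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex_sketch(name: str) -> str:
    """Four-character American Soundex code (illustrative sketch, ASCII only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    last_code = _CODES.get(letters[0], "")
    for char in letters[1:]:
        if char in "HW":
            continue  # H and W do not separate letters that share a digit
        code = _CODES.get(char, "")  # vowels (and Y) have no digit
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel resets this, so a repeated digit is allowed after it
    return result.ljust(4, "0")


print(soundex_sketch("Jellyfish"))
```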

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1

>>> jellyfish.metaphone(u'Jellyfish')
'JLFX'
>>> jellyfish.soundex(u'Jellyfish')
'J412'
>>> jellyfish.nysiis(u'Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex(u'Jellyfish')
'JLLFSH'
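
The Jaro value above can be checked by hand. For 'jellyfish' and 'smellyfish', 8 characters match within the search window and none are transposed, so the score is (8/9 + 8/10 + 8/8) / 3, roughly 0.8963. The sketch below is a plain-Python rendering of the standard Jaro formula, written only to make that arithmetic concrete. It is not Jellyfish's code, and `jaro_sketch` is a hypothetical name.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] using the standard formula (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order and count those out of place.
    out_of_place = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_place += 1
            k += 1
    transpositions = out_of_place // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


print(round(jaro_sketch("jellyfish", "smellyfish"), 4))
```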

Running Tests

If you are interested in contributing to Jellyfish, you may want to run the tests locally. Jellyfish uses tox to run its tests, which you can set up and run as follows:

pip install tox
# cd jellyfish/
tox
