Knowledge Management for Humans using Machine Learning & Tags

HyperTag

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you are looking for using semantic search over your text documents (yes, even PDFs) and images. Instead of introducing proprietary file formats like other file organization tools, HyperTag layers smoothly on top of your existing files without any fuss.

Objective Function: Minimize time between a thought and access to all relevant files.

Accompanying blog post: https://blog.neotree.uber.space/posts/hypertag-file-organization-made-for-humans

Install

Available on PyPI

$ pip install hypertag

Supports both CPU-only and CUDA-accelerated execution.

Community

Join the HyperTag Matrix chat room to stay up to date on the latest developments or to ask for help.

Overview

HyperTag offers a slick CLI, but more importantly, it creates a directory called HyperTagFS: a file-system-based representation of your files and tags built from symbolic links and directories.

Directory Import: Import your existing directory hierarchies using $ hypertag import path/to/directory. HyperTag converts it automatically into a tag hierarchy using metatagging.

Semantic Text & Image Search (Experimental): Search the content of images (jpg, png) and text documents (yes, even PDFs) with a simple text query. Text search is powered by the awesome Sentence Transformers library. Text-to-image search is powered by OpenAI's CLIP model. Currently, only English queries are supported.

HyperTag Daemon (Experimental): Monitors HyperTagFS and directories added to the auto import list for user changes (see section "Start HyperTag Daemon" below). Also spawns the DaemonService which speeds up semantic search significantly (warning: daemon process is a RAM hog with ~2GB usage).

Fuzzy Matching Queries: HyperTag uses fuzzy matching to minimize friction in the unlikely case of a typo.

File Type Groups: HyperTag automatically creates folders containing common files (e.g. Images: jpg, png, etc., Documents: txt, pdf, etc., Source Code: py, js, etc.), which can be found in HyperTagFS.
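
The grouping described above can be sketched as a simple extension lookup. This is only an illustration: `FILE_TYPE_GROUPS` and `file_type_group` are hypothetical names, and the table lists only the extensions mentioned in this README, so HyperTag's real table is likely larger.

```python
from pathlib import Path

# Hypothetical mapping; only the extensions named in the README are listed.
FILE_TYPE_GROUPS = {
    "Images": {"jpg", "png"},
    "Documents": {"txt", "pdf"},
    "Source Code": {"py", "js"},
}


def file_type_group(file_name):
    """Return the group name for a file, or None if its extension is unknown."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    for group, extensions in FILE_TYPE_GROUPS.items():
        if extension in extensions:
            return group
    return None
```

The lookup is case-insensitive, so `photo.JPG` and `photo.jpg` land in the same group.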

HyperTag Graph: Quickly get an overview of your HyperTag Graph! HyperTag visualizes the metatag graph on every change and saves it at HyperTagFS/hypertag-graph.pdf.

HyperTag Graph Example

CLI Functions

Import existing directory recursively

Import files with tags inferred from the existing directory hierarchy.

$ hypertag import path/to/directory
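
One way to picture the inference is shown below. It is a sketch under assumptions, not HyperTag's actual import code: it assumes a file is tagged with the name of its containing directory, that each directory tag is metatagged with its parent directory, and that the import root itself does not become a tag. `infer_tags` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def infer_tags(root, file_path):
    """Return (file_tag, metatag_pairs) inferred from the directories
    between the import root and the file.

    metatag_pairs is a list of (child_tag, parent_tag) tuples.
    """
    relative = PurePosixPath(file_path).relative_to(root)
    directories = list(relative.parts[:-1])  # drop the file name
    if not directories:
        return None, []  # file sits directly in the import root
    file_tag = directories[-1]
    # Pair every directory with the directory that contains it.
    metatag_pairs = list(zip(directories[1:], directories[:-1]))
    return file_tag, metatag_pairs
```

Under these assumptions, importing `/data` with a file at `/data/animal/human/sean.txt` yields the tag `human` for the file and the metatag pair `("human", "animal")`.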

Add file/s or URL/s manually

$ hypertag add path/to/file https://github.com/SeanPedersen/HyperTag

Tag file/s (with values)

Manually tag files. Shortcut: $ hypertag t

$ hypertag tag humans/*.txt with human "Homo Sapiens"

Add a value to a file's tag:

$ hypertag tag sean.txt with name="Sean Pedersen"
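
To make the `tag=value` form concrete, here is a small hypothetical parser (`parse_tag_argument` is not part of HyperTag's documented API). It assumes the shell has already removed the surrounding quotes, as it does for the command above.

```python
def parse_tag_argument(argument):
    """Split 'name=value' into (name, value); plain tags get the value None."""
    name, separator, value = argument.partition("=")
    return name, (value if separator else None)
```

A plain tag such as `human` therefore carries no value, while `name=Sean Pedersen` carries the value `Sean Pedersen`.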

Untag file/s

Manually remove tag/s from file/s.

$ hypertag untag humans/*.txt with human "Homo Sapiens"

Tag a tag

Metatag tag/s to create tag hierarchies. Shortcut: $ hypertag tt

$ hypertag metatag human with animal
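
Metatags form a graph over tags, so the hierarchy above a tag can be collected by following metatag links. The sketch below assumes the links are held in a plain dictionary from each tag to its set of metatags; both that structure and the `ancestors` helper are illustrative, not HyperTag's internal representation.

```python
def ancestors(tag, metatags):
    """Return every tag reachable from `tag` by following metatag links.

    `metatags` maps a tag name to the set of tags it is metatagged with.
    """
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        for parent in metatags.get(current, ()):
            if parent not in seen:  # also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen
```

After `hypertag metatag human with animal`, the mapping would contain `{"human": {"animal"}}`, and `ancestors("human", ...)` would include `animal` plus anything `animal` is itself metatagged with.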

Merge tags

Merge all associations (files & tags) of tag A into tag B.

$ hypertag merge human into "Homo Sapiens"
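
Conceptually, a merge moves every association from the source tag to the target tag and then drops the source. A minimal in-memory sketch, assuming associations are kept in a dictionary from tag name to a set of items (`merge_tags` is a hypothetical helper; HyperTag itself stores this data in SQLite):

```python
def merge_tags(associations, source, target):
    """Move all items associated with `source` into `target`, then drop `source`.

    `associations` maps a tag name to a set of items (file names or tags)
    and is modified in place.
    """
    moved = associations.pop(source, set())
    associations.setdefault(target, set()).update(moved)
    return associations
```

The same helper could be applied once to the file associations and once to the metatag associations to cover both halves of the merge.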

Query using Set Theory

Print the file names in the set that matches the query. Queries are composed of tags (with optional values) and operators. Tags are fuzzy matched for convenience. Nesting is currently not supported; queries are evaluated from left to right.
Shortcut: $ hypertag q

Query with a value using a wildcard: $ hypertag query name="Sean*"
Print paths: $ hypertag query human --path
Print fuzzy matched tag: $ hypertag query man --verbose
Disable fuzzy matching: $ hypertag query human --fuzzy=0
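
The idea behind fuzzy tag matching can be illustrated with Python's standard library. This README does not say which matcher HyperTag uses, so `difflib` is a stand-in here, and both `fuzzy_match_tag` and the `cutoff` value are assumptions.

```python
import difflib


def fuzzy_match_tag(query, known_tags, cutoff=0.6):
    """Return the known tag most similar to `query`, or None if nothing is close."""
    if query in known_tags:
        return query  # exact matches always win
    matches = difflib.get_close_matches(query, known_tags, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With the tags `human` and `animal`, the query `man` resolves to `human`, which mirrors the `--verbose` example above.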

Default operator is AND (intersection):
$ hypertag query human name="Sean*" is equivalent to $ hypertag query human and name="Sean*"

OR (union):
$ hypertag query human or "Homo Sapiens"

MINUS (difference):
$ hypertag query human minus "Homo Sapiens"
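
The left-to-right evaluation can be sketched with Python sets. This is an illustration rather than HyperTag's implementation: `evaluate_query` is a hypothetical function, the query is assumed to be already split into tokens, and fuzzy matching and tag values are left out.

```python
def evaluate_query(tokens, files_by_tag):
    """Evaluate a flat query from left to right.

    `tokens` is a list such as ["human", "minus", "Homo Sapiens"];
    `files_by_tag` maps a tag name to the set of files carrying it.
    Two adjacent tags are combined with AND.
    """
    operators = {
        "and": set.intersection,
        "or": set.union,
        "minus": set.difference,
    }
    result = None
    pending = "and"  # operator to apply to the next tag
    for token in tokens:
        if token in operators:
            pending = token
            continue
        files = set(files_by_tag.get(token, ()))
        result = files if result is None else operators[pending](result, files)
        pending = "and"  # fall back to the default after each tag
    return result if result is not None else set()
```

Because there is no nesting, `human or robot minus "Homo Sapiens"` is read as `(human or robot) minus "Homo Sapiens"`.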

Index supported image and text files

Only indexed files can be searched.

$ hypertag index

To index PDFs whose text cannot be extracted directly (for example, scans), install Tesseract. On Arch Linux: # pacman -S tesseract tesseract-data-eng

Index only image files: $ hypertag index --image
Index only text files: $ hypertag index --text

Semantic search for text files

Uses a custom search algorithm that combines semantic search with token matching. Prints text file names sorted by matching score. Performance benefits greatly from running the HyperTag daemon.
Shortcut: $ hypertag s

$ hypertag search "your important text query" --path --score --top_k=10
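
The README does not spell out how the two signals are combined, so the following is only one plausible shape for such a hybrid score: a weighted sum of embedding cosine similarity and query token overlap. All names (`hybrid_search`, `weight`) are hypothetical, and the document vectors are assumed to be precomputed by an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def token_overlap(query, text):
    """Fraction of query tokens that also occur in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


def hybrid_search(query, query_vector, documents, top_k=10, weight=0.5):
    """Rank documents by a weighted mix of semantic and token scores.

    `documents` is a list of (name, text, vector) tuples.
    Returns a list of (name, score) pairs, best first.
    """
    scored = []
    for name, text, vector in documents:
        semantic = cosine_similarity(query_vector, vector)
        lexical = token_overlap(query, text)
        scored.append((name, weight * semantic + (1 - weight) * lexical))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The token component rewards exact keyword hits that an embedding alone might rank lower, which is presumably the motivation for combining the two.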

Semantic search for image files

Print image file names sorted by matching score. Performance benefits greatly from running the HyperTag daemon.
Shortcut: $ hypertag si

Text to image: $ hypertag search_image "your image content description" --path --score --top_k=10

Image to image: $ hypertag search_image "path/to/image.jpg" --path --score --top_k=10

Start HyperTag Daemon

Start daemon process with triple functionality:

  • Watches HyperTagFS directory for user changes
    • Maps file (symlink) and directory deletions into tag / metatag removal/s
    • On directory creation: Interprets name as set theory tag query and automatically populates it with results
    • On directory creation in Search Images or Search Texts: Interprets name as semantic search query (add top_k=42 to limit result size) and automatically populates it with results
  • Watches directories on the auto import list for user changes:
    • Maps file changes (moves & renames) to DB
    • On file creation: Adds the new file/s with inferred tag/s and auto-indexes them (if the file format is supported).
  • Spawns DaemonService to load and expose models used for semantic search, speeding it up significantly

$ hypertag daemon
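
For directories created in Search Images or Search Texts, the daemon has to separate the query text from an optional `top_k=...` suffix. A sketch of that parsing step follows; `parse_search_directory_name` is a hypothetical helper, and the default of 10 results is an assumption rather than a documented value.

```python
import re

TOP_K_PATTERN = re.compile(r"\btop_k=(\d+)\b")


def parse_search_directory_name(name, default_top_k=10):
    """Split a directory name into (query, top_k)."""
    match = TOP_K_PATTERN.search(name)
    top_k = int(match.group(1)) if match else default_top_k
    # Remove the option and collapse leftover whitespace.
    query = " ".join(TOP_K_PATTERN.sub("", name).split())
    return query, top_k
```

A directory named `sunset at the beach top_k=42` would then be treated as the query `sunset at the beach`, limited to 42 results.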

Print all tags of file/s

$ hypertag tags filename1 filename2

Print all metatags of tag/s

$ hypertag metatags tag1 tag2

Print all tags

$ hypertag show

Print all files

Print names: $ hypertag show files

Print paths: $ hypertag show files --path

Visualize HyperTag Graph

Visualize the metatag graph hierarchy (saved at HyperTagFS root).

$ hypertag graph

Specify layout algorithm (default: fruchterman_reingold):

$ hypertag graph --layout=kamada_kawai

Generate HyperTagFS

Generate file system based representation of your files and tags using symbolic links and directories.

$ hypertag mount
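
The core idea of a symlink-based tag file system fits in a few lines. This is a simplified sketch, not the actual `mount` implementation: `build_tag_directories` is a hypothetical function that creates one directory per tag and ignores metatag nesting, file type groups, and name collisions.

```python
import tempfile
from pathlib import Path


def build_tag_directories(root, files_by_tag):
    """Create one directory per tag, filled with symlinks to the tagged files."""
    root = Path(root)
    for tag, file_paths in files_by_tag.items():
        tag_dir = root / tag
        tag_dir.mkdir(parents=True, exist_ok=True)
        for file_path in file_paths:
            target = Path(file_path).resolve()
            link = tag_dir / target.name
            if not link.is_symlink():  # keep repeated runs idempotent
                link.symlink_to(target)
    return root


if __name__ == "__main__":
    # Demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "sean.txt"
        source.write_text("hello")
        fs_root = build_tag_directories(Path(tmp) / "HyperTagFS", {"human": [source]})
        print(sorted(path.name for path in (fs_root / "human").iterdir()))
```

Because the entries are links, the original files never move, which is what lets HyperTagFS sit on top of an existing directory layout.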

Add directory to auto import list

Directories added to the auto import list will be monitored by the daemon for new files or changes.

$ hypertag add_auto_import_dir path/to/directory

Set HyperTagFS directory path

Default is the user's home directory.

$ hypertag set_hypertagfs_dir path/to/directory

Architecture

  • Python and its vibrant open-source community power HyperTag
  • Many other awesome open-source projects make HyperTag possible (listed in pyproject.toml)
  • SQLite3 serves as the metadata storage engine (located at ~/.config/hypertag/hypertag.db)
  • Added URLs that point to websites are saved in ~/.config/hypertag/web_pages; all other downloads are saved in ~/.config/hypertag/downloads
  • Symbolic links are used to create the HyperTagFS directory structure
  • Semantic Search: boosted using hnswlib
    • Text to text search is powered by the awesome DistilBERT
    • Text to image & image to image search is powered by OpenAI's impressive CLIP model
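
To give a feel for how tags, values, and metatags could be stored in SQLite, here is a small self-contained sketch. The schema is an assumption made for illustration: the table names (`files`, `tags`, `tags_files`, `tags_tags`) and helper functions are hypothetical and do not describe the actual layout of `hypertag.db`.

```python
import sqlite3

# Hypothetical schema, not HyperTag's real one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS tags_files (
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    file_id INTEGER NOT NULL REFERENCES files(id),
    value TEXT,
    PRIMARY KEY (tag_id, file_id)
);
CREATE TABLE IF NOT EXISTS tags_tags (
    child_tag_id INTEGER NOT NULL REFERENCES tags(id),
    parent_tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (child_tag_id, parent_tag_id)
);
"""


def create_schema(connection):
    connection.executescript(SCHEMA)


def tag_file(connection, path, tag, value=None):
    """Associate a file with a tag, creating either row if needed."""
    connection.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    connection.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
    connection.execute(
        "INSERT OR REPLACE INTO tags_files (tag_id, file_id, value) "
        "SELECT tags.id, files.id, ? FROM tags, files "
        "WHERE tags.name = ? AND files.path = ?",
        (value, tag, path),
    )


def files_with_tag(connection, tag):
    """Return the paths of all files carrying the given tag, sorted by path."""
    rows = connection.execute(
        "SELECT files.path FROM files "
        "JOIN tags_files ON tags_files.file_id = files.id "
        "JOIN tags ON tags.id = tags_files.tag_id "
        "WHERE tags.name = ? ORDER BY files.path",
        (tag,),
    )
    return [row[0] for row in rows]
```

Using `sqlite3.connect(":memory:")` is a convenient way to experiment with a layout like this without touching the real database file.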

Development

  • Find prioritized issues here: TODO List
  • Pick an issue and comment how you plan to tackle it before starting out, to make sure no dev time is wasted.
  • Clone repo: $ git clone https://github.com/SeanPedersen/HyperTag.git
  • $ cd HyperTag/
  • Install Poetry
  • Install dependencies: $ poetry install
  • Activate virtual environment: $ poetry shell
  • Run all tests: $ pytest -v
  • Run formatter: $ black hypertag/
  • Run linter: $ flake8
  • Run type checking: $ mypy **/*.py
  • Run security checking: $ bandit --exclude tests/ -r .
  • Codacy: Dashboard
  • Run HyperTag: $ python -m hypertag

Inspiration

What is the point of HyperTag's existence?
HyperTag offers many unique features, such as import, semantic search, graphing, and fuzzy matching, that make it very convenient to use. At the same time, HyperTag's code base stays relatively tiny at <2000 LOC compared to similar projects like TMSU (>10,000 LOC in Go) and SuperTag (>25,000 LOC in Rust), making it easy to hack on.
