Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
Check username

Checker-Oukee Check username It checks the available usernames and creates a new account for them Doesn't need proxies Create a file with usernames an

4 Jun 05, 2022
Helper script to bootstrap a Python environment with the tools required to build and install packages.

python-bootstrap Helper script to bootstrap a Python environment with the tools required to build and install packages. Usage $ python -m bootstrap.bu

Filipe Laíns 7 Oct 06, 2022
Dice Rolling Simulator using Python-random

Dice Rolling Simulator As the name of the program suggests, we will be imitating a rolling dice. This is one of the interesting python projects and wi

PyLaboratory 1 Feb 02, 2022
The Black shade analyser and comparison tool.

diff-shades The Black shade analyser and comparison tool. AKA Richard's personal take at a better black-primer (by stealing ideas from mypy-primer) :p

Richard Si 10 Apr 29, 2022
A simple language and reference decompiler/compiler for MHW THK Files

Leviathon A simple language and reference decompiler/compiler for MHW THK Files. Project Goals The project aims to define a language specification for

11 Jan 07, 2023
A fast Python implementation of Ac Auto Mechine

A fast Python implementation of Ac Auto Mechine

Jin Zitian 1 Dec 07, 2021
A python package for your Kali Linux distro that find the fastest mirror and configure your apt to use that mirror

Kali Mirror Finder Using Single Python File A python package for your Kali Linux distro that find the fastest mirror and configure your apt to use tha

MrSingh 6 Dec 12, 2022
Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Fergus 1 Feb 02, 2022
Genart - Generate random art to sell as nfts

Genart - Generate random art to sell as nfts Usage git clone

Will 13 Mar 17, 2022
Grank is a feature-rich script that automatically grinds Dank Memer for you

Grank Inspired by this repository. This is a WIP and there will be more functions added in the future. What is Grank? Grank is a feature-rich script t

42 Jul 20, 2022
This repository contains some utilities for playing with PKINIT and certificates.

PKINIT tools This repository contains some utilities for playing with PKINIT and certificates. The tools are built on minikerberos and impacket. Accom

Dirk-jan 395 Dec 27, 2022
python package for generating typescript grpc-web stubs from protobuf files.

grpc-web-proto-compile NOTE: This package has been superseded by romnn/proto-compile, which provides the same functionality but offers a lot more flex

Roman Dahm 0 Sep 05, 2021
Conveniently measures the time of your loops, contexts and functions.

Conveniently measures the time of your loops, contexts and functions.

Maciej J Mikulski 79 Nov 15, 2022
A (very dirty) experiment to remove layers from a Docker image.

Surgically remove layers from a Docker image (with a chainsaw)

Jérôme Petazzoni 9 Jun 08, 2022
A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct.

torcp A script copies movie and TV files to your GD drive, or create Hard Link in a seperate dir, in Emby-happy struct. Usage: python3 torcp.py -h Exa

ccf2012 105 Dec 22, 2022
Exports the local variables into a global dictionary for later debugging.

PyExfiltrator Julia’s @exfiltrate for Python; Exports the local variables into a global dictionary for later debugging. Installation pip install pyexf

6 Nov 07, 2022
Michael Vinyard's utilities

Install vintools To download this package from pypi: pip install vintools Install the development package To download and install the developmen

Michael Vinyard 2 May 22, 2022
tade is a discussion/forum/link aggregator application. It provides three interfaces: a regular web page, a mailing list bridge and an NNTP server

tade is a discussion/forum/link aggregator application. It provides three interfaces: a regular web page, a mailing list bridge and an NNTP server

Manos Pitsidianakis 23 Nov 04, 2022
Python code to generate and store certificates automatically , using names from a csv file

WOC-certificate-generator Python code to generate and store certificates automatically , using names from a csv file IMPORTANT In order to make the co

Google Developer Student Club - IIIT Kalyani 10 May 26, 2022
Link-tree - Script that iterate over the links found in each page

link-tree Script that iterate over the links found in each page, recursively fin

Rodrigo Stramantinoli 2 Jan 05, 2022