🔬 Fixed struct serialization system, using Python 3.9 annotated type hints

Overview

py-struct

Fixed-size struct serialization, using Python 3.9 annotated type hints

This was originally uploaded as a Gist because it's not intended as a serious project, but I decided to take it a bit further and add some features.

Features:

  • One file, zero dependencies
  • Easy to use, just annotate your fields and use the decorator
  • Overridable (just define __load__, __save__, __size__, __align__)
  • Compatible with dataclasses
  • Integer / float primitives
  • Fixed size arrays
  • Raw chunks (bytes)
  • Static checking / size calculation
  • Packed or aligned structs, with 3 padding handling modes

Getting started

from .serialization import *
from io import BytesIO
from dataclasses import dataclass

# C ALIASES (for LP64)

Bool = U8
Char = S8; UChar = U8
Short = S16; UShort = U16
Int = S32; UInt = U32
Long = S64; ULong = U64
IntPtr = S64; UIntPtr = U64
Ptr = Size = U64

# Define some structs

@dataclass
class Foo(Struct):
    yeet: Bool
    ping: Bool

@dataclass
class MyStruct(Struct):
    foo: Foo
    bar: UInt
    three_bazs: tuple[Long, Long, Long]

# Decode with __load__(), passing an IO

data = b'\x01\x00\x00\x00\x00\x05\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
parsed = MyStruct.__load__(BytesIO(data))

assert parsed == MyStruct(foo=Foo(yeet=1, ping=0), bar=1280, three_bazs=(1, 2, 3))

# Encode back with __save__()

parsed.__save__(st := BytesIO())
assert data == st.getvalue()

Usage

Serializable protocol

A serializable class is one that implements the FixedSerializable protocol:

  • __load__(cls, st: BinaryIO) -> cls: class method that deserializes a readable binary stream into an instance
  • __save__(self, st: BinaryIO): instance method that serializes the instance into a writeable binary stream
  • __size__: int: class attribute that indicates total serialized size
  • __align__: int: align factor (1 if no alignment)

__size__ is expected to be a multiple of __align__ i.e. it includes any trailing alignment as needed.

class Color(NamedTuple):
    r: int
    g: int
    b: int

    __size__ = 3
    __align__ = 1
    @classmethod
    def __load__(cls, st):
        return cls(*st.read(3))
    def __save__(self, st):
        st.write(bytes(self))

Structs

Most times you'll just derive from Struct, which implements the serializable protocol for you, based on the property annotations of the class. As in C, fields are serialized in order of declaration.

@dataclass
class Color(Struct):
    r: U8
    g: U8
    b: U8

The implemented __load__ will construct an instance of the class passsing in keyword arguments according to the property annotations. In this example, the class will be constructed like Color(r=0, g=21, b=10). To avoid implementing the constructor and other methods yourself, Python's dataclass decorator can be used.

For the annotated properties, the following types are allowed:

  • A class implementing the serializable protocol, such as another struct.
  • One of the provided integer / float primitives: U8, S8, U16, S16, U32, S32, U64, S64, F32, F64. These are aliases of int with custom metadata consumed by Struct.
  • bytes, list[T] or tuple[T, ...] (where T is itself an allowed type).
  • A tuple with allowed types as elements. However, it must be annotated with FixedSize metadata like so: Annotated[list[U8], FixedSize(20)]

Struct is a metaclass, so it must be the first parent. Inheritace (subclassing the struct class, or more parents in addition to Struct) is discouraged and will probably not work correctly.

Alignment

Struct can automatically insert padding to align fields according to their __align__ attribute (or for primitives, their size). In this case, the struct itself is aligned to the LCM of the field alignments (and its __size__ is padded accordingly).

The align metaclass attribute controls how alignment is handled:

  • discard (default): When decoding, any bytes are accepted as padding (and discarded). When encoding, zeroes are inserted. (This is the only alignment mode that introduces malleability.)
  • zeros: Like discard, but only zeroes are accepted as padding when decoding.
  • explicit: Don't actually insert any padding, just check that all fields are aligned and that __size__ is aligned too. This mode expects you to explicitly declare padding as (for example) bytes.
  • no: No alignment at all. Field alignments are ignored and __align__ is set to 1. This is equivalent to a packed / unaligned struct.

For example, to create a packed struct:

@dataclass
class Address(Struct, align='no'):
    port: U16
    # without align='no', 2 bytes of padding would be inserted here
    host: U32

Caveat: tuple (last case in allowed types) will not verify or insert alignment between its elements. Its alignment will be the GCD of the alignments of its elements, and the size will be the sum of the sizes. This means tuple[U64, U64] will probably do what you want (align to 8 bytes), but tuple[U64, U32] will only align to 4 bytes. If you need alignment, use a nested Struct instead of a tuple.

Malleability

Serialization is non-malleable (that is, there's a bijection between serialized and unserialized values) if all the following conditions are met:

  • Structs use an alignment setting other than the default align='discard'.

Of course, if serialization is implemented manually at some point, malleability has to be checked there as well.

Wishlist

  • Bit fields
  • Post validation / transform (enums, booleans, string buffers, sets)
  • Endianness control
    • At annotation time, or at runtime?
  • Unions (? not clear how I'd implement those)
  • Optimization and laziness

Higher level:

  • Pointer newtype / wrapper class
    • Ideally annotated with target hint
    • Call to dereference, optionally passing index
  • Generics in struct classes
Owner
Alba Mendez
hi i'm a lesbian catgirl who loves hummus and kernels 🌸 she/her
Alba Mendez
Python Data Structures and Algorithms

No non-sense and no BS repo for how data structure code should be in Python - simple and elegant.

Prabhu Pant 1.9k Jan 08, 2023
A mutable set that remembers the order of its entries. One of Python's missing data types.

An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.

Elia Robyn Lake (Robyn Speer) 173 Nov 28, 2022
Final Project for Practical Python Programming and Algorithms for Data Analysis

Final Project for Practical Python Programming and Algorithms for Data Analysis (PHW2781L, Summer 2020) Redlining, Race-Exclusive Deed Restriction Lan

Aislyn Schalck 1 Jan 27, 2022
Data Structures and algorithms package implementation

Documentation Simple and Easy Package --This is package for enabling basic linear and non-linear data structures and algos-- Data Structures Array Sta

1 Oct 30, 2021
Svector (pronounced Swag-tor) provides extension methods to pyrsistent data structures

Svector Svector (pronounced Swag-tor) provides extension methods to pyrsistent data structures. Easily chain your methods confidently with tons of add

James Chua 5 Dec 09, 2022
Array is a functional mutable sequence inheriting from Python's built-in list.

funct.Array Array is a functional mutable sequence inheriting from Python's built-in list. Array provides 100+ higher-order methods and more functiona

182 Nov 21, 2022
Integrating C Buffer Data Into the instruction of `.text` segment instead of on `.data`, `.rodata` to avoid copy.

gcc-bufdata-integrating2text Integrating C Buffer Data Into the instruction of .text segment instead of on .data, .rodata to avoid copy. Usage In your

Jack Ren 1 Jan 31, 2022
Python collections that are backended by sqlite3 DB and are compatible with the built-in collections

sqlitecollections Python collections that are backended by sqlite3 DB and are compatible with the built-in collections Installation $ pip install git+

Takeshi OSOEKAWA 11 Feb 03, 2022
Data Structure With Python

Data-Structure-With-Python- Python programs also include in this repo Stack A stack is a linear data structure that stores items in a Last-In/First-Ou

Sumit Nautiyal 2 Jan 09, 2022
Al-Quran dengan Terjemahan Indonesia

Al-Quran Rofi Al-Quran dengan Terjemahan / Tafsir Jalalayn Instalasi Al-Quran Rofi untuk Archlinux untuk pengguna distro Archlinux dengan paket manage

Nestero 4 Dec 20, 2021
Map single-cell transcriptomes to copy number evolutionary trees.

Map single-cell transcriptomes to copy number evolutionary trees. Check out the tutorial for more information. Installation $ pip install scatrex SCA

Computational Biology Group (CBG) 12 Jan 01, 2023
Python library for doing things with Grid-like structures

gridthings Python library for doing things with Grid-like structures Development This project uses poetry for dependency management, pre-commit for li

Matt Kafonek 2 Dec 21, 2021
My solutions to the competitive programming problems on LeetCode, USACO, LintCode, etc.

This repository holds my solutions to the competitive programming problems on LeetCode, USACO, LintCode, CCC, UVa, SPOJ, and Codeforces. The LeetCode

Yu Shen 32 Sep 17, 2022
Leetcode solutions - All algorithms implemented in Python 3 (for education)

Leetcode solutions - All algorithms implemented in Python 3 (for education)

Vineet Dhaimodker 3 Oct 21, 2022
Simple spill-to-disk dictionary

Chest A dictionary that spills to disk. Chest acts likes a dictionary but it can write its contents to disk. This is useful in the following two occas

Blaze 59 Dec 19, 2022
Solutions for leetcode problems.

Leetcode-solution This is an repository for storring new algorithms that I am learning form the LeetCode for future use. Implemetations Two Sum (pytho

Shrutika Borkute 1 Jan 09, 2022
RLStructures is a library to facilitate the implementation of new reinforcement learning algorithms.

RLStructures is a lightweight Python library that provides simple APIs as well as data structures that make as few assumptions as possibl

Facebook Research 262 Nov 18, 2022
A DSA repository but everything is in python.

DSA Status Contents A: Mathematics B: Bit Magic C: Recursion D: Arrays E: Searching F: Sorting G: Matrix H: Hashing I: String J: Linked List K: Stack

Shubhashish Dixit 63 Dec 23, 2022
This repository contains code for CTF platform.

CTF-platform Repository for backend of CTF hosting website For starting the project first time : Clone the repo in which you have to work in your syst

Yash Jain 3 Feb 18, 2022
This Repository consists of my solutions in Python 3 to various problems in Data Structures and Algorithms

Problems and it's solutions. Problem solving, a great Speed comes with a good Accuracy. The more Accurate you can write code, the more Speed you will

SAMIR PAUL 1.3k Jan 01, 2023