QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

Overview

Qsynthesis

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/O based synthesis and a whitebox AST search to synthesize sub-expressions (if the root node cannot be synthesized).

This algorithm as originaly been described at the BAR academic workshop:

The code has been release as part of the following Black Hat talk:

Disclaimer: This framework is experimental, and shall only be used for experimentation purposes. It mainly aims at stimulating research in this area.

Documentation

The installation, examples, and API documentation is available on the dedicated documentation: Documentation

Functionalities

The core synthesis is based on Triton symbolic engine on which is built the whole framework. It provides the following functionalities:

  • synthesis of bitvector expressions
  • ability to check through SMT the semantic equivalence of synthesized expressions
  • ability to synthesize constants (if the expression encode a constant)
  • ability to improve oracles (pre-computed tables) overtime through a learning mechanism
  • ability to reassemble synthesized expression back to assembly
  • ability to serve oracles through a REST API to facilitate the synthesis usage
  • an IDA plugin providing an integration of the synthesis

Quick start

Installation

In order to work Triton first has to be installed: install documentation. Triton does not automatically install itself in a virtualenv, copy it in your venv or use --system-site-packages when configuring your venv.

Then:

$ git clone [email protected]:synthesis/qsynthesis.git
$ cd qsynthesis
$ pip3 install '.[all]'

The [all] will installed all dependencies (see the documentation for a light install).

Table generation

The synthesis algorithm requires generating oracle tables derived from a grammar (a set of variables and operators). Qsynthesis installation provides the utility qsynthesis-table-manager enabling manipulating tables. The following command generate a table with 3 variables of 64 bits, 5 operators using a vector of 16 inputs. We limit the generation to 5 million entries.

$ qsynthesis-table-manager generate -bs 64 --var-num 3 --input-num 16 --random-level 5 --ops AND,NEG,MUL,XOR,NOT --watchdog 80 --limit 5000000 my_oracle_table
Generate Table
Watchdog value: 80.0
Depth 2 (size:3) (Time:0m0.23120s)
Depth 3 (size:21) (Time:0m0.23198s)
Depth 4 (size:574) (Time:0m0.26068s)
Depth 5 (size:400858) (Time:0m21.23231s)
Threshold reached, generation interrupted
Stop required
Depth 5 (size:5000002) (Time:4m52.56009s) [RAM:9.52Gb]

Note: The generation process is RAM consuming the --watchdog enables setting a percentage of the RAM above which the generation is interrupted.

Synthesizing a bitvector expression

We then can try simplifying a seemingly obfuscated expression with:

from qsynthesis import SimpleSymExec, TopDownSynthesizer, InputOutputOracleLevelDB

blob = b'UH\x89\xe5H\x89}\xf8H\x89u\xf0H\x89U\xe8H\x89M\xe0L\x89E\xd8H\x8bE' \
       b'\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0H\x8dH\x01H\x8b' \
       b'E\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2H)\xd0H\x83\xe8' \
       b'\x02H!\xc1H\x8bE\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0' \
       b'H\x8dp\x01H\x8bE\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2' \
       b'H)\xd0H\x83\xe8\x02H\t\xf0H)\xc1H\x89\xc8H\x83\xe8\x01]\xc3'

# Perform symbolic execution of the instructions
symexec = SimpleSymExec("x86_64")
symexec.initialize_register('rip', 0x40B160)  # arbitrary address
symexec.initialize_register('rsp', 0x800000)  # arbitrary stack
symexec.execute_blob(blob, 0x40B160)
rax = symexec.get_register_ast("rax")  # retrieve rax register expressions

# Load lookup tables
ltm = InputOutputOracleLevelDB.load("my_oracle_table")

# Perform Synthesis of the expression
synthesizer = TopDownSynthesizer(ltm)
synt_rax, simp = synthesizer.synthesize(rax)

print(f"expression: {rax.pp_str}")
print(f"synthesized expression: {synt_rax.pp_str} [{simp}]")

Limitations

  • synthesis accuracy limited by pre-computed tables exhaustivness
  • table generation limited by RAM consumption
  • reassembly cannot involve memory variable, destination is necessarily a register and architecture depends on llvmlite (thus mostly x86_64)
  • the code references trace-based synthesis which is disabled (as the underlying framework is not yet open-source)

Authors

  • Robin David (@RobinDavid), Quarkslab

Contributors

Huge thanks to contributors to this research:

  • Luigi Coniglio
  • Jonathan Salwan
You might also like...
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour. Currently compiles to URCL with plans to add additional formats in the future.

a BTC mining program based on python3

BTC-Miner a BTC mining program based on python3 Our project refers to the nightminer project by ricmoo, which is written in Python2 (https://github.co

texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Custom Python linting through AST expressions
Custom Python linting through AST expressions

bellybutton bellybutton is a customizable, easy-to-configure linting engine for Python. What is this good for? Tools like pylint and flake8 provide, o

Graphical Python debugger which lets you easily view the values of all evaluated expressions
Graphical Python debugger which lets you easily view the values of all evaluated expressions

birdseye birdseye is a Python debugger which records the values of expressions in a function call and lets you easily view them after the function exi

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Turning SymPy expressions into JAX functions

sympy2jax Turn SymPy expressions into parametrized, differentiable, vectorizable, JAX functions. All SymPy floats become trainable input parameters. S

The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Enable ++x and --x expressions in Python

By default, Python supports neither pre-increments (like ++x) nor post-increments (like x++). However, the first ones are syntactically correct since Python parses them as two subsequent +x operations, where + is the unary plus operator (same with --x and the unary minus). They both have no effect, since in practice -(-x) == +(+x) == x.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message. Everything is highly customizable.

MacroTools provides a library of tools for working with Julia code and expressions.

MacroTools.jl MacroTools provides a library of tools for working with Julia code and expressions. This includes a powerful template-matching system an

A library for pattern matching on symbolic expressions in Python.
A library for pattern matching on symbolic expressions in Python.

MatchPy is a library for pattern matching on symbolic expressions in Python. Work in progress Installation MatchPy is available via PyPI, and

poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions

poetry2nix poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions. It does so by parsing pyproject.t

Comments
  • unrolling issue with PlaceHolderSynthesizer

    unrolling issue with PlaceHolderSynthesizer

    The PlaceHolderSynthesizer seem's not to properly replace placeholder variable during the synthesis process, on some use-cases. The resulting synthesized expression is thus completely wrong. I need to dig deeper.

    opened by RobinDavid 7
Owner
Quarkslab
Quarkslab
dotsend is a web application which helps you to upload your large files and share file via link

dotsend is a web application which helps you to upload your large files and share file via link

Devocoe 0 Dec 03, 2022
PyDeleter - delete a specifically formatted file in a directory or delete all other files

PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down

Amirabbas Motamedi 1 Jan 30, 2022
Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags.

Tiff Tools Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags. Developed by Kitware, Inc. with funding from The National Canc

Digital Slide Archive 32 Dec 14, 2022
Some-tasks - Files for some of the tasks for the group sessions

Files for some of the tasks for the group sessions Here you can find some of the

<a href=[email protected] Computer Networks"> 0 Aug 25, 2022
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
CredSweeper is a tool to detect credentials in any directories or files.

CredSweeper is a tool to detect credentials in any directories or files. CredSweeper could help users to detect unwanted exposure of credentials (such as personal information, token, passwords, api k

Samsung 54 Dec 13, 2022
Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

laojunjun 13 Nov 23, 2022
A simple tool to find and replace all the matches of a regular expression in file(s).

FindREp A simple tool to find and replace all the matches of a regular expression in file(s). You can either select the file(s) directly or select a f

Biraj 5 Oct 18, 2022
A Python script to organize your files in a given directory.

File-Organizer A Python script to organize your files in a given directory. It organizes your files based on the file extension and moves them into sp

Imira Randeniya 1 Sep 11, 2022
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
Python virtual filesystem for SQLite to read from and write to S3

Python virtual filesystem for SQLite to read from and write to S3

Department for International Trade 70 Jan 04, 2023
BREP : Binary Search in plaintext and gzip files

BREP : Binary Search in plaintext and gzip files Search large files in O(log n) time using binary search. We support plaintext and Gzipped files. Benc

Arnaud de Saint Meloir 5 Dec 24, 2021
Python module that parse power builder file (PBD) and analyze code

PowerBuilder-decompile Python module that parse power builder file (PBD) and analyze code (Incomplete) this tool is composed of: pbd_dump.py pbd file

Samy Sultan 8 Dec 15, 2022
A Certificate renaming tool made for IEEE CS SBC, SJCE.

PDF Batch Renamer Made for IEEE CS SBC, SJCE How to use? Before using the python script, ensure that pytesseract, pdf2image, opencv and other supporti

Ashwin Kumar U 2 Nov 14, 2021
Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

Quy Truong 5 Nov 22, 2021
A python script generate password files in plain text

KeePass (or any desktop pw manager?) Helper WARNING: This script will generate password files in plain text. ITS NOT SECURE. I needed help remembering

Eric Thomas 1 Nov 21, 2021
Transforme rapidamente seu arquivo CSV (de qualquer tamanho) para SQL de forma rápida.

Transformador de CSV para SQL Transforme rapidamente seu arquivo CSV (de qualquer tamanho) para SQL de forma rápida, e com isso insira seus dados usan

William Rodrigues 4 Oct 17, 2022
Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

Jason R. Coombs 1k Dec 28, 2022
Provides a convenient way to append numpy arrays to a file.

Provides a convenient way to append numpy arrays to a file. The NpendWriter and NpendReader classes are used to write and read numpy arrays respective

3 May 14, 2022