FindFunc: Advanced Filtering/Finding of Functions in IDA Pro

FindFunc is an IDA Pro plugin to find code functions that contain a certain assembly or byte pattern, reference a certain name or string, or conform to various other constraints. This is not a competitor to tools like Diaphora or BinNavi, but it is well suited to finding a known function in a new binary in cases where classical bindiffing fails.

Filtering with Rules

The main functionality of FindFunc is letting the user specify a set of "Rules" or constraints that a code function in IDA Pro has to satisfy. FF will then find and list all functions that satisfy ALL rules (currently, all Rules are combined in an AND-conjunction). Exception: Rules can be "inverted" to be negative matches. Such rules act as "AND NOT".
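
The AND / AND NOT semantics can be sketched in a few lines of Python. This is a minimal illustration, not FindFunc's actual code: the `Rule` class, its `predicate` and `inverted` fields, and the dictionary-based function records are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    # Hypothetical rule: a predicate over a function record, optionally inverted.
    predicate: Callable[[Dict], bool]
    inverted: bool = False

    def accepts(self, func: Dict) -> bool:
        result = self.predicate(func)
        # An inverted rule is a negative match ("AND NOT").
        return not result if self.inverted else result


def filter_functions(funcs: List[Dict], rules: List[Rule]) -> List[Dict]:
    # A function is kept only if every rule accepts it (AND-conjunction).
    return [f for f in funcs if all(r.accepts(f) for r in rules)]


# Made-up function records standing in for functions in a database.
funcs = [
    {"name": "sub_401000", "size": 120, "names": {"memset"}},
    {"name": "sub_402000", "size": 40, "names": {"memcpy"}},
    {"name": "sub_403000", "size": 300, "names": {"memset", "memcpy"}},
]
rules = [
    Rule(lambda f: "memset" in f["names"]),                 # must reference memset
    Rule(lambda f: "memcpy" in f["names"], inverted=True),  # AND NOT memcpy
]
result = [f["name"] for f in filter_functions(funcs, rules)]
```

Here only `sub_401000` survives: it references `memset` and does not reference `memcpy`.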

FF will schedule the rules in a smart order to minimize processing time. Feature overview:

  • Currently 6 Rules available, see below
  • Code matching respects Addressing-Size-Prefix and Operand-Size-Prefix
  • Aware of function chunks
  • Smart scheduling of rules for performance
  • Saving/loading rules to/from file in a simple ASCII format
  • Several independent Tabs for experimentation
  • Copying rules between Tabs via clipboard (same format as file format)
  • Saving entire session (all tabs) to file
  • Advanced copying of instruction bytes (all, opcodes only, all except immediates)

Button "Search Functions" clears existing results and starts a fresh search, "Refine Results" considers only results of the previous search.

Advanced Binary Copying

A secondary feature of FF is the option to copy binary representation of instructions with the following options:

  • copy all -> copy all bytes to the clipboard
  • copy without immediates -> blank out (AA ?? BB) any immediate values in the instruction bytes
  • opcode only -> will blank out everything except the actual opcode(s) of the instruction (and prefixes)

See "advanced copying" section below for details. This feature nicely complements the Byte Pattern rule!

Building and Installation

FindFunc is an IDA Pro Python plugin without external package dependencies. It can be installed by downloading the repository and copying the file findfuncmain.py and the folder findfunc to your IDA Pro plugin directory. No building is required.

Requirements: IDA Pro 7.x (7.6+) with a Python 3 environment. FindFunc is designed for the x86/x64 architecture only. It has been tested with IDA 7.6/7.7, Python 3.9 and IDAPython 7.4.0 on Windows 10.

Available Rules

Currently the following six rules are available. They are sorted here from heavy to light with regard to performance impact. With large databases it is a good idea to first cut down the candidate-functions with a cheap rule, before doing heavy matching via e.g. Code Rules. FF will automatically schedule rules in a smart way.

Code Pattern

Rule for filtering functions based on whether they contain a given assembly code snippet. This is NOT a text search over IDA's textual disassembly representation; it performs advanced matching of the underlying instructions. The snippet may contain many consecutive instructions, one per line. Function chunks are supported. In addition to literal assembly, special wildcard matching is supported:

  • "pass" -> matches any instruction with any operands
  • "mov* any,any" -> matches instructions with mnemonic "mov*" (e.g. mov, movzx, ...) and any two arguments.
  • "mov eax, r32" -> matches any instruction with mnemonic "mov", first operand register eax and second operand any 32-bit register.
    • Analogously: r for any register, r8/r16/r32/r64 for a register of a specific width, "imm" for any immediate
  • "mov r64, imm" -> matches any move of a constant to a 64bit register
  • "any r64,r64" -> matches any operation between two 64bit registers
  • "mov" -> matches any instruction with the mov mnemonic

more examples:

mov r64, [r32 * 8 + 0x100]
mov r, [r * 8 - 0x100]
mov r64, [r32 * 8 + imm]
pass
mov r, word [eax + r32 * 8 - 0x100]
any r64, r64
push imm
push any
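
To make the wildcard semantics concrete, here is a deliberately simplified sketch. It works on plain text and only handles register and immediate operands, whereas FindFunc matches the decoded instruction and also supports memory operands. The register tables are abbreviated, and all function names (`operand_matches`, `instruction_matches`, ...) are hypothetical; the use of `fnmatchcase` for the mnemonic wildcard is an assumption made for the illustration.

```python
from fnmatch import fnmatchcase

# Abbreviated register tables, for illustration only.
REGS = {
    "r8": {"al", "bl", "cl", "dl"},
    "r16": {"ax", "bx", "cx", "dx"},
    "r32": {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"},
    "r64": {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"},
}
ALL_REGS = set().union(*REGS.values())


def is_immediate(token):
    # Accepts decimal and 0x-prefixed hexadecimal literals.
    try:
        int(token, 0)
        return True
    except ValueError:
        return False


def operand_matches(pattern, operand):
    pattern, operand = pattern.lower(), operand.lower()
    if pattern == "any":
        return True
    if pattern == "r":
        return operand in ALL_REGS
    if pattern in REGS:
        return operand in REGS[pattern]
    if pattern == "imm":
        return is_immediate(operand)
    return pattern == operand  # literal operand


def parse(line):
    # Split "mnemonic op1, op2" into a mnemonic and a list of operands.
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest.strip() else []
    return mnemonic.lower(), operands


def instruction_matches(pattern_line, instr_line):
    if pattern_line.strip().lower() == "pass":
        return True  # matches any instruction
    p_mnem, p_ops = parse(pattern_line)
    i_mnem, i_ops = parse(instr_line)
    if p_mnem != "any" and not fnmatchcase(i_mnem, p_mnem):
        return False
    if not p_ops:
        return True  # bare mnemonic: operands are unconstrained
    return len(p_ops) == len(i_ops) and all(
        operand_matches(p, o) for p, o in zip(p_ops, i_ops)
    )
```

For example, `instruction_matches("mov eax, r32", "mov eax, ebx")` is true, while the same pattern rejects `mov eax, bx` because `bx` is a 16-bit register.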

Gotchas: Be careful when copying assembly over from IDA. IDA mixes local variable names and other information into the instruction text, which causes matching to fail. Labels are also not supported (e.g. "call sub_123456").

Note that Code Pattern is the most expensive Rule, and if only Code Rules are present FF has no option but to disassemble the entire database. This can take up to several minutes for very large binaries. See notes on performance below.

Immediate Value (Constant)

The function must contain the given immediate at least once in any position. An immediate value is a value fixed in the binary representation of the instruction. Examples for instructions matching immediate value 0x100:

mov eax, 0x100
mov eax, [0x100]
and al, [eax + ebx*8 + 0x100]
push 0x100

Note: IDA performs extensive matching of any size and any position of the immediate. If you know it to be of a specific width of 4 or 8 bytes, a byte pattern can be a little faster.

Byte Pattern

The function must contain the given byte pattern at least once. The pattern uses the same format as IDA's binary search, and thus supports wildcards - the perfect match for the advanced-copy feature!

Examples:

11 22 33 44 aa bb cc
11 22 33 ?? ?? bb cc -> ?? can be any byte
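
The wildcard semantics of such a pattern can be sketched in plain Python. This is only an illustration of how `??` behaves, not the search FindFunc or IDA performs internally; `parse_pattern` and `find_pattern` are hypothetical helper names.

```python
def parse_pattern(text):
    # "11 22 ?? bb" -> [0x11, 0x22, None, 0xbb]; None stands for a wildcard byte.
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find_pattern(data, pattern):
    # Return the offset of the first match in data, or -1 if there is none.
    n = len(pattern)
    for start in range(len(data) - n + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            return start
    return -1


data = bytes.fromhex("00 11 22 33 44 aa bb cc 00")
offset = find_pattern(data, parse_pattern("11 22 33 ?? ?? bb cc"))
```

With the sample bytes above, the pattern is found at offset 1, because the two `??` positions accept `44 aa`.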

Note: Pattern matching is quite fast and a good candidate to cut down matches quickly!

String Reference

The function must reference the given string at least once. The string is matched according to Python's 'fnmatch' module, and thus supports wildcard-like matching. Matching is performed case-insensitively. Strings of the following formats are considered: [idaapi.STRTYPE_C, idaapi.STRTYPE_C_16] (this can be changed in the Config class).

Examples:

  • "TestString" -> function must reference the exact string (casing ignored) at least once
  • "TestStr*" -> function must reference a string starting with 'TestStr' (e.g. TestString, TestStrong) at least once (casing ignored)

Note: String matching is fast and a good choice to cut down candidates quickly!
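
A minimal sketch of case-insensitive `fnmatch`-style matching follows. It lowercases both sides and uses `fnmatchcase`, because plain `fnmatch.fnmatch` only ignores case on platforms with case-insensitive file names. Whether FindFunc does it exactly this way is an assumption; `string_rule_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def string_rule_matches(pattern, candidate):
    # Compare lowercased values so the match ignores case on every platform.
    return fnmatchcase(candidate.lower(), pattern.lower())


exact = string_rule_matches("TestString", "TESTSTRING")    # True, casing ignored
prefix = string_rule_matches("TestStr*", "TestStrong")     # True, wildcard suffix
no_match = string_rule_matches("TestStr*", "MyTestString")  # False, must start with TestStr
```

Note that without a leading `*` the pattern is anchored at the start of the string, so `"TestStr*"` does not match `"MyTestString"`.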

Name Reference

The function must reference the given name/label at least once. The name/label is matched according to Python's 'fnmatch' module, and thus supports wildcard-like matching. Matching is performed case-insensitively.

Examples:

  • "memset" -> function must reference a location named "memset" at least once
  • "mem*" -> function must reference a location starting with "mem" (memset, memcpy, memcmp) at least once

Note: Name matching is very fast and ideal to cut down candidates quickly!

Function Size

The size of the function must be within the given limit: "min <= functionsize <= max". Data is entered as a string of the form "min,max". The size of a function includes all of its chunks.
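
As a small sketch of the "min,max" convention: the helper names are hypothetical, and the example assumes decimal numbers, since the number formats the plugin accepts are not documented here.

```python
def parse_size_rule(text):
    # "min,max" -> (min, max); assumes decimal integers (an assumption).
    lo_text, hi_text = text.split(",")
    lo, hi = int(lo_text), int(hi_text)
    if lo > hi:
        raise ValueError("min must not exceed max")
    return lo, hi


def size_matches(text, function_size):
    # Both bounds are inclusive: min <= functionsize <= max.
    lo, hi = parse_size_rule(text)
    return lo <= function_size <= hi


in_range = size_matches("16,256", 256)    # upper bound is inclusive
too_small = size_matches("16,256", 15)
```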

Note: Function size matching is very fast and ideal to cut down candidates quickly!

Keyboard Shortcuts & GUI

For ease of use FF can be used via the following keyboard shortcuts:

  • Ctrl+Alt+F -> launch/show TabWidget (main GUI)
    • Or View->FindFunc
  • Ctrl+F -> start search with currently enabled rules
  • Ctrl+R -> refine existing results with currently enabled rules
  • Rules
    • Ctrl+C -> copy selected rules to clipboard
    • Ctrl+V -> paste rules from clipboard into current tab (appends)
    • Ctrl+S -> save selected rules to file
    • Ctrl+L -> load rules from file (appends)
    • Ctrl+A -> select all rules
    • Del -> delete selected rules
  • Save Session
    • Ctrl+Shift+S -> Save session to file
    • Ctrl+Shift+L -> Load session from file

Further GUI usage

  • Rules can be edited by double-clicking the Data column
  • Rules can be inverted (negative match) by double-clicking the invert-match column
  • Rules can be enabled/disabled by double-clicking the enabled-column
  • Tabs can be renamed by double-clicking them
  • Sorting is supported both for Rule-List and Result-List
  • Double-click Result item to jump to it in IDA
    • function name: jump to function start
    • any other column: jump to match of last matched rule
  • Checkbox Profile: Outputs profiling information for the search
  • Checkbox Debug: Dumps detailed debugging output for code rule matching - only use it if few functions make it to the code checking rule, otherwise it might take very long!

Advanced Binary Copy

Frequently we want to search for binary patterns of assembly, but without hardcoded addresses and values (immediates), or even only the actual opcodes of the instruction. FindFunc makes this easy by adding three copy options to the disassembly-popupmenu:

Copy all bytes

Copies all instruction bytes as hex-string to clipboard, for use in a Byte-Pattern-Rule (or IDA's binary search).

B8 44332211      mov eax,11223344
68 00000001      push 1000000
66:894424 70     mov word ptr ss:[esp+70],ax

will be copied as

b8 44 33 22 11 68 00 00 00 01 66 89 44 24 70

Copy only non-immediate bytes

Copies instruction bytes for given instruction, masking out any immediate values. Example:

B8 44332211      mov eax,11223344
68 00000001      push 1000000
66:894424 70     mov word ptr ss:[esp+70],ax

will be copied as

b8 ?? ?? ?? ?? 68 ?? ?? ?? ?? 66 89 44 24 ??

Copy only opcodes

Copies all instruction bytes as hex-string to clipboard, masking out any bytes that are not the actual opcode (this masks ModRM and SIB bytes as well, but keeps legacy prefixes).

B8 44332211      mov eax,11223344
68 00000001      push 1000000
66:894424 70     mov word ptr ss:[esp+70],ax

will be copied as

b8 ?? ?? ?? ?? 68 ?? ?? ?? ?? 66 89 ?? ?? ??
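
The three output formats can be reproduced with a small formatting helper. This sketch only shows the masking step: the byte offsets to blank out are written by hand here, whereas the plugin derives them from IDA's instruction decoding. `format_pattern` is a hypothetical name.

```python
def format_pattern(code, masked_offsets=()):
    # Render bytes as lowercase hex, replacing masked offsets with "??".
    masked = set(masked_offsets)
    return " ".join(
        "??" if i in masked else f"{b:02x}" for i, b in enumerate(code)
    )


# Bytes of the three example instructions shown above.
code = bytes.fromhex("b8 44 33 22 11 68 00 00 00 01 66 89 44 24 70")

# Hand-picked offsets matching the examples (an assumption for illustration).
IMMEDIATE_OFFSETS = [1, 2, 3, 4, 6, 7, 8, 9, 14]
NON_OPCODE_OFFSETS = IMMEDIATE_OFFSETS + [12, 13]  # also mask ModRM and SIB

all_bytes = format_pattern(code)
no_immediates = format_pattern(code, IMMEDIATE_OFFSETS)
opcodes_only = format_pattern(code, NON_OPCODE_OFFSETS)
```

The resulting strings can be pasted directly into a Byte Pattern rule.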

Note: This is a "best effort" using IDA's API, thus there may be a few cases where it only works partially. For a 100% correct solution we would have to ship a dedicated x86 disasm library.

Similar results can be achieved with Code Pattern Rules, but this might be faster, both for user interaction and the actual search.

Copy disasm

Copies selected disassembly to clipboard, as it appears in IDA.

Performance

A brief word on performance:

  1. name, string, funcsize are almost free in all cases
  2. bytepattern is almost free for patterns of length > 2
  3. immediate is difficult: we can use the idaapi search, or we can disassemble the entire database and search ourselves, which we may have to do anyway if we are looking for code patterns. BUT: scanning for code patterns is in fact much cheaper than scanning for an immediate. An API search for all matches is relatively costly, about 1/8 as costly as disassembling the entire database. So: if we cut down matches with cheap rules first, we profit greatly from disassembling only the remaining functions and looking for the immediate ourselves, especially if a code rule is present anyway. However: if no cheap options exist and we have to disassemble large parts of the database anyway (due to the presence of code pattern rules), then using one immediate rule as a pre-filter can pay off greatly. API-searching ONE immediate costs roughly 1/8 of searching for any number of code-pattern rules, although this also depends on many different factors.
  4. code patterns are the most expensive by far; however, checking one pattern costs about the same as checking many.
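
The general idea of running cheap rules first can be sketched as follows. This is a simplified illustration, not FindFunc's scheduler: the fixed cost ranks are assumptions derived from the list above, and the real trade-off for immediate rules is more nuanced, as item 3 explains. `COST`, `schedule` and `run` are hypothetical names.

```python
# Assumed relative cost ranks per rule kind (lower runs first).
COST = {"name": 0, "string": 0, "funcsize": 0,
        "bytepattern": 1, "immediate": 2, "code": 3}


def schedule(rules):
    # Stable sort: rules with the same rank keep their original order.
    return sorted(rules, key=lambda rule: COST[rule[0]])


def run(candidates, rules):
    # Apply rules cheapest-first, shrinking the candidate set at each step.
    evaluated = []
    for kind, predicate in schedule(rules):
        evaluated.append((kind, len(candidates)))
        candidates = [c for c in candidates if predicate(c)]
        if not candidates:
            break  # nothing left to check
    return candidates, evaluated


# Toy candidates: integers stand in for functions.
rules = [
    ("code", lambda c: c % 7 == 0),
    ("funcsize", lambda c: c < 100),
    ("name", lambda c: c % 2 == 0),
]
survivors, evaluated = run(list(range(1000)), rules)
```

In this toy run the expensive "code" rule only has to look at 50 candidates instead of 1000, because the two cheap rules ran before it.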

Todo (unordered):

  • jcc pseudo-mnemonic
  • Allow named locations in CodeRules ('call memset')
  • 'ignore all following operands' option
  • Rule for parameters to API calls inside function
  • Rule for parent/callsite/child function requirements
  • Rule for function parameters
  • Regex-rule
  • string/name: casing option
  • automatically convert immediate rules to byte pattern if applicable?
  • settings: case sensitivity, string types, range, ...
  • Hex-Rays rules?
  • OR combination of rules
  • Pythonification of code ;)
  • Parallelization
  • Automatic generation of rules to identify a function?