This is the repo for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement".

Overview

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

This is the repository for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement". The repository is structured as the following:

  • PyPruning: This repository contains the implementations for all pruning algorithms and can be installed as a regular python package and used in other projects. For more information have a look at the Readme file in PyPruning/Readme.md and its documentation in PyPruning/docs.
  • experiment_runner: This is a simple package / script which can be used to run multiple experiments in parallel on the same machine or distributed across many different machines. It can also be installed as a regular python package and used for other projects. For more information have a look at the Readme file in experiment_runner/Readme.md.
  • {adult, bank, connect, ..., wine-quality}: Each folder contains an script init.sh which downloads the necessary files and performs pre-processing if necessary (e.g. extract archives etc.).
  • init_all.sh: Iterates over all datasets and calls the respective init.sh files. Depending on your internet connection this may take some time
  • environment.yml: Anaconda environment file which contains all dependencies. For more details see below
  • LeafRefinement.py: This is the implementation of the LeafRefinement method. We initially implemented a more complex method which uses Proximal Gradient Descent to simultaneously learn the weights and refine leaf nodes. During our experiments we discovered that leaf-refinement in iteself was enough and much simpler. We kept our old code, but implemented the LeafRefinement.py class for easier usage.
  • run.py: The script which executes the experiments. For more details see the examples below.
  • plot_results.py: The script is used explore and display results. It also creates the plots for the paper.

Getting everything ready

This git repository contains two submodules PyPruning and experiment_runner which need to be cloned first.

git clone --recurse-submodules [email protected]:sbuschjaeger/leaf-refinement-experiments.git

After the code has been obtained you need to install all dependencies. If you use Anaconda you can simply call

conda env create -f environment.yml

to prepare and activate the environment LR. After that you can install the python packages PyPruning and experiment_runner via pip:

pip install -e file:PyPruning
pip install -e file:experiment_runner

and finally activate the environment with

conda activate LR

Last you will need to get some data. If you are interested in a specific dataset you can use the accompanying init.sh script via

cd `${Dataset}`
./init.sh

or if you want to download all datasets use

./init_all.sh

Depending on your internet connection this may take some time.

Running experiments

If everything worked as expected you should now be able to run the run.py script to prune some ensembles. This script has a decent amount of parameters. See further below for an minimal working example.

  • n_jobs: Number of jobs / threads used for multiprocessing
  • base: Base learner used for experiments. Can be {RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier, HeterogenousForest}. Can be a list of arguments for multiple experiments.
  • nl: Maximum number of leaf nodes (corresponds to scikit-learns max_leaf_nodes parameter)
  • dataset: Dataset used for experiment. Can be a list of arguments for multiple experiments.
  • n_estimators: Number of estimators trained for the base learner.
  • n_prune: Size of the pruned ensemble. Can be a list of arguments for multiple experiments.
  • xval: Number of cross validation runs (default is 5)
  • use_prune: If set then the script uses a train / prune / test split. If not set then the training data is also used for pruning.
  • timeout: Maximum number of seconds per run. If the runtime exceeds the provided value, stop execution (default is 5400 seconds)

Note that all base ensembles for all cross validation splits of a dataset are trained before any of the pruning algorithms are used. If you want to evaluate many datasets / hyperparameter configuration in one run this requires a lot of memory.

To train and prune forests on the magic dataset you can for example do

./run.py --dataset adult -n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

The results are stored in ${Dataset}/results/${base}/${use_prune}/${date}/results.jsonl where ${Dataset} is the dataset (e.g. magic) and ${date} is the current time and date.

In order to re-produce the experiments form the paper you can call:

./run.py --dataset adult anura bank chess connect eeg elec postures japanese-vowels magic mozilla mnist nomao avila ida2016 satimage --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

Important: This call uses 128 threads and requires a decent (something in the range of 64GB) amount of memory to work.

Exploring the results

After you run the experiments you can view the results with the plot_results.py script. We recommend to use an interactive Python environment for that such as Jupyter or VSCode with the ability to execute cells, but you should also be able to run this script as-is. This script is fairly well-commented, so please have a look at it for more detailed comments.

PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

Zechen Bai 12 Jul 08, 2022
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators

Pandas TA - A Technical Analysis Library in Python 3 Pandas Technical Analysis (Pandas TA) is an easy to use library that leverages the Pandas package

Kevin Johnson 3.2k Jan 09, 2023
Code and hyperparameters for the paper "Generative Adversarial Networks"

Generative Adversarial Networks This repository contains the code and hyperparameters for the paper: "Generative Adversarial Networks." Ian J. Goodfel

Ian Goodfellow 3.5k Jan 08, 2023
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
A Survey on Deep Learning Technique for Video Segmentation

A Survey on Deep Learning Technique for Video Segmentation A Survey on Deep Learning Technique for Video Segmentation Wenguan Wang, Tianfei Zhou, Fati

Tianfei Zhou 112 Dec 12, 2022
Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

kenro515 3 Jan 04, 2023
Supplementary code for the experiments described in the 2021 ISMIR submission: Leveraging Hierarchical Structures for Few Shot Musical Instrument Recognition.

Music Trees Supplementary code for the experiments described in the 2021 ISMIR submission: Leveraging Hierarchical Structures for Few Shot Musical Ins

Hugo Flores García 32 Nov 22, 2022
Official Implementation of SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Official Implementation of SimIPU SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations Since

Zhyever 37 Dec 01, 2022
Learning To Have An Ear For Face Super-Resolution

Learning To Have An Ear For Face Super-Resolution [Project Page] This repository contains demo code of our CVPR2020 paper. Training and evaluation on

50 Nov 16, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
Position detection system of mobile robot in the warehouse enviroment

Autonomous-Forklift-System About | GUI | Tests | Starting | License | Author | 🎯 About An application that run the autonomous forklift paletization a

Kamil Goś 1 Nov 24, 2021
Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

One-Shot Free-View Neural Talking Head Synthesis Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Vide

ZLH 406 Dec 23, 2022
3D dataset of humans Manipulating Objects in-the-Wild (MOW)

MOW dataset [Website] This repository maintains our 3D dataset of humans Manipulating Objects in-the-Wild (MOW). The dataset contains 512 images in th

Zhe Cao 28 Nov 06, 2022
Code release of paper "Deep Multi-View Stereo gone wild"

Deep MVS gone wild Pytorch implementation of "Deep MVS gone wild" (Paper | website) This repository provides the code to reproduce the experiments of

François Darmon 53 Dec 24, 2022
This is a custom made virus code in python, using tkinter module.

skeleterrorBetaV0.1-Virus-code This is a custom made virus code in python, using tkinter module. This virus is not harmful to the computer, it only ma

AR 0 Nov 21, 2022
Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

HFGI: High-Fidelity GAN Inversion for Image Attribute Editing High-Fidelity GAN Inversion for Image Attribute Editing Update: We released the inferenc

Tengfei Wang 371 Dec 30, 2022
Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)

Hierarchical Metadata-Aware Document Categorization under Weak Supervision This project provides a weakly supervised framework for hierarchical metada

Yu Zhang 53 Sep 17, 2022
Learn about quantum computing and algorithm on quantum computing

quantum_computing this repo contains everything i learn about quantum computing and algorithm on quantum computing what is aquantum computing quantum

arfy slowy 8 Dec 25, 2022
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)

Distributed Deep Learning in Open Collaborations This repository contains the code for the NeurIPS 2021 paper "Distributed Deep Learning in Open Colla

Yandex Research 96 Sep 15, 2022
SVG Icon processing tool for C++

BAWR This is a tool to automate the icons generation from sets of svg files into fonts and atlases. The main purpose of this tool is to add it to the

Frank David Martínez M 66 Dec 14, 2022