A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

MCILBoost

Project | CVPR Paper | MIA Paper
Contact: Jun-Yan Zhu (junyanz at cs dot cmu dot edu)

Overview

This is the authors' implementation of the MCIL-Boost method described in:
[1] Multiple Clustered Instance Learning for Histopathology Cancer Image Segmentation, Clustering, and Classification.
Yan Xu*, Jun-Yan Zhu*, Eric Chang, and Zhuowen Tu (*equal contribution)
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[2] Weakly Supervised Histopathology Cancer Image Segmentation and Classification
Yan Xu, Jun-Yan Zhu, Eric I-Chao Chang, Maode Lai, and Zhuowen Tu
In Medical Image Analysis, 2014.

Please cite our papers if you use our code for your research.

This package consists of the following two multiple-instance learning (MIL) methods:

  • MIL-Boost [Viola et al. 2006]: set c = 1
  • MCIL-Boost [1] [2]: set c > 1

The core of this package is a command-line interface written in C++. Matlab helper functions are provided to train and test an MCIL-Boost model, perform cross-validation, and evaluate performance.

System Requirements

  • Linux or Windows.
  • For Linux, the code was compiled with gcc 4.8.2 on Ubuntu 14.04.

Installation

  • Download and unzip the code.
    • For Linux users, type "chmod +x MCILBoost".
  • Open Matlab and run "demoToy.m".
  • To use the command-line interface, see "Command Usage".
  • To use Matlab functions, see "Matlab helper functions". You can modify "SetParamsToy.m" and "demoToy.m" to run your own experiments.

Quick Examples

(Windows: MCILBoost.exe; Linux: ./MCILBoost)
An example for training:
MCILBoost.exe -v 2 -t 0 -c 2 -n 150 -s 0 -r 20 toy.data toy.model
An example for testing:
MCILBoost.exe -v 2 -t 1 -c 2 toy.data toy.model toy.result
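
The two commands above can also be assembled from a script. The following is a minimal Python sketch, not part of the package: build_command is a hypothetical helper, and its keyword names are chosen here for illustration. It only builds the argument list from the flags documented in this README; launching the binary is left to the caller.

```python
def build_command(binary, data_file, model_file, result_file=None, *,
                  mode=0, clusters=1, weak_clfs=150, softmax=0,
                  exponent=20, verbose=1):
    """Assemble an MCILBoost argument list (hypothetical helper).

    mode=0 trains a model, mode=1 tests one, as described in this README.
    """
    cmd = [binary, "-v", str(verbose), "-t", str(mode), "-c", str(clusters)]
    if mode == 0:
        # -n, -s, and -r are passed only for training; for testing the
        # program reads them from the model file.
        cmd += ["-n", str(weak_clfs), "-s", str(softmax), "-r", str(exponent)]
    cmd += [data_file, model_file]
    if result_file is not None:
        cmd.append(result_file)
    return cmd


train_cmd = build_command("./MCILBoost", "toy.data", "toy.model",
                          mode=0, clusters=2, verbose=2)
test_cmd = build_command("./MCILBoost", "toy.data", "toy.model", "toy.result",
                         mode=1, clusters=2, verbose=2)
print(" ".join(train_cmd))
print(" ".join(test_cmd))

# To actually run the binary (assumes it is in the current directory):
#   import subprocess
#   subprocess.run(train_cmd, check=True)
```

On Windows, pass "MCILBoost.exe" as the binary name instead of "./MCILBoost".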

Command Usage ([ ]: options)

MCILBoost.exe [-v verbose] [-t mode] [-c #clusters] [-n #weakClfs] [-s softmax] [-r exponent] data_file model_file [result_file]
(There is no need to specify c, n, s, or r for testing, because the program copies these parameters from the model_file.)

-v verbose: set how much runtime output is shown (default = 1)
    0 -- no output
    1 -- some output
    2 -- more output

-t mode: set the run mode (default = 0)
    0 -- train a model
    1 -- test a model

-c #clusters: set the number of clusters in positive bags (default = 1)
    c = 1 -- train a MIL-Boost model
    c > 1 -- train an MCIL-Boost model with multiple clusters

-n #weakClfs: set the maximum number of weak classifiers (default = 150)

-s softmax: set the softmax type (default = 0)
    0 -- GM (generalized mean)
    1 -- LSE (log-sum-exp)

-r exponent: set the exponent used in GM and LSE (default r = 20)

data_file: set the path for input data.

model_file: set the path for the model file.

result_file: set the path for the result file. If result_file is not specified, result_file = data_file + '.result'
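
To give a feel for what the -s and -r options control, here is a minimal Python sketch of the generalized mean (GM) and log-sum-exp (LSE) softmax functions as they are commonly defined in the MIL-Boost literature. Whether the binary uses exactly these forms is an assumption; the function names are chosen here for illustration. Both functions approximate the maximum of a set of instance-level scores, and a larger exponent r brings the result closer to the true maximum.

```python
import math


def generalized_mean(values, r=20.0):
    """GM softmax: (mean(v ** r)) ** (1 / r), for non-negative inputs."""
    m = len(values)
    return (sum(v ** r for v in values) / m) ** (1.0 / r)


def log_sum_exp(values, r=20.0):
    """LSE softmax: (1 / r) * log(mean(exp(r * v)))."""
    m = len(values)
    top = max(values)
    # Subtract the maximum before exponentiating for numerical stability;
    # this is algebraically identical to the formula in the docstring.
    shifted = sum(math.exp(r * (v - top)) for v in values) / m
    return top + math.log(shifted) / r


scores = [0.1, 0.9]  # hypothetical instance-level probabilities in one bag
gm = generalized_mean(scores)
lse = log_sum_exp(scores)
print(gm, lse)
```

With r = 20, both values fall between the arithmetic mean (0.5) and the maximum (0.9) of the two scores, and lie much closer to the maximum.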

Matlab helper functions

  • MCILBoost.m: main entry function: model training/testing, and cross-validation.
  • SetParams.m: Set parameters for MCILBoost.m. You need to modify this file to run your own experiment.
  • TrainModel.m: train a model by calling the MCIL-Boost command line.
  • TestModel.m: test a model by calling the MCIL-Boost command line.
  • CrossValidate.m: split the data into n-fold, perform n-fold cross-validation, and report performance.
  • ReadData.m: read Matlab data from a text file.
  • WriteData.m: write Matlab data to a text file.
  • ReadResult.m: read Matlab result data from a text file.
  • MeasureResult.m: evaluate performance in terms of accuracy and AUC (area under the ROC curve).
  • AUC: compute the area under ROC curve given prediction and ground truth labels.
  • demoToy.m: demo script for toy data.
  • SetParamsToy.m: set parameters for demoToy.
  • demo1.m: demo script for Fox, Tiger, Elephant experiment.
  • SetParamsDemo1.m: set parameters for demo1.
  • demo2.m: demo script for SIVAL experiment.
  • SetParamsDemo2.m: set parameters for demo2.
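
For readers without Matlab, the two metrics reported by the helper functions (accuracy and AUC) can be sketched in a few lines of Python. This is an illustration, not a port of MeasureResult.m or AUC: the 0.5 decision threshold and the function names are assumptions made here.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of bags whose thresholded score matches the binary label.

    Labels >= 1 count as positive; the 0.5 threshold is an assumption.
    """
    hits = sum((s >= threshold) == (y >= 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y >= 1]
    neg = [s for y, s in zip(labels, scores) if y < 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]            # hypothetical ground-truth bag labels
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical bag-level predictions
print(accuracy(labels, scores), auc(labels, scores))
```

The pairwise form of AUC is quadratic in the number of bags, which is acceptable for small benchmarks; a rank-based computation would be preferable for large data sets.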

Summary of Benchmark Results

  • I provide two scripts for running experiments on publicly available MIL benchmarks.
    • "demo1.m": experiments on Fox, Tiger, Elephant dataset.
      MIL-Boost achieved 0.61 (Fox), 0.81 (Tiger), and 0.82 (Elephant) with 10-fold cross-validation over 10 runs.
    • "demo2.m": experiments on SIVAL dataset. There are 180 positive bags (3 clusters) and 180 negative bags. When multiple clusters appear in the positive bags, MCIL-Boost works better than MIL-Boost.
      MIL-Boost (c=1): mean_acc = 0.742, mean_auc = 0.824
      MCIL-Boost (c=3): mean_acc = 0.879, mean_auc = 0.944
  • Note: See "demo1.m" and "demo2.m" for details.

Input Format

  • Note: You can use the Matlab functions "ReadData.m" and "WriteData.m" to read/write Matlab data from/to the text file.
  • Description: The input format is similar to the format used in the LIBSVM and MILL packages. The software also supports a sparse format. The first line specifies the total number of instances and the number of feature dimensions. Each subsequent line represents one instance and starts with its instance id, bag id, and label (>= 1 for positive bags, 0 for negative bags). Each feature value is then given as an <index>:<value> pair, where <index> is the index of the feature (starting from 1).
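  • Format:
    <#instances> <#features>
    <instance_id>:<bag_id>:<label> <index>:<value> <index>:<value> ...
    <instance_id>:<bag_id>:<label> <index>:<value> <index>:<value> ...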
  • Example: A toy example that contains two negative bags and two positive bags (see "toy.data"). A negative instance is always (0, 0, 0), while there are two clusters of positive instances: (0, 1, 0) and (0, 0, 1).
    8 3
    0:0:0 1:0 2:0 3:0
    1:0:0 1:0 2:0 3:0
    2:1:0 1:0 2:0 3:0
    3:1:0 1:0 2:0 3:0
    4:2:1 1:0 2:1 3:0
    5:2:1 1:0 2:0 3:0
    6:3:1 1:0 2:0 3:1
    7:3:1 1:0 2:0 3:0
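
The toy file above can be generated programmatically. The following Python sketch writes the dense form of the format only (the sparse form is not covered); format_data and its tuple layout are hypothetical names chosen here and are not part of the package.

```python
def format_data(instances, num_features):
    """Render instances as text in the dense layout described above.

    `instances` is a list of (instance_id, bag_id, label, features) tuples,
    where `features` is a sequence of `num_features` numbers.
    """
    lines = [f"{len(instances)} {num_features}"]
    for inst_id, bag_id, label, features in instances:
        if len(features) != num_features:
            raise ValueError(f"instance {inst_id} has {len(features)} features")
        # Feature indices start from 1.
        pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
        lines.append(f"{inst_id}:{bag_id}:{label} {pairs}")
    return "\n".join(lines) + "\n"


toy = [
    (0, 0, 0, [0, 0, 0]), (1, 0, 0, [0, 0, 0]),  # negative bag 0
    (2, 1, 0, [0, 0, 0]), (3, 1, 0, [0, 0, 0]),  # negative bag 1
    (4, 2, 1, [0, 1, 0]), (5, 2, 1, [0, 0, 0]),  # positive bag 2
    (6, 3, 1, [0, 0, 1]), (7, 3, 1, [0, 0, 0]),  # positive bag 3
]
toy_text = format_data(toy, 3)
print(toy_text, end="")
```

Writing toy_text to a file gives an input that matches the listing above line for line.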

Output Format

  • Note: You can use the Matlab function "ReadResult.m" to load the Matlab data from the result file.

  • Description: The software outputs four kinds of predictions (see more details in the paper):

    • overall bag-level prediction p_i (the probability of the bag x_i being a positive bag)
    • cluster-wise bag-level prediction p_i^k (the probability of the bag x_i belonging to the k-th cluster)
    • overall instance-level prediction p_{ij} (the probability of the instance x_{ij} being a positive instance)
    • cluster-wise instance-level prediction p_{ij}^k (the probability of the instance x_{ij} belonging to the k-th cluster)
  • Layout: In the first line, the software outputs the number of bags and the number of clusters. Then, for each bag, the software outputs the bag-level information and prediction (bag id, number of instances, ground truth label, number of clusters, and p_i). The software also outputs the bag-level prediction for each cluster (cluster id and prediction p_i^k for each cluster). Then, for each instance, the software outputs the instance-level prediction (instance id and prediction p_{ij}) and the instance-level prediction for each cluster (cluster id and prediction p_{ij}^k).
  • Format:
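    #bags=<number of bags> #clusters=<number of clusters>
    bag_id=<bag id> #insts=<number of instances> label=<label> #clusters=<number of clusters> pred=<p_i>
    cluster_id=<k> pred=<p_i^k> cluster_id=<k> pred=<p_i^k> ...
    inst_id=<j> pred=<p_{ij}> cluster_id=<k> pred=<p_{ij}^k> cluster_id=<k> pred=<p_{ij}^k> ...
    inst_id=<j> pred=<p_{ij}> cluster_id=<k> pred=<p_{ij}^k> cluster_id=<k> pred=<p_{ij}^k> ...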
    ...

  • Example: The output of the toy example:
    #bags=4 #clusters=2
    bag_id=0 #insts=2 label=0 #clusters=2 pred=0
    cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=0 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=1 #insts=2 label=0 #clusters=2 pred=0
    cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=0 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=2 #insts=2 label=1 #clusters=2 pred=1
    cluster_id=0 pred=1 cluster_id=1 pred=0
    inst_id=0 pred=1 cluster_id=0 pred=1 cluster_id=1 pred=0
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
    bag_id=3 #insts=2 label=1 #clusters=2 pred=1
    cluster_id=0 pred=0 cluster_id=1 pred=1
    inst_id=0 pred=1 cluster_id=0 pred=0 cluster_id=1 pred=1
    inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
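
A result file with this layout can be read outside Matlab with a short parser. The Python sketch below is an illustration, not a port of ReadResult.m: parse_result and the dictionary keys it returns are hypothetical, and it assumes the key spellings shown in the toy output above (#bags, #insts) and one instance per line.

```python
def parse_pairs(line):
    """Split 'key=value key=value ...' into a list of (key, value) tuples."""
    return [tuple(token.split("=", 1)) for token in line.split()]


def parse_result(text):
    """Parse result-file text into a list of per-bag dictionaries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = dict(parse_pairs(lines[0]))
    bags, pos = [], 1
    for _ in range(int(header["#bags"])):
        info = dict(parse_pairs(lines[pos]))
        num_insts = int(info["#insts"])
        bag = {
            "bag_id": int(info["bag_id"]),
            "label": int(info["label"]),
            "pred": float(info["pred"]),
            # Cluster line: every 'pred' value is one p_i^k, in cluster order.
            "cluster_preds": [float(v) for k, v in parse_pairs(lines[pos + 1])
                              if k == "pred"],
            "instances": [],
        }
        for ln in lines[pos + 2: pos + 2 + num_insts]:
            pairs = parse_pairs(ln)
            preds = [float(v) for k, v in pairs if k == "pred"]
            # First 'pred' is p_{ij}; the rest are the per-cluster p_{ij}^k.
            bag["instances"].append({"inst_id": int(pairs[0][1]),
                                     "pred": preds[0],
                                     "cluster_preds": preds[1:]})
        bags.append(bag)
        pos += 2 + num_insts
    return bags


sample = """#bags=1 #clusters=2
bag_id=2 #insts=2 label=1 #clusters=2 pred=1
cluster_id=0 pred=1 cluster_id=1 pred=0
inst_id=0 pred=1 cluster_id=0 pred=1 cluster_id=1 pred=0
inst_id=1 pred=0 cluster_id=0 pred=0 cluster_id=1 pred=0
"""
parsed = parse_result(sample)
print(parsed[0]["pred"], parsed[0]["cluster_preds"])
```

The sample string reuses the lines for bag 2 of the toy output, with the header adjusted to a single bag.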

Credit

Part of this code is based on the work by Piotr Dollar and Boris Babenko.
