This is an example of a reproducible modelling project

Last update: Oct 26, 2021

What are we doing?

This example was created for the 2021 fall lecture series of Stanford's Center for Open and REproducible Science (CORES).

A video of the talk can be found at: https://youtu.be/JAQot6b1Cng

The goal of this example analysis is to explore how varying different training hyper-parameters of a simple classification model affects its performance on scikit-learn's handwritten digit dataset.

Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent steps, and random shuffling of the data on the 3-fold cross-validation performance of scikit-learn's linear support vector machine classifier.

Importantly, each hyper-parameter is varied separately while all other hyper-parameters are set to default values (for details, see scripts/evaluate_hyper_params_effect.py).
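
The one-at-a-time design described above can be sketched as follows. This is only an illustrative sketch, not the code in scripts/evaluate_hyper_params_effect.py: it assumes the linear SVM is fitted with scikit-learn's `SGDClassifier` using hinge loss (which exposes a learning rate, a regularisation strength, and a number of passes over the data), and the names `DEFAULTS`, `GRID`, `cv_accuracy`, and `one_at_a_time`, as well as all grid values, are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical default values and grids (assumptions, not taken from the project).
DEFAULTS = {"eta0": 0.01, "alpha": 1e-4, "max_iter": 50, "seed": 0}
GRID = {
    "eta0": [1e-3, 1e-2, 1e-1],      # learning rate
    "alpha": [1e-5, 1e-4, 1e-3],     # regularisation strength
    "max_iter": [10, 50, 100],       # number of passes over the training data
    "seed": [0, 1, 2],               # controls random shuffling of the data
}


def cv_accuracy(X, y, eta0, alpha, max_iter, seed):
    """Mean 3-fold cross-validation accuracy for one hyper-parameter setting."""
    clf = make_pipeline(
        StandardScaler(),
        SGDClassifier(
            loss="hinge",              # hinge loss gives a linear SVM
            learning_rate="constant",
            eta0=eta0,
            alpha=alpha,
            max_iter=max_iter,
            tol=None,                  # always run exactly max_iter epochs
            random_state=seed,
        ),
    )
    cv = KFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=cv).mean()


def one_at_a_time(X, y, defaults=DEFAULTS, grid=GRID):
    """Vary each hyper-parameter separately, keeping all others at their defaults."""
    results = {}
    for name, values in grid.items():
        for value in values:
            params = {**defaults, name: value}
            results[(name, value)] = cv_accuracy(X, y, **params)
    return results


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    for (name, value), acc in one_at_a_time(X, y).items():
        print(f"{name}={value}: mean accuracy {acc:.3f}")
```

Because only one entry of the defaults is overridden per run, any change in accuracy can be attributed to that single hyper-parameter, at the cost of not capturing interactions between hyper-parameters.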

Project organization

├── LICENSE            <- MIT License
├── Makefile           <- Makefile with targets to 'load', 'evaluate', and 'plot' ('make all' runs all three analysis steps)
├── poetry.lock        <- Details of used package versions
├── pyproject.toml     <- Lists all dependencies
├── README.md          <- This README file.
├── docs/              
|    └──               <- Slides of the practical tutorial
├── data/
|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
|
├── results/
|    ├── estimates/
|    │    └──          <- Generated estimates of classifier performance
|    └── figures/
|         └──          <- Generated figures
|
├── scripts/
|    ├── load_data.py                       <- Downloads the dataset to specified 'data-path'
|    ├── evaluate_hyper_params_effect.py    <- Runs cross-validated hyper-parameter evaluation
|    ├── plot_hyper_params_effect.py        <- Summarizes results of evaluation in a figure
|    └── run_analysis.sh                    <- Runs all analysis steps
|
└── src/
    ├── hyper/
    │    ├── __init__.py                    <- Makes 'hyper' a Python package
    │    ├── grid.py                        <- Functionality to sample hyper-parameter grid
    │    ├── evaluation.py                  <- Functionality to evaluate classifier performance, given hyper-parameters
    │    └── plotting.py                    <- Functionality to visualize results
    └── setup.py                            <- Makes 'hyper' pip-installable (pip install -e .)

Data description

We use the handwritten digits dataset provided by scikit-learn. For details on this dataset, see scikit-learn's documentation:

https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset
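
For orientation, the dataset can be loaded and stored locally as sketched below. This is a hedged sketch rather than the project's scripts/load_data.py: the function name `save_digits`, the file name `digits.npz`, and the choice of NumPy's compressed format are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_digits


def save_digits(data_path):
    """Store a copy of the digits dataset under data_path (hypothetical helper)."""
    data_path = Path(data_path)
    data_path.mkdir(parents=True, exist_ok=True)
    X, y = load_digits(return_X_y=True)
    out_file = data_path / "digits.npz"   # assumed file name
    np.savez_compressed(out_file, X=X, y=y)
    return out_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = np.load(save_digits(tmp))
        # 1797 images of 8x8 pixels, flattened to 64 features; labels are digits 0-9
        print(saved["X"].shape, saved["y"].shape)  # (1797, 64) (1797,)
```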

Installation

This project is written for Python 3.9.5 (we recommend pyenv for Python version management).

All software dependencies of this project are managed with Python Poetry. The dependencies are listed in pyproject.toml, and the exact package versions used are recorded in poetry.lock.

To clone this repository to your local machine, run:

git clone https://github.com/athms/reproducible-modelling

To install all dependencies with poetry, run:

cd reproducible-modelling/
poetry install

To reproduce our analyses, you additionally need to install our custom Python module (src/hyper) in your poetry environment:

cd src/
poetry run pip install -e .
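
To confirm that the editable install worked, you can check whether the module can be found by the interpreter of the poetry environment. The snippet below is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and it assumes the package is importable under the name `hyper`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run with `poetry run python` so that the poetry environment is inspected.
    print("hyper importable:", is_installed("hyper"))
```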

Reproducing our analysis

Our analysis can be reproduced either by running scripts/run_analysis.sh:

cd scripts
poetry run bash run_analysis.sh

...or by using make:

poetry run make <ANALYSIS TARGET>

We provide the following targets for make:

| Analysis target | Description |
|---|---|
| all | Runs the entire analysis pipeline |
| load | Downloads scikit-learn's handwritten digit dataset |
| evaluate | Runs our cross-validated hyper-parameter evaluation |
| plot | Creates our results figure |

This README file is strongly inspired by the Cookiecutter Data Science structure.

Owner

Armin Thomas