Material for my PyConDE & PyData Berlin 2022 Talk "5 Steps to Speed Up Your Data-Analysis on a Single Core"

Last update: Dec 12, 2022

Overview

5 Steps to Speed Up Your Data-Analysis on a Single Core

Material for my talk at PyConDE & PyData Berlin 2022

Description

Your data analysis pipeline works. Nice.
Could it be faster? Probably.
Do you need to parallelize? Not yet.

We'll go through optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time and costs. This walkthrough shows tools and strategies to identify and mitigate bottlenecks and demonstrates them on an example. The 5 steps cover:

Identifying bottlenecks: Profiling
Efficient IO
Vectorization
Memory & Precision Tradeoffs
JIT compilation with Numba
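The first step, profiling, can be sketched with Python's built-in `cProfile` and `pstats` modules. This is a minimal illustration, not code from the talk: `load`, `compute` and `slow_pipeline` are hypothetical stand-ins for pipeline stages, and `profile_report` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def load(n):
    # Hypothetical "IO" stage: build a list of floats.
    return [float(i) for i in range(n)]


def compute(values):
    # Hypothetical "analysis" stage: sum of squares in pure Python.
    return sum(v * v for v in values)


def slow_pipeline(n=200_000):
    return compute(load(n))


def profile_report(func, *args, top=5):
    """Run func under cProfile; return its result and a text report of the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive pipeline stages come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_report(slow_pipeline)
    print(report)
```

Sorting by cumulative time points at the stage that dominates the runtime, which is where the remaining steps pay off first.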
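For efficient IO, one common instance (assumed here; the talk's own example may use other formats) is replacing text files such as CSV with a binary format. The sketch below round-trips the same NumPy array through `np.savetxt`/`np.loadtxt` and through `np.save`/`np.load`; `compare_io` and the array shape are hypothetical.

```python
import os
import tempfile
import time

import numpy as np


def roundtrip_text(data, path):
    # Text (CSV) format: every float is formatted to and parsed back from a string.
    np.savetxt(path, data, delimiter=",")
    return np.loadtxt(path, delimiter=",")


def roundtrip_binary(data, path):
    # Binary .npy format: raw array bytes plus a small header, no text parsing.
    np.save(path, data)
    return np.load(path)


def compare_io(shape=(20_000, 10), seed=0):
    """Write and read the same random array in both formats; report time, size and fidelity."""
    data = np.random.default_rng(seed).random(shape)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, roundtrip, filename in [
            ("text", roundtrip_text, "data.csv"),
            ("binary", roundtrip_binary, "data.npy"),
        ]:
            path = os.path.join(tmp, filename)
            start = time.perf_counter()
            loaded = roundtrip(data, path)
            results[name] = {
                "seconds": time.perf_counter() - start,
                "bytes": os.path.getsize(path),
                "exact": bool(np.array_equal(loaded, data)),
            }
    return results


if __name__ == "__main__":
    for name, stats in compare_io().items():
        print(f"{name:>6}: {stats['seconds']:.3f} s, {stats['bytes'] / 1e6:.1f} MB, exact={stats['exact']}")
```

Binary files skip float formatting and parsing, so they are usually smaller and faster to read; actual timings depend on the machine and the data.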
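Vectorization means replacing element-wise Python loops with whole-array NumPy operations. A minimal sketch using a hypothetical z-score normalization (`zscore_loop` and `zscore_vectorized` are illustrative names, not from the talk):

```python
import timeit

import numpy as np


def zscore_loop(values):
    # Pure-Python version: the interpreter executes every arithmetic step per element.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5
    return [(v - mean) / std for v in values]


def zscore_vectorized(values):
    # NumPy version: the same arithmetic runs in compiled loops over the whole array.
    arr = np.asarray(values, dtype=np.float64)
    return (arr - arr.mean()) / arr.std()  # std() uses ddof=0, matching var above


if __name__ == "__main__":
    data = np.random.default_rng(0).random(100_000)
    as_list = data.tolist()
    t_loop = timeit.timeit(lambda: zscore_loop(as_list), number=5)
    t_vec = timeit.timeit(lambda: zscore_vectorized(data), number=5)
    print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s")
```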
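The memory and precision tradeoff can be illustrated by downcasting from NumPy's default `float64` to `float32`, which halves memory use at the cost of precision. A sketch, assuming float32 accuracy is acceptable for the analysis; `precision_tradeoff` is a hypothetical helper:

```python
import numpy as np


def precision_tradeoff(n=1_000_000, seed=0):
    """Compare memory use and accuracy of float64 vs float32 for the same data."""
    rng = np.random.default_rng(seed)
    x64 = rng.random(n)              # NumPy's default float64: 8 bytes per element
    x32 = x64.astype(np.float32)     # float32: 4 bytes per element, ~7 significant digits
    mean64 = float(x64.mean())       # reference result in double precision
    mean32 = float(x32.mean())       # float32 input is reduced in float32
    return {
        "bytes_float64": x64.nbytes,
        "bytes_float32": x32.nbytes,
        "relative_error": abs(mean32 - mean64) / mean64,
    }


if __name__ == "__main__":
    r = precision_tradeoff()
    print(f"float64: {r['bytes_float64'] / 1e6:.1f} MB, float32: {r['bytes_float32'] / 1e6:.1f} MB")
    print(f"relative error of the float32 mean: {r['relative_error']:.2e}")
```

Whether the resulting error is acceptable depends on the analysis, so it is worth checking against a float64 reference as done here.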
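For JIT compilation, Numba's `njit` decorator compiles a Python function that loops over NumPy arrays to machine code. The sketch below assumes Numba is installed (the `ImportError` fallback only keeps it runnable without it); `ewma` is a hypothetical example of a sequential recurrence, which is awkward to vectorize and therefore a typical candidate for JIT compilation.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without numba installed, fall back to the
    # plain-Python function so the example still runs (just without the speedup).
    def njit(func):
        return func


@njit
def ewma(values, alpha):
    """Exponentially weighted moving average of a 1-D float64 array."""
    # out[i] depends on out[i - 1], so this recurrence is awkward to express
    # as a single NumPy expression; a compiled loop is a natural fit.
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.shape[0]):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


if __name__ == "__main__":
    data = np.random.default_rng(0).random(1_000_000)
    ewma(data[:10], 0.1)  # first call triggers compilation when numba is available
    print(ewma(data, 0.1)[-5:])
```

The first call includes compilation time, so measure a later call when comparing against the pure-Python version.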

This talk is suited for beginner and intermediate data scientists, typically working with a numpy/scipy/… stack or similar. It gives strategies and concrete suggestions on how to speed up an existing analysis pipeline, demonstrated on a practical example that shows the speed improvement gained at each step.

Installation & Usage

python3 -m pip install poetry
poetry install
poetry run python -m jupyterlab

Dev

./format.sh

Owner

Jonathan Striebel
