Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Overview

Soorgeon

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
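
The options above are shown one at a time. As a minimal sketch, assuming they can be combined in a single call (the original does not state this), the script below builds one command from the documented flags. The variable names and the `RUN` switch are hypothetical; the script only prints the command unless `RUN=1` is set and `soorgeon` is installed.

```shell
#!/bin/sh
# Sketch: combine the documented options in one call.
# Assumption: the flags can be passed together; the text shows them separately.
NOTEBOOK="nb.ipynb"
OUTPUT_DIR="some-directory"

refactor_cmd="soorgeon refactor $NOTEBOOK --df-format parquet --product-prefix $OUTPUT_DIR --file-format py"

# Always print the command so it can be reviewed first.
echo "$refactor_cmd"

# Run it only when explicitly requested and soorgeon is available.
if [ "${RUN:-0}" = "1" ] && command -v soorgeon >/dev/null 2>&1; then
    $refactor_cmd
fi
```

Printing before running makes it easy to check the flags against a notebook before any files are generated.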

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon
cd soorgeon

Exploratory data analysis notebook:

cd examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

cd examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
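
Both examples follow the same three steps, so they can be scripted together. The sketch below assumes it is run from the root of the cloned repository; the `run_example` helper is hypothetical and simply skips any directory it cannot find.

```shell
#!/bin/sh
# Sketch: refactor and build one example directory.
# Assumes the current directory is the root of the cloned soorgeon repository.
run_example() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "skipping $dir: directory not found"
        return 1
    fi
    (
        # Subshell keeps the caller's working directory unchanged.
        # Each step runs only if the previous one succeeded.
        cd "$dir" &&
            soorgeon refactor nb.ipynb &&
            pip install -r requirements.txt &&
            ploomber build
    )
}

for example in examples/exploratory examples/machine-learning; do
    run_example "$example" || echo "example failed or skipped: $example" >&2
done
```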
