Python client for using Prefect Cloud with Saturn Cloud

Overview

prefect-saturn

GitHub Actions PyPI Version

prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tutorial, see "Fault-Tolerant Data Pipelines with Prefect Cloud ".

Installation

prefect-saturn is available on PyPi.

pip install prefect-saturn

prefect-saturn can be installed directly from GitHub

pip install git+https://github.com/saturncloud/[email protected]

Getting Started

prefect-saturn is intended for use inside a Saturn Cloud environment, such as a Jupyter notebook.

import prefect
from prefect import Flow, task
from prefect_saturn import PrefectCloudIntegration


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("hello prefect-saturn")


flow = Flow("sample-flow", tasks=[hello_task])

project_name = "sample-project"
integration = PrefectCloudIntegration(
    prefect_cloud_project_name=project_name
)
flow = integration.register_flow_with_saturn(flow)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Customize Dask

You can customize the size and behavior of the Dask cluster used to run prefect flows. prefect_saturn.PrefectCloudIntegration.register_flow_with_saturn() accepts to arguments to accomplish this:

For example, the code below tells Saturn that this flow should run on a Dask cluster with 3 xlarge workers, and that prefect should shut down the cluster once the flow run has finished.

flow = integration.register_flow_with_saturn(
    flow=flow,
    dask_cluster_kwargs={
        "n_workers": 3,
        "worker_size": "xlarge",
        "autoclose": True
    }
)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Contributing

See CONTRIBUTING.md for documentation on how to test and contribute to prefect-saturn.

Comments
  • [CU-feu7x7] saturn labels

    [CU-feu7x7] saturn labels

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Automatically add saturn-specific labels to the flow environment.

    How does this PR improve prefect-saturn?

    Along with the related change in saturn, this allows flows to be assigned to the correct cluster agent even if there are agents in multiple clusters that are all using the same prefect-cloud tenant.

    opened by bhperry 7
  • Bump development version

    Bump development version

    • [X] passes make lint
    • [X] adds tests to tests/ (if appropriate)

    What does this PR change?

    • Bumps version to 0.5.1.9000 for development

    How does this PR improve prefect-saturn?

    With this change, users can rely on the behavior that installations from source control will have a newer version than the newest release available on PyPI, but guaranteed to be older than the next release on PyPI.

    opened by dotNomad 6
  • [CU-frwyvw] Set flow instance size

    [CU-frwyvw] Set flow instance size

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Adds instance size argument for flow node

    How does this PR improve prefect-saturn?

    Enables users to select a larger node size for their flow to run on. This means we are not forcing users to farm all of the work out to dask-clusters, they can instead run flows with a local executor and not be constrained by a medium tier node.

    opened by bhperry 4
  • pickle is a bad choice for hashing.

    pickle is a bad choice for hashing.

    Without this PR, modifying the python version could lead to different hashes for flow metadata. This PR hashes identifying information for flows using json, rather than pickle. This PR does change the hashing function. As a result re-registering a flow will create a new flow in Saturn Cloud.

    opened by hhuuggoo 3
  • add expectation that BASE_URL does not end in a slash

    add expectation that BASE_URL does not end in a slash

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Changes PrefectCloudIntegration to expect the environment BASE_URL to NOT end in a trailing slash. In all Saturn production environments, BASE_URL does not end in a trailing slash.

    This PR also bumps the version of prefect-saturn to 0.2.0, since this change in behavior is a breaking change.

    How does this PR improve prefect-saturn?

    Without this fix, prefect-saturn is currently broken in Saturn production environments.

    Why can't we just keep the code that adds or removes slashes as needed?

    This project uses prefect's WebhookStorage (https://docs.prefect.io/orchestration/execution/storage_options.html#webhook). That type of storage uses template strings to reference environment variables, like this:

    flow = Flow(
        "some-flow",
        storage=Webhook(
            build_request_kwargs={
                "url": "${BASE_URL}/api/whatever",
            },
            build_request_http_method="POST",
           ....
        )
    )
    

    There is not place in there where we can introduce Saturn-written code that checks the environment variable BASE_URL and deals with missing or extra trailing slashes. So an assumption about whether or not the environment variable BASE_URL ends in a slash has to be hard-coded into the codebase here.

    opened by jameslamb 2
  • Fix tenant_id attr for Prefect 15

    Fix tenant_id attr for Prefect 15

    • [x] passes make lint
    • [ ] adds tests to tests/ (if appropriate)

    I don't believe any tests need to be added.

    What does this PR change?

    This PR changes the Client()._active_tenant_id > Client().tenant_id (when using Prefect >= 0.15) since _active_tenant_id is no longer a property of Prefect's Client in 0.15.

    How does this PR improve prefect-saturn?

    This allows prefect-saturn to work with prefect in version 0.15 which removed the private attribute.

    opened by dotNomad 1
  • [DEV-1227] Replace pyyaml

    [DEV-1227] Replace pyyaml

    Unlike PyYAML, ruamel.yaml supports:

    • YAML <= 1.2. PyYAML only supports YAML <= 1.1 This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, this renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
    • Roundtrip preservation When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():

    See more details at https://yaml.readthedocs.io/en/latest/pyyaml.html

    opened by wreis 1
  • Switch from Docker storage to Webhook storage

    Switch from Docker storage to Webhook storage

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR replaces Docker storage with Webhook storage.

    This PR also adds a small working example to the README, to show users how it works. A longer-form tutorial will be up on https://www.saturncloud.io/docs/ some time in the next week.

    How does this PR improve prefect-saturn?

    The use of Webhook storage will make the integration between Prefect Cloud and Saturn Cloud much faster and less error-prone. It eliminates several hacks and special cases, should be much quicker, and removes an awkward race condition. With Docker storage, after calling .register_flow_with_saturn() you had to wait for a k8s job that built and pushed the image to complete. That could take up to 10 minutes, and nothing in prefect-saturn or the Saturn UI allowed you to see logs or other progress of that job.

    That job was also very brittle...it required docker-in-docker trickery, patching user-chosen images with several build-time-only dependencies, and running a sequence of multiple gnarly shell scripts.

    With Webhook storage, all of that complexity is eliminated. When you call .register_flow_with_saturn(), the flow is serialized with cloudpickle, sent to Saturn, and stored as bytes in an object store. At run time, when flow.storage.get_flow() is called, Saturn retrieves the binary content of the flow from an object store and sends it back over the wire. That's it! Everything is synchronous, so no weird race conditions, and it's just passing bytes around, so the storage process goes from 10+ minutes to under a second.

    opened by jameslamb 1
  • Add Saturn project details and remove unnecessary stuff

    Add Saturn project details and remove unnecessary stuff

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    • removes fields from saturn_details that are unnecessary as of #8
    • renames saturn_details to storage_details since that now only contains information for building storage
    • bump version floor to Python 3.7 (all the Saturn images are Python 3.7)
    • set ignore_healthchecks=True on the storage object.
      • This avoids a weird error where the image being built with flow.storage.build() doesn't have stuff like the Saturn start script in it, which could cause errors.
      • We don't need to care about that because every job in the flow's lifecycle will have the start script added to it's command / args

    How does this PR improve prefect-saturn?

    This PR removes unnecessary fields that could have become dependencies in users' code, reducing the surface for breaking changes.

    The ignore_healthchecks thing makes it possible for users to rely on the start script to install libraries.

    opened by jameslamb 1
  • change strategy for identifying flows

    change strategy for identifying flows

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR changes the strategy for uniquely identifying flows. Now the flow_hash sent to Saturn Cloud is the sha256 hash of:

    • project name
    • flow name
    • Prefect Cloud tenant id

    This means that flow_hash is equivalent to the Prefect Cloud concept flow_group_id. This PR proposes creating this hash ourselves because Prefect Cloud's flow_group_id can't be known until you've registered a flow with Prefect Cloud, and prefect-saturn needs to register with Saturn first.

    How does this PR improve prefect-saturn?

    This PR provides a reliable way to uniquely identify all versions of the same flow. It improves on the previous model, where tenant id was not considered. The previous model could have caused conflicts in the case where two flows with identical names and code, in Prefect Cloud projects with the same name but in different tenants, would get the same hash and conflict.

    Because this PR no longer considers the task graph in the hash, it also means that the hash will not change as a flow's task graph changes. That means pushing Docker storage to a container registry should be a lot faster, since it'll be more likely to hit the registry's cache.

    Notes for Reviewers

    I had to introduce prefect.client.Client in this PR, and then mock it with patch(). A lot of the diff in the test files is just whitespace, the result of adding in a with patch(....). I recommend reviewing with whitespace changes hidden.

    opened by jameslamb 1
  • Add more metadata to flows

    Add more metadata to flows

    This PR includes a few updates that improve execution and testing for the end-to-end experience with Prefect Cloud.

    What does this PR change?

    This PR adds more details from Saturn to the flow, so the agent running it knows how to configure the first job that loads the flow.

    How does this PR improve prefect-saturn?

    This PR allows flows executed from Prefect Cloud to take advantage of Saturn-y customization features like env secrets, filesystem secrets, and a custom start script.

    opened by jameslamb 1
Releases(v0.6.0)
  • v0.6.0(Nov 4, 2021)

    What's Changed

    • No default dask cluster by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/50
    • Add encoding kwarg to open by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/52

    New Contributors

    • @jsignell made their first contribution in https://github.com/saturncloud/prefect-saturn/pull/50

    Full Changelog: https://github.com/saturncloud/prefect-saturn/compare/v0.5.1...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.1(Jul 15, 2021)

  • v0.5.0(Apr 23, 2021)

    Breaking

    None

    Features

    • replace pyyaml with ruamel-yaml (#28)
    • add support for KubernetesRun "RunConfig", if using this library with prefect >= 0.13.10 (#34, #36, #37)
      • this does not break compatibility with prefect >0.13.0,<=0.13.9

    Bug Fixes

    • fix deprecation warnings from prefect 0.14.x (#32)
      • some modules were reorganized from prefect 0.13.x to 0.14.x, and using the 0.13.x-style paths raises deprecation warnings

    Docs

    • support Python 3.8 (#33)
      • this library was already compatible with Python 3.8, but that is now tested on every build and documented in the package classifiers
    • fix keywords in package metadata (#39)
      • this improves the discoverability of this project on PyPi
    Source code(tar.gz)
    Source code(zip)
  • v0.4.4(Dec 9, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • set an explicit default of autoclose = False for dask_cluster_kwargs (#25)
      • this ensures that, by default, flows registered with prefect-saturn leave their Dask cluster up at the end of execution
      • this avoids the risk of one flow run closing down a Dask cluster that is in use by another flow run
      • this was already prefect-saturn's behavior, but only indirectly because autoclose defaults to False in dask-saturn. Not that is directly the default in prefect-saturn.
    • add tests on describe_sizes() (#24)

    Docs

    • added more docs in the README on how to customize the Dask cluster used by DaskExecutor (#25)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.3(Dec 9, 2020)

    Breaking

    None

    Features

    • You can now set the instance size for the node that runs flow.run() (#23)
      • PrefectCloudIntegration.register_flow_with_saturn() gets a new keyword argument `instance_size
      • use new function describe_sizes() to list the valid options

    Bug Fixes

    None

    Docs

    • added documentation on changing the size of the instance that a flow runs on
    Source code(tar.gz)
    Source code(zip)
  • v0.4.2(Nov 19, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • fix broken installations from source distribution (prefect-saturn-*.tar.gz) (#22)

    Docs

    • package LICENSE file with package artifacts (#22)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Nov 6, 2020)

  • v0.4.0(Oct 21, 2020)

  • v0.3.0(Oct 16, 2020)

  • v0.2.0(Sep 18, 2020)

    Breaking

    • prefect-saturn now expects that the environment variable BASE_URL does not end in a slash (#17)

    Features

    None

    Bug Fixes

    None

    Docs

    None

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Aug 31, 2020)

  • v0.1.0(Aug 13, 2020)

    Breaking

    • Moved .add_storage() and .add_environment() internal, and made .register_flow_with_saturn() do more. (#14) Now the interface is just like this:

      integration = PrefectCloudIntegration("some-project")
      flow.register_flow_with_saturn()
      flow.register(project_name="some-project", labels=["saturn-cloud"])
      
    • Replaced Docker storage with Webhook storage (#14)

    • Bumped prefect version floor to 0.13.0 (the first release that had Webhook) (#14)

    Features

    None

    Bug Fixes

    None

    Docs

    • Added a minimal working example in README (#14)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.2(Jul 23, 2020)

    • moved some details of building KubernetesJobEnvironment into Saturn's back-end and out of this library
    • removed unnecessary elements in saturn_details
    • renamed saturn_details to storage_details since it now only contains things needed for building storage
    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Jul 21, 2020)

Owner
Saturn Cloud
End-to-End Data Science in Python Featuring Dask and Jupyter
Saturn Cloud
Utility for downloading fanfiction in bulk from the Archive of Our Own

What is this? This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to wo

73 Dec 30, 2022
telegram bot that calculates file hash / Dosya toplamı hesaplayan telegram botu

Telegram File Hash Bot FileHashBot: 🇬🇧 Bot that calculates file hashes. 🇹🇷 Dosya toplamları hesaplayan bot. Demo in Telegram: @FileHashBot 🇬🇧 Se

Hüzünlü Artemis [HuzunluArtemis] 5 Jun 29, 2022
Live Weather Updates using Flask and OpenWeather

AuraX Live Weather Updates using Flask and OpenWeather Installation To setup this project on your local machine, first clone this repository and insta

Ayush Gupta 3 Nov 02, 2021
Send embeds using your discord personal account

Welcome to Embed Sender 👋 Send embeds using your discord personal account Install pip install -r requirements.txt Usage Put your discord token in ./

SkydenFly 11 Sep 07, 2022
DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

CRED 71 Dec 29, 2022
Discord Remote Administration Tool

Discord Remote Administration Tool

Rdimo 82 Aug 15, 2022
Its Is A Telegram Maths Basic Calculator Bot

Its Is A Telegram Maths Basic Calculator Bot

ANKIT KUMAR 1 Dec 26, 2021
Stack Overflow Error Parser

A python tool that executes python files and opens respective Stack Overflow threads in browser for errors encountered.

Raghavendra Khare 3 Jul 24, 2022
With this program you can work English & Turkish

1 - How Can I Work This? You must have Python compilers in order to run this program. First of all, download the compiler in the link. Compiler 2 - Do

Mustafa Bahadır Doğrusöz 3 Aug 07, 2021
The Github repository for the Amari API wrapper.

Amari.py Amari.py is an async, easy to use API wrapper for the AmariBot. Installation Enter any of these commands to install the library: pip install

TheF1ng3r 5 Dec 19, 2022
Gets instagram public username and returns usefull informations like profilepic(b64), video_urls etc.

InstaSucker Gets instagram public username and returns usefull informations like profilepic(b64), video_urls etc. Information this project contains a

Armin Amiri 5 Apr 30, 2022
A simple way to create a request to the coinpayment API with a valid HMAC using your private key and command

Coinpayments Verify TXID Created for Astral Discord bot A simple way to create a request to the coinpayment API with a valid HMAC using your private k

HellSec 1 Nov 07, 2022
Detects members having unicode names. Public bot: @scarletwitchprobot

✨ Scarletwitch bot ✨ Detects unicode names members in a tg chat & provides a option to take action on that user ! Public bot: @scarletwitchprobot Supp

ÁÑÑÍHÌLÅTØR SPÄRK 18 Nov 12, 2022
A website application running in Google app engine, deliver rss news to your kindle. generate mobi using python, multilanguages supported.

Readme of english version refers to Readme_EN.md 简介 这是一个运行在Google App Engine(GAE)上的Kindle个人推送服务应用,生成排版精美的杂志模式mobi/epub格式自动每天推送至您的Kindle或其他邮箱。 此应用目前的主要

2.6k Jan 06, 2023
A napari plugin for visualising and interacting with electron cryotomograms

napari-subboxer A napari plugin for visualising and interacting with electron cryotomograms. Installation You can install napari-subboxer via pip: pip

3 Nov 25, 2021
A twitter multi-tool for OSINT on twitter accounts.

TwitterCheckr A twitter multi-tool for OSINT on twitter accounts. Infomation TwitterCheckr also known as TCheckr is multi-tool for OSINT on twitter a

IRIS 16 Dec 23, 2022
VALORANT rank yoinker lets you retrieve the ranks and basic informations of everyone in the lobby, regardless of gamemode.

vRY VALORANT rank yoinker Retrieve the rank and basic information of everyone in the lobby, regardless of gamemode. Table of Contents Terms of Use Abo

Isaac Kenyon 270 Dec 30, 2022
Shiny Wechat Pay SDK for Python

WeChat third-party Python SDK master: Read the Documentation Features Common public platforms passively respond and actively call APIs WeChat Pay API

Obrisk 18 Sep 05, 2022
Mushahid Ali 1 Dec 31, 2021
Telegram vc - A bot that can play music on telegram group's voice call

Telegram Voice Chat Bot A bot that can play music on telegram group's voice call

1 Jan 02, 2022