Airflow Operator for running Soda SQL scans

Overview

Soda SQL Airflow Operator

Example Usage

# src/soda/scans/my_scan.yml
# Airflow renders the Jinja templates in this file, so Airflow context (e.g. {{ params.client_id }}) is available
table_name: tmp_{{ params.client_id }}_{{ ds_nodash }}
sql_metrics:
  - sql: |
      SELECT
        SUM(value1) AS staged_value1,
        SUM(value2) AS staged_value2
      FROM tmp_{{ params.client_id }}_{{ ds_nodash }}
  - sql: |
      SELECT
        SUM(value1) AS final_value1,
        SUM(value2) AS final_value2
      FROM final_table
      WHERE
        date = '{{ ds }}'
        AND client_id = {{ params.client_id }}
tests:
  - staged_value1 == final_value1
  - staged_value2 > final_value2
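# my_airflow_dag.py
import os
from pathlib import Path

from soda_util import build_soda_warehouse, convert_templated_yml_to_dict
# SodaSqlOperator must also be imported from wherever this operator is installed

# Matches where my_scan.yml is saved. The right-hand side must be relative:
# joining an absolute path such as "/soda/scans/" would discard the base path.
SODA_PATH = Path(os.getenv("PYTHON_PATH", "/code/src")) / "soda" / "scans"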

validate_staged_data = SodaSqlOperator(
    task_id="validate_staged_data",
    warehouse=build_soda_warehouse("warehouse_name", "database_name"),  # Could also pass a file path to a yml file
    scan=convert_templated_yml_to_dict(SODA_PATH, "my_scan.yml"),
    params={"client_id": 12345},  # Params are rendered by Airflow and accessible in the yaml file
)

Notes

  • Unlike Soda itself, the Operator does not use a builder pattern to define the warehouse and scan arguments. Instead, the warehouse and scan parameters are instance checked and the relevant Soda methods are set accordingly. This gives a much simpler API: you just pass the arguments to the Operator.
  • Because all rendering of Jinja templates is handed over to Airflow, the native Soda templates are not accessible, so always use Airflow templates.
  • Soft failures (i.e. the Airflow task doesn't fail, it just alerts) have been implemented, but alerting on them has not. For now, a soft failure simply means the Airflow task passes. Alerting is still to be implemented.
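
The first and third notes can be illustrated with a rough, self-contained sketch. This is not the Operator's actual source: StubScanBuilder is a stand-in for Soda's scan builder (its attribute names are an assumption), and configure_builder, handle_result, ScanFailed and the soft_fail flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StubScanBuilder:
    """Stand-in for Soda's scan builder; attribute names are assumptions."""
    warehouse_yml_file: Optional[str] = None
    warehouse_yml_dict: Optional[dict] = None
    scan_yml_file: Optional[str] = None
    scan_yml_dict: Optional[dict] = None


class ScanFailed(Exception):
    """Hypothetical error; a real operator would raise an Airflow exception."""


def configure_builder(warehouse, scan) -> StubScanBuilder:
    """Instance check each argument and set the matching builder attribute."""
    builder = StubScanBuilder()
    for name, value in (("warehouse", warehouse), ("scan", scan)):
        if isinstance(value, dict):
            # Already-parsed YAML, e.g. the output of a helper function
            setattr(builder, f"{name}_yml_dict", value)
        elif isinstance(value, (str, Path)):
            # A path to a YAML file on disk
            setattr(builder, f"{name}_yml_file", str(value))
        else:
            raise TypeError(
                f"{name} must be a dict or a path, got {type(value).__name__}"
            )
    return builder


def handle_result(failed_tests: list, soft_fail: bool = False) -> bool:
    """Return True if the task should pass, otherwise raise ScanFailed."""
    if not failed_tests:
        return True
    if soft_fail:
        # Alerting is not implemented, so a soft failure simply passes
        return True
    raise ScanFailed(f"{len(failed_tests)} Soda test(s) failed: {failed_tests}")


# Usage: a file path for the warehouse, a dict for the scan
builder = configure_builder("warehouse.yml", {"table_name": "tmp_12345_20220101"})
```

The point of the sketch is that the caller never touches a builder: the argument type alone decides which attribute is set, and a soft failure currently behaves the same as a pass.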
Owner
Todd de Quincey
Data Engineer, Chartered Accountant and all round nice guy (or so I like to think). I believe that quality, simplicity and focus are the keys to success