:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

Overview

dbt-pre-commit

pre-commit-dbt

CI black black

List of pre-commit hooks to ensure the quality of your dbt projects.

BETA NOTICE: This tool is still BETA and may have some bugs, so please be forgiving!

Goal

Quick ensure the quality of your dbt projects.

dbt is awesome, but when a number of models, sources, and macros grow it starts to be challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or test. Besides, with the growing number of objects, dbt slows down, users stop running models/tests (because they want to deploy the feature quickly), and the demands on reviews increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

πŸ’‘ Click on hook name to view the details.

Model checks:

Script checks:

Source checks:

Modifiers:

dbt commands:


❗ If you have an idea for a new hook or you found a bug, let us know ❗

Install

For detailed installation and usage, instructions see pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add list of hooks you want to run befor every commit. E.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v0.1.1
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  1. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also manually run pre-commit run after you stage all files you want to run. Or pre-commit run --all-files to run the hooks against all of the files (not only staged).

Run as Github Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud. But you can run checks after you push changes into Github.

To do that, make a file .github/workflows/pre-commit.yml.

name: pre-commit

on:
 pull_request:
 push:
 branches: [main]

jobs:
 pre-commit:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/[email protected]
 - uses: actions/[email protected]
 - uses: pre-commit/[email protected]

To run only changed files:

name: pre-commit

on:
 pull_request:
 push:
 branches: [main]

jobs:
 pre-commit:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/[email protected]
 - uses: actions/[email protected]
 - id: file_changes
 uses: trilom/[email protected]
 with:
 output: ' '
 - uses: pre-commit/[email protected]
 with:
 extra_args: --files ${{ steps.file_changes.outputs.files}}

To be able to run modifiers you need to use only private repository and change your .github/workflows/pre-commit.yml to:

name: pre-commit

on:
 pull_request:
 push:
 branches: [main]

jobs:
 pre-commit:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/[email protected]
 with:
 fetch-depth: 0
 - uses: actions/[email protected]
 - id: file_changes
 uses: trilom/[email protected]
 with:
 output: ' '
 - uses: pre-commit/[email protected]
 with:
 extra_args: --files ${{ steps.file_changes.outputs.files}}
 token: ${{ secrets.GITHUB_TOKEN }}

For more informations about pre-commit/action visit https://github.com/pre-commit/action.

Comments
  • add argument support to dbt commands (dbt clean & dbt deps)

    add argument support to dbt commands (dbt clean & dbt deps)

    This enhancement will allow the dbt-clean and dbt-deps hooks to take arguments for global and cmd flags. This addresses issue 39 by allowing the project-dir flag to be added to the dbt clean and dbt deps commands Example:

    • id: dbt-clean args: ["--cmd-flags", "++project-dir", "./transform/dbt"]
    opened by Aostojic 8
  • Docker image pull down image action fails

    Docker image pull down image action fails

    Describe the bug When hooking to the docker file, getting error invalid reference format when the build runs the step to pull the action image.

    Pull down action image 'offbi:pre-commit-dbt:v1.0.0' /usr/bin/docker pull offbi:pre-commit-dbt:v1.0.0 invalid reference format Warning: Docker pull failed with exit code 1, back off 3.275 seconds before retry.

    To Reproduce Steps to reproduce the behavior:

    1. add new file .github/workflows/pre-commit-dbt.yml:
    name: pre-commit
    
    on:
      pull_request:
        branches: [master]
    
    jobs:
      pre-commit:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/[email protected]
        - uses: actions/[email protected]
        - id: file_changes
          uses: trilom/[email protected]
          with:
            output: ' '
        - uses: offbi/[email protected]
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    
    1. add the following code to existing (or create new) .pre-commit-config.yaml
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-script-has-no-table-name
        files: ^dbt
    
    1. commit changes and create a pull request to trigger test build.

    Expected behavior Build test completes successfully.

    Version: v1.0.0 (dbt version 0.21.0)

    Additional context I think it's odd that the yml file clearly shows offbi/[email protected] for the step action, but when the docker pull command is passed the syntax is changed to offbi:pre-commit-dbt:v1.0.0 (notice colons in place of slash & @). Nowhere in the documentation does it list how to avoid this. Also note: The "test" completes successfully if the following block is removed (all other steps run correctly):

        - uses: offbi/[email protected]
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    

    So it seems that it's this specific docker image causing the issue.

    bug 
    opened by grace-cityblock 6
  • add prj_root argument to dbt commands

    add prj_root argument to dbt commands

    This enhancement enables user to benefit from dbt-command hooks whenever their dbt project is not at the root level of the repo. It works by providing an additional argument to the dbt commands. Example: hooks: - id: dbt-docs-generate files: ^projects/ args: ["--prj-root","./common/dbt"] - id: dbt-deps files: ^projects/ args: ["--prj-root","./common/dbt"]

    In this example, the root of the dbt project is located inside the folder './commond/dbt'.

    Addresses issue https://github.com/offbi/pre-commit-dbt/issues/39

    opened by joaobernardopa 5
  • Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Describe the bug Any hook that tries to read target/manifest.json results in Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json') if dbt project directory (contains target/) != git project root.

    To Reproduce Steps to reproduce the behavior:

    1. Create the following project structure
    my_project/
    β”œβ”€β”€ .git/
    └── dbt_project/
         β”œβ”€β”€ dbt_project.yml
         β”œβ”€β”€ target/
         β”œβ”€β”€ models/
         └── ...
    
    1. Run any hook that requires the manifest

    Expected behavior I guess the current behavior is expected but there could be an option to specify the dbt project directory.

    Workaround: cp -r dbt_project/target target

    Version: v1.0.0

    bug 
    opened by stumelius 5
  • Cannot use the action in a workflow

    Cannot use the action in a workflow

    Describe the bug Hi there, first of all thanks for the efforts in developing this library. We are trying to include this step in our workflow, but we are facing an issue related to the docker image. It cannot pull it:

    Screenshot 2021-04-23 at 10 31 04

    Might the issue be related to the actual image defined in action.yaml ?

    To Reproduce Steps to reproduce the behavior:

    1. follow either the readme example or the snippet as reported in the marketplace

    Example configuration:

         - id: dbt_checks
            uses: offbi/[email protected]
            env:
              [ ... ]
            with:
              args: run --files ${{ steps.file_changes.outputs.files }}
    

    Version: v1.0.0

    bug 
    opened by Sh1n 5
  • check-column-desc-are-same fails with a Python error

    check-column-desc-are-same fails with a Python error

    Describe the bug When trying to use the check-column-desc-are-same hook, I'm getting the following Python error. Other hooks I've tried so far are working:

    Check column descriptions are same.......................................Failed
    - hook id: check-column-desc-are-same
    - exit code: 1
    
    Traceback (most recent call last):
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/bin/check-column-desc-are-same", line 8, in <module>
        sys.exit(main())
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 80, in main
        return check_column_desc(paths=args.filenames, ignore=args.ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 55, in check_column_desc
        grouped = get_grouped(paths, ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 48, in get_grouped
        sorted(columns, key=lambda x: x.column_name), lambda x: x.column_name
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 29, in get_all_columns
        for item in schemas:
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/utils.py", line 134, in get_model_schemas
        model_name = model.get("name")
    AttributeError: 'str' object has no attribute 'get'
    

    Version: v0.1.1 Python 3.7.9

    bug 
    opened by MartinGuindon 5
  • check-script-has-no-table-name is failing when using lateral flatten

    check-script-has-no-table-name is failing when using lateral flatten

    Describe the bug See updated description of the bug.

    ~~The check-script-has-no-table-name pre-commit hook is confusing CTEs with tables, and fails with code like this:~~

    with source as (
    
        select * from {{ source('stripe', 'payments') }}
    
    ),
    
    renamed as (
    
        select
            id as payment_id,
            order_id,
            payment_method,
    
            --`amount` is currently stored in cents, so we convert it to dollars
            amount / 100 as amount
    
        from source
    
    )
    
    select * from renamed
    

    ~~It reports that "source" and "renamed" are tables even though they are not, even though it looks the same from a code perspective.~~

    ~~I think this hook should perhaps fail only at the presence of schema.table or database.schema.table references, unless we can make this hook smarter by being aware of the CTEs defined in the model.~~

    Version: v0.1.1

    bug 
    opened by MartinGuindon 5
  • Fix Github Action docker reference

    Fix Github Action docker reference

    According to Github Actions documentation, we should reference the docker image directly from the project: https://docs.github.com/pt/actions/creating-actions/creating-a-docker-container-action#creating-an-action-metadata-file

    This fixes: https://github.com/offbi/pre-commit-dbt/issues/26

    opened by tlfbrito 3
  • Add check-model-name-contract hook

    Add check-model-name-contract hook

    There's now a check-model-name-contract hook (similar to check-model-name-contract) that is used to ensure that models follow naming convention.

    To-do before merge

    • [x] Write unit tests for the hook
    opened by stumelius 3
  • check-model-has-properties-file fails on macro with a valid properties yml

    check-model-has-properties-file fails on macro with a valid properties yml

    Describe the bug When running the test check-model-has-properties-file with a macro, the test fails with the following error.

    Check the model has properties file......................................Failed
    - hook id: check-model-has-properties-file
    - exit code: 1
    
    macros/grant_select_on_schemas.sql: does not have model properties defined in any .yml file.
    

    The .pre-commit-config.yaml includes the rule:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 607cb07a1918442f5963662a9aa19da8984931e6
      hooks:
      - id: check-model-has-properties-file
    

    And the macro has the following .yml file (the filename is the same as the macro name and is stored within the macros folder):

    
    macros:
      - name: grant_select_on_schemas
        description: "Grants privileges to groups after dbt run"
        docs:
          show: false
    

    Hope we can get this fixed soon as this is a really useful test to include

    bug 
    opened by andrewlee-trouva 3
  • wip: Added hook: check_model_has_tests_by_group

    wip: Added hook: check_model_has_tests_by_group

    This PR adds a hook that checks if a model has a sufficient number of tests pulled out of a group of acceptable tests, e.g. this model has 1 of unique, unique_where, or unique_threshold.

    opened by jtalmi 3
  • `check_macro_arguments_have_desc` hook fails to parse arguments

    `check_macro_arguments_have_desc` hook fails to parse arguments

    Describe the bug

    check_macro_arguments_have_desc hook raises the following error even though the content of the files is ok. Other hooks are working correctly, including check_macro_has_description.

    Traceback (most recent call last):
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/bin/check-macro-arguments-have-desc", line 8, in <module>
        sys.exit(main())
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 90, in main
        status_code, _ = check_argument_desc(paths=args.filenames, manifest=manifest)
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 52, in check_argument_desc
        for key, value in item.macro.get("arguments", {}).items()
    AttributeError: 'list' object has no attribute 'items'
    

    To Reproduce

    Steps to reproduce the behavior using getdbt examples:

    1. macros/schema.yml
    version: 2
    
    macros:
      - name: cents_to_dollars
        description: A macro to convert cents to dollars
        arguments:
          - name: column_name
            type: string
            description: The name of the column you want to convert
          - name: precision
            type: integer
            description: Number of decimal places. Defaults to 2.
    
    1. macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=3) %}
        COALESCE (TRUNC(CAST({{ column_name }}/100 AS numeric), {{ precision }}), 0)
    {% endmacro %}
    
    1. Execute the following command after dbt deps, dbt compile and dbt docs generate: pre-commit run check-macro-arguments-have-desc --files macros/cents_to_dollars.sql

    Expected behavior The hook should pass successfully.

    Version: commit 34a2341234675d7a6b61766b2c33bdd5c33d090b (current latest commit)

    Additional context It looks like the problem is here: for key, value in item.macro.get("arguments", {}).items() The hook is guessing the arguments key contains a dictionary while this is a list of dictionaries. https://github.com/offbi/pre-commit-dbt/blob/main/pre_commit_dbt/check_macro_arguments_have_desc.py#L52

    bug 
    opened by hacherix 0
  • check-model-parents-and-childs for zero child check never runs

    check-model-parents-and-childs for zero child check never runs

    Attempting to use check-model-parents-and-childs hook to ensure that our data consumption layer models do not have any children does not fail when models DO have children

    Steps to reproduce the behavior:

    1. dbt project with a parent and child models
    2. Add check-model-parents-and-childs with --max-child-cnt of zero
      - id: check-model-parents-and-childs
        name: Check for child models in data consumption layers
        # manifest.json required
        args: ["--manifest","./pipelines/target/manifest.json","--max-child-cnt","0","--"]
        files: models/self_service/
    

    Expected outcome:

    Model that has a child is raised as failure

    Actual outcome:

    Hook passes

    Version:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 34a2341234675d7a6b61766b2c33bdd5c33d090b
    

    Additional info:

    Offending code appears to be checking for default (None) for the --max-child-cnt returning false for zero value i.e.

     if req_cnt and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    

    may need to explicitly check for None?

     if req_cnt is not None and req_operator(real_value, req_cnt):
       status_code = 1
         print(
    ...
    
    bug 
    opened by PeteCorbettWS 0
  • `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    Describe the bug Using an EXTRACT date function will recognize the column reference as a table reference.

    Check the script has not table name......................................Failed
    - hook id: check-script-has-no-table-name
    - exit code: 1
    
    models/example.sql: does not use source() or ref() macros for tables:
    - order_date
    

    To Reproduce Based on the jaffle shop dbt example, create a model with the following content:

    SELECT
        *,
        EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
    FROM source('jaffle_shop', 'orders')
    

    Expected behavior Using an EXTRACT date function should not make the check-script-has-no-table-name fail.

    Version: v1.0.0

    Additional context Link to EXTRACT function documentation for different warehouses:

    bug 
    opened by nasseredine 0
  • Feature Request: `check-model-has-column`

    Feature Request: `check-model-has-column`

    Describe the feature you'd like Add a hook which asserts that a given column with a given type exists in a model.

    Additional context We generally require that every model has an audit timestamp column called _updated_at and it would be nice to be able to enforce that.

    enhancement 
    opened by huptonbirdsall 0
  • `Check model name contract` hook problem with version

    `Check model name contract` hook problem with version

    When using the following pre commit config file

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-model-name-contract
        args: [--pattern, "(rep__).*"]
        files: models/reporting
    

    I get the following response [ERROR] check-model-name-contract is not present in repository https://github.com/offbi/pre-commit-dbt. Typo? Perhaps it is introduced in a newer version? Often pre-commit autoupdate fixes this.

    Even if I run the autoupdate command the error message is the same. Am I missing something?

    bug 
    opened by papost 2
Releases(v1.0.0)
Owner
Offbi
Data engineering with ❀️
Offbi
API to summarize input text

summaries API to summarize input text normal run $ docker-compose exec web python -m pytest disable warnings $ docker-compose exec web python -m pytes

Brad 1 Sep 08, 2021
TrainingBike - Code, models and schematics I've used to interface my stationary training bike with PC.

TrainingBike Code, models and schematics I've used to interface my stationary training bike with PC. You can find more information about the project i

1 Jan 01, 2022
pyinsim is a InSim module for the Python programming language.

PYINSIM pyinsim is a InSim module for the Python programming language. It creates socket connection with LFS and provides many classes, functions and

2 May 12, 2022
This library is an abstraction for Splunk-related development, maintenance, or migration operations

This library is an abstraction for Splunk-related development, maintenance, or migration operations. It provides a single CLI or SDK to conveniently perform various operations such as managing a loca

NEXTPART 6 Dec 21, 2022
Dot Browser is a privacy-conscious web browser with smarts built-in for protection against trackers and advertisments online.

🌍 Take back your privacy with Dot Browser, the privacy-conscious web browser that protects you from being tracked and monitored online.

Dot HQ 1k Jan 07, 2023
Controller state monitor plugin for EVA ICS

eva-plugin-cmon Controller status monitor plugin for EVA ICS Monitors connected controllers status in SFA and pushes measurements into an external Inf

Altertech 1 Nov 06, 2021
πŸ“œGenerate poetry with gcc diagnostics

gado (gcc awesome diagnostics orchestrator) is a wrapper of gcc that outputs its errors and warnings in a more poetic format.

Dikson Santos 19 Jun 25, 2022
A python API act as Control Center to control your Clevo Laptop via wmi on windows.

ClevoPyControlCenter A python API act as Control Center to control your Clevo Laptop via wmi on windows. Usage # pip3 install pymi from clevo_wmi impo

3 Sep 19, 2022
🌲 Um simples criador de arvore de items feito em Python para o Prompt 🐍

Esse projeto foi feito em Python com, intuito de fortificar meu aprendizado de programaΓ§Γ£o. Sobre β€’ Tecnologias β€’ PrΓ© Requisitos β€’ LicenΓ§a β€’ Autor πŸ“„

Kawan Henrique 1 Aug 02, 2021
A very terrible python-based programming language that uses folders instead of text files

PYFolders by Lewis L. Foster PYFolders is a very terrible python-based programming language that uses folders instead of regular text files. In this r

Lewis L. Foster 5 Jan 08, 2022
World's best free and open source ERP.

World's best free and open source ERP.

Frappe 12.5k Jan 07, 2023
SciPy library main repository

SciPy SciPy (pronounced "Sigh Pie") is an open-source software for mathematics, science, and engineering. It includes modules for statistics, optimiza

SciPy 10.7k Jan 09, 2023
Learning with Peter Norvig's lis.py interpreter

Learning with lis.py This repository contains variations of Peter Norvig's lis.py interpreter for a subset of Scheme, described in (How to Write a (Li

Fluent Python 170 Dec 15, 2022
Kolibri: the offline app for universal education

Kolibri This repository is for software developers wishing to contribute to Kolibri. If you are looking for help installing, configuring and using Kol

Learning Equality 564 Jan 02, 2023
Simple, configuration-driven backup software for servers and workstations

title permalink borgmatic index.html It's your data. Keep it that way. borgmatic is simple, configuration-driven backup software for servers and works

borgmatic collective 1.3k Dec 30, 2022
A Linux program to create a Windows USB stick installer from a real Windows DVD or image.

WoeUSB-ng A Linux program to create a Windows USB stick installer from a real Windows DVD or image. This package contains two programs: woeusb: A comm

Longinus 1 Nov 19, 2021
Let's pretend you want to create a AWS Lambda project called "sns-processor".

Usage Let's pretend you want to create a AWS Lambda project called "sns-processor". Rather than using lambda and then editing the results to include y

1 Dec 31, 2021
Online learning platform

πŸ›  Status: In Development Teached is currently in development. So we encourage you to use it and give us your feedback, but there are things that have

Mohamed Nesredin 2 Feb 07, 2021
Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

Retrying Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

Ray Holder 1.9k Dec 29, 2022
Tools for analyzing Java JVM gc log files

gc_log This package consists of two separate utilities useful for : gc_log_visualizer.py regionsize.py GC Log Visualizer This was updated to run under

Brad Schoening 0 Jan 04, 2022