Source files for the data lake demo video using the AWS TICKIT database

Overview

Data Lake Demo

Source code for video demonstration detailed in the post, Building a Simple Data Lake on AWS . Build a simple data lake on AWS using a combination of services, including Amazon MWAA, AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, and Amazon S3.

Architecture

Architecture

TICKIT Sample Database

Amazon Redshift TICKIT Sample Database

TICKIT Tables

  • tickit.saas.category
  • tickit.saas.event
  • tickit.saas.venue
  • tickit.crm.users
  • tickit.date
  • tickit.listing
  • tickit.sales

Naming Conventions

+-------------+--------------------------------------------------------------------+
| Prefix      | Description                                                        |
+-------------+--------------------------------------------------------------------+
| _source     | Data Source metadata only (org. call _raw in video)                |
| _raw        | Raw/Bronze data from data sources (org. call _converted in video)  |
| _refined    | Refined/Silver data - raw data with initial ELT/cleansing applied  |
| _aggregated | Gold/Aggregated data - aggregated/joined refined data              |
+-------------+--------------------------------------------------------------------+

AWS CLI Commands

There were two small changes made to the source code, as compared to the video demonstration, to help clarify the flow of data in the demonstration. The prefix for the (7) data source AWS Glue Data Catalog table’s prefix was switched from raw_ from source_. Also, the (7) Raw/Bronze AWS Glue Data Catalog table’s prefix was switched from converted_ to raw_. The final data flow is 1) source_, 2) raw_, 3) refined_, and 4) agg_ (aggregated).

DATA_LAKE_BUCKET="your-data-lake-bucket"

aws s3 rm "s3://${DATA_LAKE_BUCKET}/tickit/" --recursive

aws glue delete-database --name tickit_demo

aws glue create-database \
  --database-input '{"Name": "tickit_demo", "Description": "Track sales activity for the fictional TICKIT web site"}'

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws glue start-crawler --name tickit_postgresql
aws glue start-crawler --name tickit_mysql
aws glue start-crawler --name tickit_mssql

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --expression "source_*"  \
  --output table

aws glue start-job-run --job-name tickit_public_category_raw
aws glue start-job-run --job-name tickit_public_date_raw
aws glue start-job-run --job-name tickit_public_event_raw
aws glue start-job-run --job-name tickit_public_listing_raw
aws glue start-job-run --job-name tickit_public_sales_raw
aws glue start-job-run --job-name tickit_public_users_raw
aws glue start-job-run --job-name tickit_public_venue_raw

aws glue start-job-run --job-name tickit_public_category_refine
aws glue start-job-run --job-name tickit_public_date_refine
aws glue start-job-run --job-name tickit_public_event_refine
aws glue start-job-run --job-name tickit_public_listing_refine
aws glue start-job-run --job-name tickit_public_sales_refine
aws glue start-job-run --job-name tickit_public_users_refine
aws glue start-job-run --job-name tickit_public_venue_refine

aws glue get-tables \
  --database-name tickit_demo \
  --query "TableList[].Name" \
  --output table

aws s3api list-objects-v2 \
  --bucket ${DATA_LAKE_BUCKET} \
  --prefix "tickit/" \
  --query "Contents[].Key" \
  --output table
Owner
Gary A. Stafford
AWS Senior Solutions Architect | AWS Certified Professional | Cloud | Data | Containers | Serverless | DevOps | Polyglot Developer
Gary A. Stafford
Automatically logs into VTOP and can perform certain tasks

VTOP_Login Automatically logs into VTOP and can perform certain tasks To run the

Jatin 1 Jan 30, 2022
Python retagging utility for mkv files using mkvmerge.

pyretag A python script to retag mkv files. Setting Up pip install pyfiglet pip install rich Move the mkv files to input folder.

25 Dec 04, 2022
Autocut the Twitch VODs based on Marker

Markut Given the VOD of the stream and the markers that are exported as a CSV file, generate a video using ffmpeg that cuts out part of the VOD accord

Tsoding 18 Dec 19, 2022
A self-hosted streaming platform with Discord authentication, auto-recording and more!

A self-hosted streaming platform with Discord authentication, auto-recording and more!

John Patrick Glattetre 331 Dec 27, 2022
Stream music with ffmpeg and python

youtube-stream Stream music with ffmpeg and python original Usage set the KEY in stream.sh run server.py run stream.sh (You can use Git bash or WSL in

Giyoung Ryu 14 Nov 17, 2021
DICexport is a GUI (PyQt5) to export digital image correlation videos

DIC Video Exporter DICexport is a GUI (PyQt5) to export digital image correlation videos. It offers the flexibility to choose a selected range of a vi

Chaoyi Zhu 0 Jun 23, 2022
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.

OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality v

OpenShot Studios, LLC 3.1k Jan 01, 2023
Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Convert lecture videos to slides in one line. Takes an input of a directory containing your lecture videos and outputs a directory containing .PDF files containing the slides of each lecture.

Sidharth Anand 12 Sep 10, 2022
LL-HLS implementation written in Python3

biim mpegts stream to Apple Low Latency HLS Feature mpegts demuxing in pure python3 (using asyncio) mpegts stream to fragmented ts use piping from ffm

もにょ~ん 15 Jan 03, 2023
Python bindings for FFmpeg - with complex filtering support

ffmpeg-python: Python bindings for FFmpeg Overview There are tons of Python FFmpeg wrappers out there but they seem to lack complex filter support. ff

Karl Kroening 7.7k Jan 03, 2023
MoviePy is a Python library for video editing, can read and write all the most common audio and video formats

MoviePy is a Python library for video editing: cutting, concatenations, title insertions, video compositing (a.k.a. non-linear editing), video processing, and creation of custom effects. See the gall

10k Jan 08, 2023
Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2

VideoStreamingServer Completed: Streams video from raspberry pi to desktop T1 - Recognizes Faces on client T2 In progress: Change the transmission Pro

1 Dec 06, 2021
A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract.

Video Subtitle Extractor Bot A simple Telegram bot to extract hard-coded subtitle from videos using FFmpeg & Tesseract. Note that the accuracy of reco

14 Oct 28, 2022
Video-stream - A telegram video stream bot repo

This is a Telegram Video stream Bot. Binary Tech 💫 Features stream videos downl

silentz lk 1 Feb 02, 2022
A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups

Yui Vidio Player A Advanced Anime Theme VC Video Player created for playing vidio in the voice chats of Telegram Groups Demo Setting up Add this Bot t

Achu biju 32 Sep 16, 2021
基于BililiveRecorder 的集群录播客户端

高度自动化的录播服务端! 一、项目介绍 1、介绍 这是NGlive的录播服务器集群的客户端部分实现代码,它可以自动化的进行录制-压制-上传-通知,同时流程高度可自定义,并且可以任意受中心服务器的调度,有一定的错误修复能力。可以保证长期稳定的运行。 2、基本功能 这个客户端集 录制、转码压制、上传为一

NGWORKS 7 Jul 10, 2022
A script to disable steam servers regionwise. [Works on Windows only]

Csgo-server-blocker A script to disable steam servers regionwise. [Works on Windows only] Dependencies python3.x Usage: pip install requirements.txt I

Aditya Bennur 2 Jun 10, 2022
Python program - to extract slides from videos

Programa em Python - que fiz em algumas horas e que provavelmente tem bugs - para extrair slides de vídeos.

Natanael Antonioli 45 Nov 12, 2022
A free project by a normal kamenrider fan

DEMONS DRIVER Python + OpenCV demons.py采集原视频中led灯珠颜色,并将结果输出到output文件夹 Arduino + WS2812B 基于FastLED 实现DEMONS驱动器的led面板效果 项目未完成,持续更新中 --------------------

2 Nov 14, 2022
Tiny python video cutter

tiny_python_video_cutter Source code based on a discussion in StackOverflow Setup project in Pycharm: Configure virtual env in Pycharm. You are done w

Truong 2 May 28, 2022