Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Univerity-student oriented (lithuanian) discord bot

Univerity-student oriented (lithuanian) discord bot

3 Nov 30, 2021
This program is an automated trading bot that uses TDAmeritrades Thinkorswim trading platform's scanners and alerts system.

Python Trading Bot w/ Thinkorswim Description This program is an automated trading bot that uses TDAmeritrades Thinkorswim trading platform's scanners

Trey Thomas 201 Jan 03, 2023
Remedy when Amazon ECR is not running basic scans for container CVEs.

Welcome to your CDK Python project! This is a blank project for Python development with CDK. The cdk.json file tells the CDK Toolkit how to execute yo

4n6ir 4 Nov 05, 2022
Zipper-s-Father - A simple telegram bot that takes a list of files sent by the user and returns them zipped

ZIP files telegram bot A simple telegram bot that takes a list of files sent by

Dr.Caduceus 1 Jan 29, 2022
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T

ClearML 3.9k Jan 01, 2023
OGE-2022-na-Python - Solving problems in python for the OGE 2022

OGE-2022-na-Python Решение задачек на питоне для ОГЭ 2022 Тут разобраны разные в

Slava 0 Oct 14, 2022
A Telegram bot for personal utilities

Aqua Aqua is a Telegram bot for personal utilities. Installation Prerequisites: Install Poetry for managing dependencies and fork/clone the repository

Guilherme Vasconcelos 2 Mar 30, 2022
471 Dec 24, 2022
Prime Mega is a modular bot running on python3 with autobots theme and have a lot features.

PRIME MEGA Prime Mega is a modular bot running on python3 with autobots theme and have a lot features. Easiest Way To Deploy On Heroku This Bot is Cre

『TØNIC』 乂 ₭ILLΣR 45 Dec 15, 2022
🖥️ Windows Batch and powershell Discord Token grabber. Made for Troll (lmao)

Batched-Grabber Windows Batch and powershell Discord Token grabber. Made for Troll ! Setup. 1. pip(3) install numpy colored 2. python(3) Batched.py 3.

Ѵιcнч 41 Nov 01, 2022
An enhanced discord.py, based off of the now-archived discord.py project

enhanced-discord.py A modern, maintained, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. The Future of enhanced

Devision 2 Dec 21, 2022
Termux Pkg

PKG Install Termux All Basic Pkg. Installation : pkg update && pkg upgrade && pkg install python && pkg install python2 && pkg install git && git clon

ɴᴏʙɪᴛᴀシ︎ 1 Oct 28, 2021
A quick way to verify your Climate Hack.AI (2022) submission locally!

Climate Hack.AI (2022) Submission Validator This repository contains code that allows you to quickly validate your Climate Hack.AI (2022) submission l

Jeremy 3 Mar 03, 2022
ETL python utilizando API do Spotify

Processo de ETL com Python e Airflow usando API do Spotify Sobre Projeto de ETL(Extract, Transform e Load) utilizando Python com API do Spotify e Airf

Leonardo 10 Mar 16, 2022
Telegram bot that let's you flip a coin in a dialog

coin_flip Telegram bot that let's you flip a coin in a dialog Report issue · Request feature About Software development tool that lets you finally dec

Ivan Akostelov 2 Dec 12, 2021
Automation application was made by me using Google, Sheet and Slack APIs with Python.

README This application is used to transfer the data in the xlsx document we have to the Google Drive environment and calculate the "total budget" wit

3 Apr 12, 2022
A Chip-8 emulator written using Python's default libraries

Chippure A Chip-8 emulator written using Python's default libraries. Instructions: Simply launch the .py file and type the name of the Chip8 ROM you w

5 Sep 27, 2022
A way to export your saved reddit posts to a Notion table.

reddit-saved-to-notion A way to export your saved reddit posts and comments to a Notion table.Uses notion-sdk-py and praw for interacting with Notion

19 Sep 12, 2022
A Python module for communicating with the Twilio API and generating TwiML.

twilio-python The default branch name for this repository has been changed to main as of 07/27/2020. Documentation The documentation for the Twilio AP

Twilio 1.6k Jan 05, 2023
Available slots checker for Spanish Passport

Bot that checks for available slots to make an appointment to issue the Spanish passport at the Uruguayan consulate page

1 Nov 30, 2021