Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Modern Data Lake Storage Layers

This repository contains supporting assets for my research into modern data lake storage layers such as Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, it includes a CloudFormation template that creates an EMR cluster and an EMR Studio with the necessary requirements, along with Jupyter notebooks containing the example walkthroughs.

You can also view the corresponding blog post and video.

Prerequisites

You'll need an AWS account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template creates an EMR cluster and an S3 bucket, both of which incur charges, so be sure to either shut down the cluster when you're done or delete the CloudFormation stack. Before the CloudFormation stack can be deleted, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation, since auto-created rules prevent the stack from removing it
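
The bucket-emptying step above can also be scripted. The following is a minimal sketch rather than part of this repository: it assumes the bucket is not versioned, and it takes a client that behaves like `boto3.client("s3")`. The helper name `empty_bucket` is hypothetical.

```python
def empty_bucket(s3_client, bucket_name):
    """Delete every object in a non-versioned bucket and return the count deleted.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name):
        # Each page holds at most 1,000 keys, which is also the
        # per-request limit of delete_objects.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

With AWS credentials configured, this could be called as `empty_bucket(boto3.client("s3"), "my-stack-bucket")`, where the bucket name is a placeholder for whichever bucket your stack created. A versioned bucket would also need its object versions and delete markers removed, which this sketch does not handle.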

Overview

The included CloudFormation template creates a new VPC and an EMR cluster in which you can run the notebooks. It also creates an EMR Studio, and you can find the Studio URL in the Outputs tab of your CloudFormation stack.
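
If you prefer not to use the console, the same outputs can be read programmatically. This is a small sketch that assumes a client behaving like `boto3.client("cloudformation")`; the helper name `stack_outputs` is hypothetical, and the output key that holds the Studio URL depends on the template, so the code simply returns every output.

```python
def stack_outputs(cfn_client, stack_name):
    """Return a stack's outputs as a {OutputKey: OutputValue} dictionary.

    `cfn_client` is assumed to behave like boto3.client("cloudformation").
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    # A stack with no outputs omits the "Outputs" key entirely.
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}
```

Printing the returned dictionary shows the same key and value pairs as the Outputs tab, and the Studio URL will be among them.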

Once the stack has finished creating, navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.
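
To confirm that the "data-lakes" cluster is up before attaching a workspace, you could look it up by name. This is a hedged sketch assuming a client that behaves like `boto3.client("emr")`; the helper name `find_cluster_id` and the choice of which states count as active are illustrative assumptions.

```python
# Cluster states treated as "active" for this lookup (an assumption of this sketch).
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_cluster_id(emr_client, name="data-lakes"):
    """Return the ID of the first active EMR cluster with the given name, or None.

    `emr_client` is assumed to behave like boto3.client("emr").
    """
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for cluster in page.get("Clusters", []):
            if cluster["Name"] == name:
                return cluster["Id"]
    return None
```

A return value of `None` suggests the cluster has not started yet or has already been terminated.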

Inside the workspace, you can either upload each notebook individually from the notebooks/ folder or connect to this repository by using the "Git" icon on the left-hand side.
