AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Overview

Data Science on AWS - O'Reilly Book

Open In SageMaker Studio Lab

Get the book on Amazon.com

Data Science on AWS

Book Outline

Book Outline

Quick Start Workshop (4-hours)

Workshop Paths

In this quick start hands-on workshop, you will build an end-to-end AI/ML pipeline for natural language processing with Amazon SageMaker. You will train and tune a text classifier to predict the star rating (1 is bad, 5 is good) for product reviews using the state-of-the-art BERT model for language representation. To build our BERT-based NLP text classifier, you will use a product reviews dataset where each record contains some review text and a star rating (1-5).

Quick Start Workshop Learning Objectives

Attendees will learn how to do the following:

  • Ingest data into S3 using Amazon Athena and the Parquet data format
  • Visualize data with pandas, matplotlib on SageMaker notebooks
  • Detect statistical data bias with SageMaker Clarify
  • Perform feature engineering on a raw dataset using Scikit-Learn and SageMaker Processing Jobs
  • Store and share features using SageMaker Feature Store
  • Train and evaluate a custom BERT model using TensorFlow, Keras, and SageMaker Training Jobs
  • Evaluate the model using SageMaker Processing Jobs
  • Track model artifacts using Amazon SageMaker ML Lineage Tracking
  • Run model bias and explainability analysis with SageMaker Clarify
  • Register and version models using SageMaker Model Registry
  • Deploy a model to a REST endpoint using SageMaker Hosting and SageMaker Endpoints
  • Automate ML workflow steps by building end-to-end model pipelines using SageMaker Pipelines

Extended Workshop (8-hours)

Workshop Paths

In the extended hands-on workshop, you will get hands-on with advanced model training and deployment techniques such as hyper-parameter tuning, A/B testing, and auto-scaling. You will also setup a real-time, streaming analytics and data science pipeline to perform window-based aggregations and anomaly detection.

Extended Workshop Learning Objectives

Attendees will learn how to do the following:

  • Perform automated machine learning (AutoML) to find the best model from just your dataset with low-code
  • Find the best hyper-parameters for your custom model using SageMaker Hyper-parameter Tuning Jobs
  • Deploy multiple model variants into a live, production A/B test to compare online performance, live-shift prediction traffic, and autoscale the winning variant using SageMaker Hosting and SageMaker Endpoints
  • Setup a streaming analytics and continuous machine learning application using Amazon Kinesis and SageMaker

Workshop Instructions

Open In SageMaker Studio Lab

Amazon SageMaker Studio Lab is a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

1. Request Amazon SageMaker Studio Lab Account

Go to Amazon SageMaker Studio Lab, and request a free acount by providing a valid email address.

Amazon SageMaker Studio Lab Amazon SageMaker Studio Lab - Request Account

Note that Amazon SageMaker Studio Lab is currently in public preview. The number of new account registrations will be limited to ensure a high quality of experience for all customers.

2. Create Studio Lab Account

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page.

You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn't require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

3. Sign in to your Studio Lab Account

You are now ready to sign in to your account.

Amazon SageMaker Studio Lab - Sign In

4. Select your Compute instance, Start runtime, and Open project

CPU Option

Select CPU as the compute type and click Start runtime.

Amazon SageMaker Studio Lab - CPU

Once the Status shows Running, click Open project

Amazon SageMaker Studio Lab - GPU Running

5. Launch a New Terminal within Studio Lab

Amazon SageMaker Studio Lab - New Terminal

6. Clone this GitHub Repo in the Terminal

Within the Terminal, run the following:

cd ~ && git clone https://github.com/data-science-on-aws/oreilly_book

Amazon SageMaker Studio Lab - Clone Repo

7. Create data_science_on_aws Conda kernel

Within the Terminal, run the following:

cd ~/oreilly_book/ && conda env create -f environment.yml || conda env update -f environment.yml && conda activate data_science_on_aws

Amazon SageMaker Studio Lab - Create Kernel

If you see an error like the following, just ignore it. This will appear if you already have an existing Conda environment with this name. In this case, we will update the environment.

CondaValueError: prefix already exists: /home/studio-lab-user/.conda/envs/data_science_on_aws

8. Start the Workshop!

Navigate to oreilly_book/00_quickstart/ in SageMaker Studio Lab and start the workshop!

You may need to refresh your browser if you don't see the new oreilly_book/ directory.

Amazon SageMaker Studio Lab - Start Workshop

When you open the notebooks, make sure to select the data_science_on_aws kernel.

Amazon SageMaker Studio Lab - Select Kernel

Owner
Data Science on AWS
Data Science on AWS
Data Science on AWS
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
Both social media sentiment and stock market data are crucial for stock price prediction

Relating-Social-Media-to-Stock-Movement-Public - We explore the application of Machine Learning for predicting the return of the stock by using the information of stock returns. A trading strategy ba

Vishal Singh Parmar 15 Oct 29, 2022
To design and implement the Identification of Iris Flower species using machine learning using Python and the tool Scikit-Learn.

To design and implement the Identification of Iris Flower species using machine learning using Python and the tool Scikit-Learn.

Astitva Veer Garg 1 Jan 11, 2022
XManager: A framework for managing machine learning experiments 🧑‍🔬

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction

DeepMind 620 Dec 27, 2022
Customers Segmentation with RFM Scores and K-means

Customer Segmentation with RFM Scores and K-means RFM Segmentation table: K-Means Clustering: Business Problem Rule-based customer segmentation machin

5 Aug 10, 2022
Machine Learning e Data Science com Python

Machine Learning e Data Science com Python Arquivos do curso de Data Science e Machine Learning com Python na Udemy, cliqe aqui para acessá-lo. O prin

Renan Barbosa 1 Jan 27, 2022
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 6k Jan 06, 2023
A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

17 Aug 14, 2022
💀mummify: a version control tool for machine learning

mummify is a version control tool for machine learning. It's simple, fast, and designed for model prototyping.

Max Humber 43 Jul 09, 2022
My project contrasts K-Nearest Neighbors and Random Forrest Regressors on Real World data

kNN-vs-RFR My project contrasts K-Nearest Neighbors and Random Forrest Regressors on Real World data In many areas, rental bikes have been launched to

1 Oct 28, 2021
OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 208 Dec 27, 2022
Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022
Laporan Proyek Machine Learning - Azhar Rizki Zulma

Laporan Proyek Machine Learning - Azhar Rizki Zulma Project Overview Domain proyek yang dipilih dalam proyek machine learning ini adalah mengenai hibu

Azhar Rizki Zulma 6 Mar 12, 2022
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray

A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo

2.5k Dec 28, 2022
A naive Bayes model for cancer classification using a set of documents

Naivebayes text classifcation model for cancer and noncancer documents Author: Alex King Purpose Requirements/files included How to use 1. Purpose The

Alex W King 1 Nov 24, 2021
Katana project is a template for ASAP 🚀 ML application deployment

Katana project is a FastAPI template for ASAP 🚀 ML API deployment

Mohammad Shahebaz 100 Dec 26, 2022
Stacked Generalization (Ensemble Learning)

Stacking (stacked generalization) Overview ikki407/stacking - Simple and useful stacking library, written in Python. User can use models of scikit-lea

Ikki Tanaka 192 Dec 23, 2022
A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

Nicholas Monath 31 Nov 03, 2022
MiniTorch - a diy teaching library for machine learning engineers

This repo is the full student code for minitorch. It is designed as a single repo that can be completed part by part following the guide book. It uses

1.1k Jan 07, 2023