The code from the Machine Learning Bookcamp book and a free course based on the book

Overview

Machine Learning Bookcamp

The code from the Machine Learning Bookcamp book

Useful links:

Machine Learning Zoomcamp

Machine Learning Zoomcamp is a course based on the book

  • It's online and free
  • You can join at any moment
  • More information in the course-zoomcamp folder

Reading Plan

Chapters

Chapter 1: Introduction to Machine Learning

  • Understanding machine learning and the problems it can solve
  • CRISP-DM: Organizing a successful machine learning project
  • Training and selecting machine learning models
  • Performing model validation

No code

Chapter 2: Machine Learning for Regression

  • Creating a car-price prediction project with a linear regression model
  • Doing an initial exploratory data analysis with Jupyter notebooks
  • Setting up a validation framework
  • Implementing the linear regression model from scratch
  • Performing simple feature engineering for the model
  • Keeping the model under control with regularization
  • Using the model to predict car prices

Code: chapter-02-car-price/02-carprice.ipynb

Chapter 3: Machine Learning for Classification

  • Predicting customers who will churn with logistic regression
  • Doing exploratory data analysis for identifying important features
  • Encoding categorical variables to use them in machine learning models
  • Using logistic regression for classification

Code: chapter-03-churn-prediction/03-churn.ipynb

Chapter 4: Evaluation Metrics for Classification

  • Accuracy as a way of evaluating binary classification models and its limitations
  • Determining where our model makes mistakes using a confusion table
  • Deriving other metrics like precision and recall from the confusion table
  • Using ROC and AUC to further understand the performance of a binary classification model
  • Cross-validating a model to make sure it behaves optimally
  • Tuning the parameters of a model to achieve the best predictive performance

Code: chapter-03-churn-prediction/04-metrics.ipynb

Chapter 5: Deploying Machine Learning Models

  • Saving models with Pickle
  • Serving models with Flask
  • Managing dependencies with Pipenv
  • Making the service self-contained with Docker
  • Deploying it to the cloud using AWS Elastic Beanstalk

Code: chapter-05-deployment

Chapter 6: Decision Trees and Ensemble Learning

  • Predicting the risk of default with tree-based models
  • Decision trees and the decision tree learning algorithm
  • Random forest: putting multiple trees together into one model
  • Gradient boosting as an alternative way of combining decision trees

Code: chapter-06-trees/06-trees.ipynb

Chapter 7: Neural Networks and Deep Learning

  • Convolutional neural networks for image classification
  • TensorFlow and Keras — frameworks for building neural networks
  • Using pre-trained neural networks
  • Internals of a convolutional neural network
  • Training a model with transfer learning
  • Data augmentations — the process of generating more training data

Code: chapter-07-neural-nets/07-neural-nets-train.ipynb

Chapter 8: Serverless Deep Learning

  • Serving models with TensorFlow-Lite — a light-weight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

Code: chapter-08-serverless

Chapter 9: Kubernetes and Kubeflow

Kubernetes:

  • Understanding different methods of deploying and serving models in the cloud.
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes

Code: chapter-09-kubernetes

Kubeflow:

  • Using Kubeflow and KFServing for simplifying the deployment process

Code: chapter-09-kubeflow

Articles from mlbookcamp.com:

Appendices

Appendix A: Setting up the Environment

  • Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
  • Running a Jupyter Notebook service from a remote machine
  • Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
  • Creating an EC2 machine on AWS using the web interface and the command-line interface

Code: no code

Articles from mlbookcamp.com:

Appendix B: Introduction to Python

  • Basic python syntax: variables and control-flow structures
  • Collections: lists, tuples, sets, and dictionaries
  • List comprehensions: a concise way of operating on collections
  • Reusability: functions, classes and importing code
  • Package management: using pip for installing libraries
  • Running python scripts

Code: appendix-b-python.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to NumPy and Linear Algebra

  • One-dimensional and two-dimensional NumPy arrays
  • Generating NumPy arrays randomly
  • Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
  • Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
  • Finding the inverse of a matrix and solving the normal equation

Code: appendix-c-numpy.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to Pandas

  • The main data structures in Pandas: DataFrame and Series
  • Accessing rows and columns of a DataFrame
  • Element-wise and summarizing operations
  • Working with missing values
  • Sorting and grouping

Code: appendix-d-pandas.ipynb

Appendix D: AWS SageMaker

  • Increasing the GPU quota limits
  • Renting a Jupyter notebook with GPU in AWS SageMaker
You might also like...
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Turns your machine learning code into microservices with web API, interactive GUI, and more.
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

TorchDrug is a PyTorch-based machine learning toolbox designed for drug discovery

A powerful and flexible machine learning platform for drug discovery

Machine learning template for projects based on sklearn library.

Machine learning template for projects based on sklearn library.

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Comments
  • Adding setup with docker

    Adding setup with docker

    Hi @alexeygrigorev ,

    I created a small guide for anyone who feels comfortable using Docker or might want to try it for setting up the environment.

    Since I saw a couple of questions today related to environment setup, I thought of sharing what I usually use when working on projects or courses, then it can be re-usable.

    Hoping is helpful :)

    Changelog:

    • Updated readme with link to guide to create docker container
    • Added new guide to build docker container and run it
    • Added Dockerfile and environment.yml
    opened by laurauzcategui 5
  • While converting keras to tflite error

    While converting keras to tflite error

    While converting keras to tflite error :

    raise ValueError('Unrecognized keyword arguments:', kwargs.keys()) ValueError: ('Unrecognized keyword arguments:', dict_keys(['ragged']))

    Traceback (most recent call last): File "convert.py", line 5, in <module> model = keras.models.load_model('xception_v4_large_08_0.894.h5')

    opened by saisubramani 5
  • notes correction in 06 Decision Trees...

    notes correction in 06 Decision Trees...

    Inside 02-data-prep.md , in the train/val/test split bullet note at the moment is : "Split the data with the distribution of 80% train, 20% validation, and 20% test sets with random seed to 11"

    should be:

    Split the data with the distribution of 60% train, 20% validation, and 20% test sets with random seed to 11

    opened by lucapug 4
  • Update homework.md

    Update homework.md

    Updated Question 4 text from "when one grows" to "when one grows up" and the F1 formula from "F1 = 2 * P * R / (P + R)" to "$$F1 = {2.}\frac{P . R}{P+R}$$"

    opened by ukokobili 3
Releases(chapter7-model)
Owner
Alexey Grigorev
Alexey Grigorev
CD) in machine learning projectsImplementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Iterative 19 Oct 03, 2022
A Time Series Library for Apache Spark

Flint: A Time Series Library for Apache Spark The ability to analyze time series data at scale is critical for the success of finance and IoT applicat

Two Sigma 970 Jan 04, 2023
List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat

Favio André Vázquez 11.7k Dec 30, 2022
Flightfare-Prediction - It is a Flightfare Prediction Web Application Using Machine learning,Python and flask

Flight_fare-Prediction It is a Flight_fare Prediction Web Application Using Machine learning,Python and flask Using Machine leaning i have created a F

1 Dec 06, 2022
A simple python program which predicts the success of a movie based on it's type, actor, actress and director

Movie-Success-Prediction A simple python program which predicts the success of a movie based on it's type, actor, actress and director. The program us

Mahalinga Prasad R N 1 Dec 17, 2021
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
Machine Learning from Scratch

Machine Learning from Scratch Author: Shengxuan Wang From: Oregon State University Content: Building Machine Learning model from Scratch, without usin

ShawnWang 0 Jul 05, 2022
Turning images into '9-pan' palettes using KMeans clustering from sklearn.

img2palette Turning images into '9-pan' palettes using KMeans clustering from sklearn. Requirements We require: Pillow, for opening and processing ima

Samuel Vidovich 2 Jan 01, 2022
fMRIprep Pipeline To Machine Learning

fMRIprep Pipeline To Machine Learning(Demo) 所有配置均在config.py文件下定义 前置环境(lilab) 各个节点均安装docker,并有fmripre的镜像 可以使用conda中的base环境(相应的第三份包之后更新) 1. fmriprep scr

Alien 3 Mar 08, 2022
MLflow App Using React, Hooks, RabbitMQ, FastAPI Server, Celery, Microservices

Katana ML Skipper This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable

Tom Xu 8 Nov 17, 2022
SPCL 48 Dec 12, 2022
Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogramas anuais com spark, em pyspark e SQL!

Olá! Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogr

Henrique de Paula 10 Apr 04, 2022
Dive into Machine Learning

Dive into Machine Learning Hi there! You might find this guide helpful if: You know Python or you're learning it 🐍 You're new to Machine Learning You

Michael Floering 11.1k Jan 03, 2023
Confidence intervals for scikit-learn forest algorithms

forest-confidence-interval: Confidence intervals for Forest algorithms Forest algorithms are powerful ensemble methods for classification and regressi

272 Dec 01, 2022
An implementation of Relaxed Linear Adversarial Concept Erasure (RLACE)

Background This repository contains an implementation of Relaxed Linear Adversarial Concept Erasure (RLACE). Given a dataset X of dense representation

Shauli Ravfogel 4 Apr 13, 2022
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
TorchDrug is a PyTorch-based machine learning toolbox designed for drug discovery

A powerful and flexible machine learning platform for drug discovery

MilaGraph 1.1k Jan 08, 2023
Python package for concise, transparent, and accurate predictive modeling

Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. 📚 docs • 📖 demo notebooks Modern

Chandan Singh 983 Jan 01, 2023
Distributed deep learning on Hadoop and Spark clusters.

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version

Yahoo 1.3k Dec 28, 2022
cleanlab is the data-centric ML ops package for machine learning with noisy labels.

cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and lear

Cleanlab 51 Nov 28, 2022