Code Repository for The Kaggle Book, Published by Packt Publishing

Overview

The Kaggle Book

Data analysis and machine learning for competitive data science

Code Repository for The Kaggle Book, Published by Packt Publishing

"Luca and Konradˈs book helps make Kaggle even more accessible. They are both top-ranked users and well-respected members of the Kaggle community. Those who complete this book should expect to be able to engage confidently on Kaggle – and engaging confidently on Kaggle has many rewards." — Anthony Goldbloom, Kaggle Founder & CEO

Key Features

  • Learn how Kaggle works and how to make the most of competitions from two expert Kaggle Grandmasters
  • Sharpen your modeling skills with ensembling, feature engineering, adversarial validation, AutoML, transfer learning, and techniques for parameter tuning
  • Challenge yourself with problems regarding tabular data, vision, natural language as well as simulation and optimization
  • Discover tips, tricks, and best practices for getting great results on Kaggle and becoming a better data scientist
  • Read interviews with 31 Kaggle Masters and Grandmasters telling about their experience and tips

Get a step ahead of your competitors with a concise collection of smart data handling and modeling techniques

Getting started

You can run these notebooks on cloud platforms like Kaggle Colab or your local machine. Note that most chapters require a GPU even TPU sometimes to run in a reasonable amount of time, so we recommend one of the cloud platforms as they come pre-installed with CUDA.

Running on a cloud platform

To run these notebooks on a cloud platform, just click on one of the badges (Colab or Kaggle) in the table below. The code will be reproduced from Github directly onto the choosen platform (you may have to add the necessary data before running it). Alternatively, we also provide links to the fully working original notebook on Kaggle that you can copy and immediately run.

no Chapter Notebook Colab Kaggle
05 Competition Tasks and Metrics meta_kaggle Open In Colab Kaggle
06 Designing Good Validation adversarial-validation-example Open In Colab Kaggle
07 Modeling for Tabular Competitions interesting-eda-tsne-umap Open In Colab Kaggle
meta-features-and-target-encoding Open In Colab Kaggle
really-not-missing-at-random Open In Colab Kaggle
tutorial-feature-selection-with-boruta-shap Open In Colab Kaggle
08 Hyperparameter Optimization basic-optimization-practices Open In Colab Kaggle
hacking-bayesian-optimization-for-dnns Open In Colab Kaggle
hacking-bayesian-optimization Open In Colab Kaggle
kerastuner-for-imdb Open In Colab Kaggle
optuna-bayesian-optimization Open In Colab Kaggle
scikit-optimize-for-lightgbm Open In Colab Kaggle
tutorial-bayesian-optimization-with-lightgbm Open In Colab Kaggle
09 Ensembling with Blending and Stacking Solutions ensembling Open In Colab Kaggle
10 Modeling for Computer Vision augmentations-examples Open In Colab Kaggle
images-classification Open In Colab Kaggle
prepare-annotations Open In Colab Kaggle
segmentation-inference Open In Colab Kaggle
segmentation Open In Colab Kaggle
object-detection-yolov5 Open In Colab Kaggle
11 Modeling for NLP nlp-augmentations4 Open In Colab Kaggle
nlp-augmentation1 Open In Colab Kaggle
qanswering Open In Colab Kaggle
sentiment-extraction Open In Colab Kaggle
12 Simulation and Optimization Competitions connectx Open In Colab Kaggle
mab-santa Open In Colab Kaggle
rps-notebook1 Open In Colab Kaggle

Book Description

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with the rest of the community, and gain valuable experience to help grow your career.

The first book of its kind, Data Analysis and Machine Learning with Kaggle assembles the techniques and skills you’ll need for success in competitions, data science projects, and beyond. Two masters of Kaggle walk you through modeling strategies you won’t easily find elsewhere, and the tacit knowledge they’ve accumulated along the way. As well as Kaggle-specific tips, you’ll learn more general techniques for approaching tasks based on image data, tabular data, textual data, and reinforcement learning. You’ll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

What you will learn

  • Get acquainted with Kaggle and other competition platforms
  • Make the most of Kaggle Notebooks, Datasets, and Discussion forums
  • Understand different modeling tasks including binary and multi-class classification, object detection, NLP (Natural Language Processing), and time series
  • Design good validation schemes, learning about k-fold, probabilistic, and adversarial validation
  • Get to grips with evaluation metrics including MSE and its variants, precision and recall, IoU, mean average precision at k, as well as never-before-seen metrics
  • Handle simulation and optimization competitions on Kaggle
  • Create a portfolio of projects and ideas to get further in your career

Who This Book Is For

This book is suitable for Kaggle users and data analysts/scientists with at least a basic proficiency in data science topics and Python who are trying to do better in Kaggle competitions and secure jobs with tech giants. At the time of completion of this book, there are 96,190 Kaggle novices (users who have just registered on the website) and 67,666 Kaggle contributors (users who have just filled in their profile) enlisted in Kaggle competitions. This book has been written with all of them in mind and with anyone else wanting to break the ice and start taking part in competitions on Kaggle and learning from them.

Table of Contents

Part 1

  1. Introducing Kaggle and Other Data Science Competitions
  2. Organizing Data with Datasets
  3. Working and Learning with Kaggle Notebooks
  4. Leveraging Discussion Forums

Part 2

  1. Competition Tasks and Metrics
  2. Designing Good Validation
  3. Modeling for Tabular Competitions
  4. Hyperparameter Optimization
  5. Ensembling with Blending and Stacking Solutions
  6. Modeling for Computer Vision
  7. Modeling for NLP
  8. Simulation and Optimization Competitions

Part 3

  1. Creating Your Portfolio of Projects and Ideas
  2. Finding New Professional Opportunities
Owner
Packt
Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.
Packt
Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

Segformer - Pytorch Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch. Install $ pip install segformer-pytorch

Phil Wang 208 Dec 25, 2022
Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION This is the official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSU

电线杆 14 Dec 15, 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.

ENet This work has been published in arXiv: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. Packages: train contains too

e-Lab 344 Nov 21, 2022
Learning cell communication from spatial graphs of cells

ncem Features Repository for the manuscript Fischer, D. S., Schaar, A. C. and Theis, F. Learning cell communication from spatial graphs of cells. 2021

Theis Lab 77 Dec 30, 2022
MakeItTalk: Speaker-Aware Talking-Head Animation

MakeItTalk: Speaker-Aware Talking-Head Animation This is the code repository implementing the paper: MakeItTalk: Speaker-Aware Talking-Head Animation

Adobe Research 285 Jan 08, 2023
Yolo object detection - Yolo object detection with python

How to run download required files make build_image make download Docker versio

3 Jan 26, 2022
以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的斗地主ai

ddz-ai 介绍 斗地主是一种扑克游戏。游戏最少由3个玩家进行,用一副54张牌(连鬼牌),其中一方为地主,其余两家为另一方,双方对战,先出完牌的一方获胜。 ddz-ai以孤立语假设和宽度优先搜索为基础,构建了一种多通道堆叠注意力Transformer结构的系统,使其经过大量训练后,能在实际游戏中获

freefuiiismyname 88 May 15, 2022
A Survey on Deep Learning Technique for Video Segmentation

A Survey on Deep Learning Technique for Video Segmentation A Survey on Deep Learning Technique for Video Segmentation Wenguan Wang, Tianfei Zhou, Fati

Tianfei Zhou 112 Dec 12, 2022
Vector Quantized Diffusion Model for Text-to-Image Synthesis

Vector Quantized Diffusion Model for Text-to-Image Synthesis Due to company policy, I have to set microsoft/VQ-Diffusion to private for now, so I prov

Shuyang Gu 294 Jan 05, 2023
Labels4Free: Unsupervised Segmentation using StyleGAN

Labels4Free: Unsupervised Segmentation using StyleGAN ICCV 2021 Figure: Some segmentation masks predicted by Labels4Free Framework on real and synthet

70 Dec 23, 2022
Code and data for ImageCoDe, a contextual vison-and-language benchmark

ImageCoDe This repository contains code and data for ImageCoDe: Image Retrieval from Contextual Descriptions. Data All collected descriptions for the

McGill NLP 27 Dec 02, 2022
A parallel framework for population-based multi-agent reinforcement learning.

MALib: A parallel framework for population-based multi-agent reinforcement learning MALib is a parallel framework of population-based learning nested

MARL @ SJTU 348 Jan 08, 2023
The official implementation code of "PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction."

PlantStereo This is the official implementation code for the paper "PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction".

Wang Qingyu 14 Nov 28, 2022
TensorFlow implementation of "Attention is all you need (Transformer)"

[TensorFlow 2] Attention is all you need (Transformer) TensorFlow implementation of "Attention is all you need (Transformer)" Dataset The MNIST datase

YeongHyeon Park 4 Jan 05, 2022
Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination Pratul P. Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron,

Pratul Srinivasan 65 Dec 14, 2022
[CVPR'21] Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration

Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration This repository contains the implementation of our paper Locally Aware Pi

sfwang 70 Dec 19, 2022
Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Finetune the base 64 px GLIDE-text2im model from OpenAI on your own image-text dataset

Clay Mullis 82 Oct 13, 2022
This repository holds code and data for our PETS'22 article 'From "Onion Not Found" to Guard Discovery'.

From "Onion Not Found" to Guard Discovery (PETS'22) This repository holds the code and data for our PETS'22 paper titled 'From "Onion Not Found" to Gu

Lennart Oldenburg 3 May 04, 2022
Jax/Flax implementation of Variational-DiffWave.

jax-variational-diffwave Jax/Flax implementation of Variational-DiffWave. (Zhifeng Kong et al., 2020, Diederik P. Kingma et al., 2021.) DiffWave with

YoungJoong Kim 37 Dec 16, 2022
Neural network for recognizing the gender of people in photos

Neural Network For Gender Recognition How to test it? Install requirements.txt file using pip install -r requirements.txt command Run nn.py using pyth

Valery Chapman 1 Sep 18, 2022