A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

Overview

Feature Engineering & Feature Selection

A comprehensive guide, available as PDF and Markdown, for Feature Engineering and Feature Selection, with implementations and examples in Python.

Motivation

Feature Engineering & Selection is the most essential part of building a usable machine learning project, even though hundreds of cutting-edge machine learning techniques, such as deep learning and transfer learning, keep arriving these days. Indeed, as Prof. Domingos, the author of 'The Master Algorithm', says:

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

— Prof. Pedro Domingos


Data and features have the most impact on an ML project and set the limit of how well we can do, while models and algorithms merely approach that limit. However, few materials systematically introduce the art of feature engineering, and even fewer explain the rationale behind it. This repo contains my personal notes from learning ML and serves as a reference for Feature Engineering & Selection.

Download

Download the PDF here:

Same, but in markdown:

The PDF has a much more readable format, while the Markdown has auto-generated anchor links for navigating from outside sources. GitHub does a poor job of displaying Markdown with complex syntax, so I suggest reading the PDF, or downloading the repo and reading the Markdown with Typora.

What You'll Learn

Not only a collection of hands-on functions, but also an explanation of why, how and when to adopt which feature engineering techniques in data mining.

  • the nature and risks of the data problems we often encounter
  • explanations of the various feature engineering & selection techniques
  • the rationale for using them
  • pros & cons of each method
  • code & examples

Getting Started

This repo is mainly meant as a reference for anyone who is doing feature engineering, and most of the modules are implemented with scikit-learn or packages from its community.

To run the demos or use the customized functions, please download the ZIP file from the repo or just copy-paste any part of the code you find helpful. The code should all be very easy to understand.

Required Dependencies:

  • Python 3.5, 3.6 or 3.7
  • numpy>=1.15
  • pandas>=0.23
  • scipy>=1.1.0
  • scikit_learn>=0.20.1
  • seaborn>=0.9.0
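
If you are unsure whether your environment meets the minimums listed above, a small check like the following may help. This is only a sketch and not part of the repo: the helper names `version_tuple` and `check_dependencies` are made up for illustration, and it assumes each package exposes a `__version__` attribute (scikit-learn is imported under the module name `sklearn`).

```python
import importlib
import re

# Minimum versions copied from the dependency list above.
# Keys are import names, so scikit-learn appears as "sklearn".
MINIMUM_VERSIONS = {
    "numpy": "1.15",
    "pandas": "0.23",
    "scipy": "1.1.0",
    "sklearn": "0.20.1",
    "seaborn": "0.9.0",
}


def version_tuple(version):
    """Turn '1.15' into (1, 15, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad so that '1.15' compares like '1.15.0'
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {module_name: 'ok' | 'too old' | 'missing'} (hypothetical helper)."""
    report = {}
    for module_name, minimum in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = version_tuple(getattr(module, "__version__", "0"))
        is_ok = installed >= version_tuple(minimum)
        report[module_name] = "ok" if is_ok else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(name, status)
```

Running the file as a script prints one line per package with its status.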

Table of Contents and Code Examples

Below is a list of methods currently implemented in the repo.

1. Data Exploration

2. Feature Cleaning

3. Feature Engineering

4. Feature Selection
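
As a rough illustration of how the four stages above fit together, here is a minimal sketch on made-up toy data. It uses standard scikit-learn classes rather than this repo's own functions. The column names `signal` and `noise`, and the choice of median imputation, standardization and a univariate F-test, are assumptions made purely for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 'signal' drives the label, 'noise' does not.
rng = np.random.RandomState(0)
signal = rng.normal(size=200)
y = (signal > 0).astype(int)

observed_signal = signal.copy()
observed_signal[::10] = np.nan  # knock out every 10th value
df = pd.DataFrame({"signal": observed_signal, "noise": rng.normal(size=200)})

# 1. Data exploration: share of missing values per column.
missing_ratio = df.isnull().mean()

# 2-4. Cleaning, engineering and selection chained in one pipeline.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # feature cleaning
    ("scale", StandardScaler()),                         # feature engineering
    ("select", SelectKBest(score_func=f_classif, k=1)),  # feature selection
])
X_selected = pipeline.fit_transform(df, y)

# Names of the columns that survived the selection step.
mask = pipeline.named_steps["select"].get_support()
selected = df.columns[mask].tolist()

print(missing_ratio)
print(selected)
```

Wrapping the steps in a `Pipeline` keeps the imputer, scaler and selector fitted on the same data, which makes it harder to leak information from a test set when the pipeline is later used with cross-validation.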

Key Links and Resources

  • Udemy's Feature Engineering online course

https://www.udemy.com/feature-engineering-for-machine-learning/

  • Udemy's Feature Selection online course

https://www.udemy.com/feature-selection-for-machine-learning

  • JMLR Special Issue on Variable and Feature Selection

http://jmlr.org/papers/special/feature03.html

  • Data Analysis Using Regression and Multilevel/Hierarchical Models, Chapter 25: Missing data

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

  • Data mining and the impact of missing data

http://core.ecu.edu/omgt/krosj/IMDSDataMining2003.pdf

  • PyOD: A Python Toolkit for Scalable Outlier Detection

https://github.com/yzhao062/pyod

  • Weight of Evidence (WoE) Introductory Overview

http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview

  • About Feature Scaling and Normalization

http://sebastianraschka.com/Articles/2014_about_feature_scaling.html

  • Feature Generation with RF, GBDT and Xgboost

https://blog.csdn.net/anshuai_aw1/article/details/82983997

  • A review of feature selection methods with applications

https://ieeexplore.ieee.org/iel7/7153596/7160221/07160458.pdf

Owner
Yimeng.Zhang