Hierarchical Time Series Forecasting using Prophet

htsprophet

Credit to Rob J. Hyndman and his research partners, as much of the code was developed with the help of their work.

https://www.otexts.org/fpp

https://robjhyndman.com/publications/

Credit to Facebook and their fbprophet package.

https://facebookincubator.github.io/prophet/

It was my intention to make some of the code look similar to certain sections in the Prophet and (Hyndman's) hts packages.

Downloading

  1. pip install htsprophet

If you'd like to just skip to coding with the package, runHTS.py should help you with that, but if you like reading, the following should help you understand how I built htsprophet and how it works.

Part I: The Data

I originally used Redfin traffic data to build this package.

I pulled the data so that date was in the first column, my layers were the middle columns, and the number I wanted to forecast was in the last column.

I made a function called makeWeekly() that rolls your data up to the weekly level. It’s not a necessary function; it was mostly just convenient for me.
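
As a rough illustration of what a weekly roll-up involves, daily rows can be grouped by the week they fall in and their sessions summed. This is not the package's makeWeekly() code: the helper name `roll_up_weekly`, the Monday week start, and the sample rows are all assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import date, timedelta


def roll_up_weekly(rows):
    """Sum the last column of each row per (week start, layer values).

    rows: iterable of tuples (date, layer_1, ..., layer_n, value).
    Weeks are assumed to start on Monday (an assumption of this sketch).
    """
    totals = defaultdict(int)
    for row in rows:
        day, layers, value = row[0], tuple(row[1:-1]), row[-1]
        week_start = day - timedelta(days=day.weekday())  # back up to Monday
        totals[(week_start,) + layers] += value
    # Same layout as the input, sorted by week and then by layer values
    return [key + (total,) for key, total in sorted(totals.items())]


# Made-up daily rows: date, Platform, Medium, BusinessMarket, Sessions
daily = [
    (date(2017, 1, 2), "Stone Tablet", "Land", "Birmingham", 10),  # a Monday
    (date(2017, 1, 4), "Stone Tablet", "Land", "Birmingham", 5),   # same week
    (date(2017, 1, 9), "Stone Tablet", "Land", "Birmingham", 7),   # next Monday
]
weekly = roll_up_weekly(daily)
```

The first two rows collapse into one weekly row and the third starts a new week.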

So the data looked like this:

Date       Platform      Medium  BusinessMarket  Sessions
1100 B.C.  Stone Tablet  Land    Birmingham      23234
...        Car Phone     Air     Auburn          2342
...                      Sea     Evanston        233
...                              Seattle         445
...                                              46362

I then ran my orderHier() function with just this dataframe as its input.

NOTE: you cannot run this function if you have more than 4 columns in the middle (for example, between Date and Sessions).

To run this function, you specify the data, and how you want your middle columns to be ordered.

So orderHier(data, 2, 1, 3) means you want the second column after Date to be the first level of the hierarchy, the first column to be the second level, and the third column to be the third level.

Our example would look like this:

Date       Total  Land   Air  Sea  Land_Stone tablet  Land_Car Phone  Air_Stone tablet
1100 B.C.  24578  23135  555  888  23000              135             550
1099 B.C.  86753  86654  44   55   2342               84312           22
...        ...    ...    ...  ...  ...                ...             ...

*All numbers represent the number of sessions for each node in the hierarchy.
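
To make the reshaping concrete, here is a minimal sketch of the roll-up for the first two levels, written in plain Python rather than with the package. The function name `build_hierarchy` and the sample rows are made up for illustration, and the Parent_Child column naming simply mimics the table above; the package's own orderHier() may work differently.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """Aggregate (date, platform, medium, market, sessions) rows.

    Returns {date: {column_name: sessions}} with a Total column, one column
    per medium (level 1) and one per medium_platform pair (level 2).
    """
    wide = defaultdict(lambda: defaultdict(int))
    for day, platform, medium, _market, sessions in rows:
        wide[day]["Total"] += sessions
        wide[day][medium] += sessions
        wide[day][f"{medium}_{platform}"] += sessions
    return {day: dict(cols) for day, cols in wide.items()}


# Made-up rows in the long layout: Date, Platform, Medium, BusinessMarket, Sessions
rows = [
    ("week 1", "Stone Tablet", "Land", "Birmingham", 100),
    ("week 1", "Stone Tablet", "Land", "Auburn", 50),
    ("week 1", "Car Phone", "Land", "Birmingham", 30),
    ("week 1", "Stone Tablet", "Air", "Seattle", 20),
]
wide = build_hierarchy(rows)
```

Each parent column equals the sum of its children, which is the property the forecasting methods rely on. A third level would add one more key per row, built from the medium, platform, and market.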

If you have more than 4 categorical columns, then you must get the data into this format on your own and also produce the list of lists called nodes.

Nodes – describes the structure of the hierarchy.

Here it would equal [[3],[2,2,2],[4,4,4,4,4,4]]

There are 3 nodes in the first level: Land, Air, Sea.

There are 2 children for each of those nodes: Stone tablet, Car phone.

There are 4 business markets for each of those 6 second-level nodes: Tokyo, Hamburg etc.

If you use the orderHier function, nodes will be the second output of the function.
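
If you build nodes by hand, a quick consistency check can catch mistakes before fitting. The sketch below is not part of htsprophet; the helper names `level_sizes` and `check_nodes` are hypothetical, and it assumes nodes is given in the list-of-lists form shown above.

```python
def level_sizes(nodes):
    """Number of series at each level below the total."""
    return [sum(level) for level in nodes]


def check_nodes(nodes):
    """Verify each level lists one child count per node of the level above."""
    parents = 1  # the single Total series at the top
    for depth, level in enumerate(nodes):
        if len(level) != parents:
            raise ValueError(
                f"level {depth} has {len(level)} entries, expected {parents}"
            )
        parents = sum(level)
    return True


nodes = [[3], [2, 2, 2], [4, 4, 4, 4, 4, 4]]
sizes = level_sizes(nodes)   # series per level below the total
n_series = 1 + sum(sizes)    # all series, including Total (Date not counted)
is_valid = check_nodes(nodes)
```

For the example hierarchy this gives 3, 6, and 24 series at the three levels, so the full dataframe would hold 34 series columns after the Date column.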

Part II: Prophet Inputs

Anything that you would specify in Prophet you can specify in hts().

It’s flexible and will allow you to input a dataframe of values for inputs like cap, capF, and changepoints.

All of these inputs are specified when you call hts, and after that you just let it run.

The following is the description of inputs and outputs for hts as well as the specified defaults:

Parameters
----------------
 y - dataframe of time-series data
           Layout:
               0th Col - Time instances
               1st Col - Total of TS
               2nd Col - One of the children of the Total TS
               3rd Col - The other child of the Total TS
               ...
               ... Rest of the 1st layer
               ...
               Xth Col - First Child of the 2nd Col
               ...
               ... All of the 2nd Col's Children
               ...
               X+Yth Col - First Child of the 3rd Col
               ...
               ..
               .   And so on...

 h - number of step ahead forecasts to make (int)

 nodes - a list or list of lists of the number of child nodes at each level
 Ex. if the hierarchy is one total with two child nodes that comprise it, the nodes input would be [2]
 
 method – (String)  the type of hierarchical forecasting method that the user wants to use. 
            Options:
            "OLS" - optimal combination using ordinary least squares (Default)
            "WLSS" - optimal combination using structurally weighted least squares
            "WLSV" - optimal combination using variance weighted least squares
            "FP" - forecasted proportions (top-down)
            "PHA" - proportions of historical averages (top-down)
            "AHP" - average historical proportions (top-down)
            "BU" - bottom-up (simple addition)
            "cvSelect" - select which method is best for you based on 3-fold cross-validation (longer run time)
 
 freq - (Time Frequency) input for the forecasting function of Prophet 
 
 include_history - (Boolean) input for the forecasting function of Prophet
 
 transform - (None or "BoxCox") Do you want to transform your data before fitting the prophet function? If yes, type "BoxCox"
            
 cap - (Dataframe or Constant) carrying capacity of the input time series.  If it is a dataframe, then
                               the number of columns must equal len(y.columns) - 1
                               
 capF - (Dataframe or Constant) carrying capacity of the future time series.  If it is a dataframe, then
                                the number of columns must equal len(y.columns) - 1
 
 changepoints - (DataFrame or List) changepoints for the model to consider fitting. If it is a dataframe, then
                                    the number of columns must equal len(y.columns) - 1
 
 n_changepoints - (constant or list) number of potential changepoints for the model to consider. If it is a list, then
                                     the number of items must equal len(y.columns) - 1
 
 skipFitting - (Boolean) if y is already a dictionary of dataframes, set this to True, and DO NOT run with method = "cvSelect" or transform = "BoxCox"
 
 numThreads - (int) number of threads you want to use when running cvSelect. Note: 14 has been shown to decrease runtime by 10 percent
 
 All other inputs - see Prophet
 
Returns
-----------------
 ynew - a dictionary of DataFrames with predictions, seasonalities and trends that can all be plotted
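
To give a feel for what the "BU" and "OLS" options do, here is a small sketch on the simplest hierarchy: one total with two children. It uses the textbook formulation from Hyndman's work (a summing matrix S, with the OLS reconciliation S (S'S)^-1 S' applied to the base forecasts), not the package's own code. The helper names and the base forecast numbers are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Summing matrix: rows are Total, A, B; columns are the bottom series A, B
S = [[1, 1], [1, 0], [0, 1]]
St = [list(col) for col in zip(*S)]

# Independent base forecasts for Total, A, B. They are incoherent: 40 + 50 != 100
base = [[100.0], [40.0], [50.0]]

# Bottom-up: keep the bottom-level forecasts and rebuild the total by summing
bottom_up = matmul(S, base[1:])

# OLS: least-squares fit of bottom-level values to all base forecasts, then sum up
beta = solve(matmul(St, S), matmul(St, base))
ols = matmul(S, beta)
```

Bottom-up ignores the total's own forecast entirely, while OLS spreads the 10-unit disagreement across all three series. In both cases the result adds up.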

Don’t forget to specify the frequency if you’re not using daily data.

All other functions should be self-explanatory.
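
One distinction that is easy to miss among the top-down options listed under method is "PHA" versus "AHP". The sketch below uses the standard definitions from Hyndman's forecasting text (AHP averages the per-period shares; PHA divides the child's average by the total's average). The function names and numbers are invented for illustration and are not taken from the package.

```python
def average_historical_proportions(child, total):
    """AHP: mean over time of the child's share of the total."""
    return sum(c / t for c, t in zip(child, total)) / len(total)


def proportions_of_historical_averages(child, total):
    """PHA: the child's historical average divided by the total's average."""
    return sum(child) / sum(total)


# Made-up history for a total and its two children over two periods
total = [100.0, 200.0]
children = {"A": [50.0, 40.0], "B": [50.0, 160.0]}

ahp = {k: average_historical_proportions(v, total) for k, v in children.items()}
pha = {k: proportions_of_historical_averages(v, total) for k, v in children.items()}

# Top-down: forecast only the total, then split it by the chosen proportions
total_forecast = 300.0
ahp_forecast = {k: p * total_forecast for k, p in ahp.items()}
pha_forecast = {k: p * total_forecast for k, p in pha.items()}
```

Here child A gets a share of about 0.35 under AHP but 0.30 under PHA, because PHA gives more weight to periods where the total is large. Both sets of proportions sum to 1, so either split adds back up to the total forecast.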

Part III: Room For Improvement

  1. Prediction intervals

Owner: Collin Rooney