Nixtla is an open-source time series forecasting library.

Nixtla

We help data scientists and developers access open-source, state-of-the-art forecasting pipelines. To that end, we built a complete pipeline that can be deployed in the cloud on AWS and consumed through APIs or as a hosted service. If you want to set up your own infrastructure, follow the instructions in the repository (Azure coming soon).

You can use our fully hosted version as a service through our Python SDK (autotimeseries). To consume the APIs on our infrastructure, request tokens by sending us an email or by opening a GitHub issue. We currently have free resources available for anyone interested.

We built a fully open-source time-series pipeline capable of achieving top 1% performance in the M5 competition. Our open-source solution is 25% more accurate than Amazon Forecast and 20% more accurate than fbprophet. It also runs 4x faster than Amazon Forecast and is less expensive.

To reproduce the results, open the Colab notebook or read the accompanying Medium post.

At Nixtla we strongly believe in open source, so we have released all the code needed to set up your own time-series processing service in the cloud (on AWS; Azure support is a work in progress). This repository uses continuous integration and deployment to deploy the APIs on our infrastructure.

Python SDK Basic Usage

Install

PyPI

pip install autotimeseries

How to use

Check the following examples for a full pipeline:

Basic usage

import os

from autotimeseries.core import AutoTS

autotimeseries = AutoTS(bucket_name=os.environ['BUCKET_NAME'],
                        api_id=os.environ['API_ID'],
                        api_key=os.environ['API_KEY'],
                        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'])

Upload dataset to S3

train_dir = '../data/m5/parquet/train'
# File with target variables
filename_target = autotimeseries.upload_to_s3(f'{train_dir}/target.parquet')
# File with static variables
filename_static = autotimeseries.upload_to_s3(f'{train_dir}/static.parquet')
# File with temporal variables
filename_temporal = autotimeseries.upload_to_s3(f'{train_dir}/temporal.parquet')

Each time series in the uploaded datasets is identified by the column item_id. The time column is timestamp and the target column is demand. We need to pass these arguments to each call.

columns = dict(unique_id_column='item_id',
               ds_column='timestamp',
               y_column='demand')

Send the job to make forecasts

response_forecast = autotimeseries.tsforecast(filename_target=filename_target,
                                              freq='D',
                                              horizon=28,
                                              filename_static=filename_static,
                                              filename_temporal=filename_temporal,
                                              objective='tweedie',
                                              metric='rmse',
                                              n_estimators=170,
                                              **columns)

Download forecasts

autotimeseries.download_from_s3(filename='forecasts_2021-10-12_19-04-32.csv', filename_output='../data/forecasts.csv')

Forecasting Pipeline as a Service

Our forecasting pipeline is modular and built upon simple APIs:

tspreprocess

Time series usually contain missing values. This is the case for sales data, where only the events that happened are recorded. In these cases it is convenient to balance the panel, i.e., to add rows for the missing periods, so that the value of future sales can be determined correctly.

The tspreprocess API allows you to do this quickly and easily. In addition, it allows one-hot encoding of static variables (specific to each time series, such as the product family in case of sales) automatically.
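
To make the two ideas concrete, here is a minimal plain-Python sketch of balancing a panel and one-hot encoding a static variable. It is not the tspreprocess implementation: the function names, the daily frequency, the choice to fill missing days with zero, and the choice to balance each series between its own first and last date are all assumptions made for illustration.

```python
from datetime import date, timedelta


def balance_panel(rows, fill_value=0.0):
    """Return one row per (item_id, day) between each series' first and last date.

    `rows` is an iterable of (item_id, day, demand) tuples. Days with no
    recorded event get `fill_value` (assumed here to mean "no sales").
    """
    by_item = {}
    for item_id, day, demand in rows:
        by_item.setdefault(item_id, {})[day] = demand

    balanced = []
    for item_id, observed in sorted(by_item.items()):
        day, last = min(observed), max(observed)
        while day <= last:
            balanced.append((item_id, day, observed.get(day, fill_value)))
            day += timedelta(days=1)
    return balanced


def one_hot(static, column):
    """One-hot encode one static column; `static` maps item_id -> {column: value}."""
    categories = sorted({attrs[column] for attrs in static.values()})
    return {
        item_id: {f"{column}_{cat}": int(attrs[column] == cat) for cat in categories}
        for item_id, attrs in static.items()
    }


# Hypothetical sales records: item A has no rows for January 2 and 3.
sales = [
    ("A", date(2021, 1, 1), 3.0),
    ("A", date(2021, 1, 4), 1.0),
    ("B", date(2021, 1, 2), 5.0),
]
balanced = balance_panel(sales)
encoded = one_hot({"A": {"family": "food"}, "B": {"family": "toys"}}, "family")
```

After balancing, item A has four consecutive daily rows, two of them filled with zero, and each item carries one indicator column per product family.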

tsfeatures

It is usually good practice to create features of the target variable so that they can be consumed by machine learning models. This API allows users to create features at the time series level (or static features) and also at the temporal level.
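
The difference between the two levels can be sketched in plain Python. The feature names below (lags, a rolling mean, a mean, and a share of zeros) are illustrative assumptions and are not the feature set returned by the tsfeatures API.

```python
def temporal_features(values, lags=(1, 7), window=7):
    """Build one feature dict per time step, using only past values."""
    features = []
    for t in range(len(values)):
        row = {}
        for lag in lags:
            # A lag is undefined until enough history exists.
            row[f"lag_{lag}"] = values[t - lag] if t >= lag else None
        past = values[max(0, t - window):t]
        row[f"rolling_mean_{window}"] = sum(past) / len(past) if past else None
        features.append(row)
    return features


def static_features(values):
    """Build a single feature dict that describes the whole series."""
    return {
        "mean": sum(values) / len(values),
        "zero_share": sum(v == 0 for v in values) / len(values),
    }


demand = [2, 0, 3, 1]  # hypothetical daily demand for one item
per_step = temporal_features(demand, lags=(1, 2), window=2)
per_series = static_features(demand)
```

Temporal features produce one row per observation and can feed a model directly, while static features produce one row per series.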

The tsfeatures API is based on the tsfeatures library also developed by the Nixtla team (inspired by the R package tsfeatures) and the tsfresh library.

With this API the user can also generate holiday variables. Just enter the country of the special dates or a file with the specific dates and the API will return dummy variables of those dates for each observation in the dataset.
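
As a rough illustration of what dummy variables for special dates look like, the sketch below marks each observation with a 0/1 flag per holiday. The function name and the input format (a mapping from holiday name to a set of dates) are assumptions, not the API's actual request or response schema.

```python
from datetime import date


def holiday_dummies(timestamps, holidays):
    """Return one dict of 0/1 flags per timestamp.

    `holidays` maps a holiday name to the set of dates on which it falls.
    """
    names = sorted(holidays)
    return [
        {name: int(ts in holidays[name]) for name in names}
        for ts in timestamps
    ]


# Hypothetical file of specific dates, already parsed.
special_dates = {
    "christmas": {date(2021, 12, 25)},
    "new_year": {date(2022, 1, 1)},
}
observations = [date(2021, 12, 24), date(2021, 12, 25), date(2021, 12, 26)]
dummies = holiday_dummies(observations, special_dates)
```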

tsforecast

The tsforecast API is responsible for generating the time series forecasts. It receives as input the target data and can also receive static variables and time variables. At the moment, the API uses the mlforecast library developed by the Nixtla team using LightGBM as a model.

In future iterations, the user will be able to choose different Deep Learning models based on the nixtlats library developed by the Nixtla team.

tsbenchmarks

The tsbenchmarks API is designed to easily compare the performance of models based on time series competition datasets. In particular, the API offers the possibility to evaluate forecasts of any frequency of the M4 competition and also of the M5 competition.
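
For readers unfamiliar with competition scoring, the sketch below computes the symmetric mean absolute percentage error (sMAPE), one of the accuracy measures used in the M4 competition. Whether tsbenchmarks computes it exactly this way is an assumption, as is the convention of scoring a point as zero when both the actual value and the forecast are zero.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for y, y_hat in zip(actual, forecast):
        denominator = abs(y) + abs(y_hat)
        # Assumed convention: a perfect zero forecast of a zero value costs nothing.
        total += 0.0 if denominator == 0 else 2.0 * abs(y - y_hat) / denominator
    return 100.0 * total / len(actual)


perfect = smape([100.0, 120.0], [100.0, 120.0])
half = smape([100.0], [50.0])
```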

These APIs are written in Python and can be consumed through an SDK that is also written in Python. The following diagram summarizes the structure of our pipeline:

Build your own time-series processing service using AWS

Why?

We want to contribute to open source and help data scientists and developers to achieve great forecasting results without the need to implement complex pipelines.

How?

If you want to use our hosted version, send us an email or open a GitHub issue and ask for API keys.

If you want to deploy Nixtla on your own AWS Cloud you will need:

  • API Gateway (to handle API calls).
  • Lambda (or some computational unit).
  • SageMaker (or some bigger computational unit).
  • ECR (to store Docker images).
  • S3 (for inputs and outputs).

You will end up with an architecture that looks like the following diagram:

Each call to the API executes a particular Lambda function depending on the endpoint. That particular lambda function instantiates a SageMaker job using a predefined type of instance. Finally, SageMaker reads the input data from S3 and writes the processed data to S3, using a predefined Docker image stored in ECR.

Run the API locally

  1. Create the environment using make init.
  2. Launch the app using make app.

Create AWS resources

Create S3 buckets

For each service:

  1. Create an S3 bucket. The code of each lambda function will be uploaded here.

Create ECR repositories

For each service:

  1. Create a private repository.

Lambda Function

For each service:

  1. Create a Lambda function with the Python 3.7 runtime.
  2. Edit the runtime settings and set the handler to main.handler.
  3. Go to the configuration:
    • Edit the general configuration and set the timeout to 9 min 59 s.
    • Add an existing role capable of reading from and writing to S3 and of running SageMaker services.
  4. Add the following environment variables:
    • PROCESSING_REPOSITORY_URI: ECR URI of the Docker image corresponding to the service.
    • ROLE: A role capable of reading from and writing to S3 and of running SageMaker services.
    • INSTANCE_COUNT
    • INSTANCE_TYPE
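
To show how these environment variables might fit together, here is a hedged sketch of a main.py whose handler builds the parameters for a SageMaker processing job. It is not the code in this repository: the use of a processing job, the helper name build_processing_request, the job-name prefix, the volume size, and the example ARN and image URI are all assumptions. The call to AWS is left as a comment so that the sketch runs without credentials.

```python
import json
import os
import time


def build_processing_request(env, job_prefix):
    """Assemble keyword arguments for SageMaker's create_processing_job."""
    return {
        "ProcessingJobName": f"{job_prefix}-{int(time.time())}",
        "RoleArn": env["ROLE"],
        "AppSpecification": {"ImageUri": env["PROCESSING_REPOSITORY_URI"]},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": int(env["INSTANCE_COUNT"]),
                "InstanceType": env["INSTANCE_TYPE"],
                "VolumeSizeInGB": 30,  # assumed value, not taken from the repository
            }
        },
    }


def handler(event, context):
    """Entry point referenced as main.handler in the Lambda runtime settings."""
    request = build_processing_request(os.environ, job_prefix="tsforecast")
    # With boto3 available, the job would be started roughly like this:
    # boto3.client("sagemaker").create_processing_job(**request)
    return {
        "statusCode": 200,
        "body": json.dumps({"job_name": request["ProcessingJobName"]}),
    }


# Hypothetical values, for illustration only.
example_env = {
    "PROCESSING_REPOSITORY_URI": "123456789012.dkr.ecr.us-east-1.amazonaws.com/tsforecast:latest",
    "ROLE": "arn:aws:iam::123456789012:role/example-role",
    "INSTANCE_COUNT": "1",
    "INSTANCE_TYPE": "ml.m5.xlarge",
}
example_request = build_processing_request(example_env, "tsforecast")
```

Environment variables always arrive as strings, so INSTANCE_COUNT is converted to an integer before it is passed on.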

API Gateway

  1. Create a public REST API (Regional).
  2. For each endpoint in api/main.py… add a resource.
  3. For each created resource add an ANY method:
    • Select Lambda Function as the integration type.
    • Select Use Lambda Proxy Integration.
    • Enter the name of the Lambda function linked to that resource.
    • Once the method is created, select Method Request and set API key required to true.
  4. Deploy the API.

Usage plan

  1. Create a usage plan based on your needs.
  2. Add your API stage.

API Keys

  1. Generate API keys as needed.
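
Once a method requires an API key, API Gateway expects the key in the x-api-key request header. The sketch below builds such a request with the standard library only. The base URL, the endpoint name, the key, and the payload fields are placeholders, and the request body the deployed API actually expects is an assumption. The request is built but not sent.

```python
import json
import urllib.request


def build_request(base_url, endpoint, api_key, payload):
    """Build a POST request that carries the API key in the x-api-key header."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with your own stage URL and key.
request = build_request(
    base_url="https://example.execute-api.us-east-1.amazonaws.com/prod/",
    endpoint="tsforecast",
    api_key="my-api-key",
    payload={"horizon": 28, "freq": "D"},
)
# To send it: urllib.request.urlopen(request)
```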

Deployment

GitHub secrets

  1. Set the following secrets in your repo:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_DEFAULT_REGION