Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Related tags

Deep Learninglorien
Overview

Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Build Status codecov.io

Lorien is an infrastructure to massively explore/benchmark the best schedules of given deep learning models. Lorien is deep learning compiler (DLC) agnostic, so one can easily implement a Lorien dialect to support a new DLC.

Motivation

Although auto-tuning frameworks for deep learning compilers (e.g., TVM, Halide) are capable of delivering high-performance operators that match or even beat vendor kernel libraries, auto-tuning a deep learning model could take days or even weeks, especially for the model with many workloads like ResNet-152 or Inception V3.

With such a long tuning time, one key question To maintain the best user experience during deep model developments and deployments is How to promptly deliver schedules with reasonably good performance upon user requests? Accordingly, we design and implement Lorien to remove the following obstacles:

  1. Tuning Process Scalability and Stability. Long tuning time affects not only the time-to-market but the stability. To the best of our knowledge, none of existing auto-tuning frameworks is designed for tuning on multiple machines, and none of them consider fault tolerance. The tuning process, hence, has to be manually started over if it was accidentally interrupted. This is crucial especially on edge devices, which are less reliable than cloud instances and may fail frequently due to overheat or other factors.

  2. Tuning Result Management. Although almost all auto-tuning frameworks provide mechanisms to serialize tuning results for future applications, all of them use file-based mechanism and have different formats. As a result, engineers have additional work to orchestrate the data for efficient usage.

  3. Time to Deliver an Efficient Schedule. Even a database is constructed to serve most user requests, it is still possible that certain workloads are missing. However, modern auto-tuning frameworks usually leverage iterative search algorithms with on-device measurements, which usually take hours, to find an efficient schedule for an unseen workload. The unfavorably expensive querying/tuning overhead makes production deployment impractical.

Lorien is a unified and extensible infrastructure for delivering efficient deep learning workloads upon requests. Lorien allows auto-tuning deep learning frameworks to be easily plugged in as dialects, and supports large scale tuning on both cloud and edge platforms. The tuning results are managed in a NoSQL database with a unified data model that fits all auto-tuning frameworks. While the best schedules managed in the database can be used to compile deep learning models to achieve high performance, the tuning logs managed in a file system can also 1) enable more comprehensive performance analysis on different platforms, and 2) help train a performance cost model with an AutoML solution.

Please visit the official documentations for setup guideline and tutorials.

System Requirements

  • Python 3.6+

  • Amazon DynamoDB (local or aws): DynamoDB is used for storing and maintain the tuned schedules. You can choose to either of the following:

    1. Launch a local version using JVM on your machine, and specify endpoint URL (e.g. --db "endpoint_url: http://:8000") when invoking a tuning procses.

    2. Configure AWS credential on your machine to directly use AWS DynamoDB service. In this case, you do not have to specify any argument in tuning configurations.

  • AWS S3 (optional): S3 is used to store the full tuning logs (JSON files generated by AutoTVM). If you specify --commit-log-to bucket_name and configure an AWS credential on your machine, then all complete tuning logs will be uploaded to the S3 bucket for debugging or research prupose. Note that this is an optional requirement, so you can ignore the --commit-log-to argument if you do not want to keep full tuning logs.

  • AWS Batch (AWS ECR): You have to set up AWS batch computation environments, job queues, and job definitions in advance to use Lorien AWS batch worker for tuning. See this blog post for reference. You may also need to build an upload Lorien docker images to AWS ECR as the AWS batch job running container.

Docker Images

You can directly make use of pre-built Lorien docker images on Docker Hub, which includes two typs of images for CPU and CPU+CUDA platforms. The docker images have TVM deployed so you can launch a tuning process in the container after cloning Lorien. The docker image is also used for Lorien CI purpose.

Documentation

https://awslabs.github.io/lorien/

Citing Lorien

If you use Lorien in a scientific publication, please cite the following paper:

Cody Hao Yu, Xingjian Shi, Haichen Shen, Zhi Chen, Mu Li, Yida Wang, "Lorien: Efficient Deep Learning Workloads Delivery", Proceedings of the 12th ACM Symposium on Cloud Computing. 2021.

@inproceedings{yu2021lorien,
  title={Lorien: Efficient Deep Learning Workloads Delivery},
  author={Yu, Cody Hao and Shi, Xingjian and Shen, Haichen and Chen, Zhi and Li, Mu and Wang, Yida},
  booktitle={Proceedings of the Seventh ACM Symposium on Cloud Computing},
  year={2021}
}
Owner
Amazon Web Services - Labs
AWS Labs
Amazon Web Services - Labs
Space-invaders - Simple Game created using Python & PyGame, as my Beginner Python Project

Space Invaders This is a simple SPACE INVADER game create using PYGAME whihc hav

Gaurav Pandey 2 Jan 08, 2022
Code repository of the paper Neural circuit policies enabling auditable autonomy published in Nature Machine Intelligence

Neural Circuit Policies Enabling Auditable Autonomy Online access via SharedIt Neural Circuit Policies (NCPs) are designed sparse recurrent neural net

8 Jan 07, 2023
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments [Project website] [Paper] This project is a PyTorch

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 49 Nov 28, 2022
Reading Group @mila-iqia on Computational Optimal Transport for Machine Learning Applications

Computational Optimal Transport for Machine Learning Reading Group Over the last few years, optimal transport (OT) has quickly become a central topic

Ali Harakeh 11 Aug 26, 2022
10th place solution for Google Smartphone Decimeter Challenge at kaggle.

Under refactoring 10th place solution for Google Smartphone Decimeter Challenge at kaggle. Google Smartphone Decimeter Challenge Global Navigation Sat

12 Oct 25, 2022
[ICSE2020] MemLock: Memory Usage Guided Fuzzing

MemLock: Memory Usage Guided Fuzzing This repository provides the tool and the evaluation subjects for the paper "MemLock: Memory Usage Guided Fuzzing

Cheng Wen 54 Jan 07, 2023
DCGAN-tensorflow - A tensorflow implementation of Deep Convolutional Generative Adversarial Networks

DCGAN in Tensorflow Tensorflow implementation of Deep Convolutional Generative Adversarial Networks which is a stabilize Generative Adversarial Networ

Taehoon Kim 7.1k Dec 29, 2022
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
ElasticFace: Elastic Margin Loss for Deep Face Recognition

This is the official repository of the paper: ElasticFace: Elastic Margin Loss for Deep Face Recognition Paper on arxiv: arxiv Model Log file Pretrain

Fadi Boutros 113 Dec 14, 2022
Official implementation of "Articulation Aware Canonical Surface Mapping"

Articulation-Aware Canonical Surface Mapping Nilesh Kulkarni, Abhinav Gupta, David F. Fouhey, Shubham Tulsiani Paper Project Page Requirements Python

Nilesh Kulkarni 56 Dec 16, 2022
[CVPR 2022] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing (CVPR 2022) This repository provides the official PyTorch impleme

Billy XU 128 Jan 03, 2023
Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal"

Patch-wise Adversarial Removal Implementation of paper "Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal

4 Oct 12, 2022
Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix Tuning Files: . ├── gpt2 # Code for GPT2 style autoregressive LM │ ├── train_e2e.py # high-level script

530 Jan 04, 2023
A library for using chemistry in your applications

Chemistry in python Resources Used The following items are not made by me! Click the words to go to the original source Periodic Tab Json - Used in -

Tech Penguin 28 Dec 17, 2021
A small library for creating and manipulating custom JAX Pytree classes

Treeo A small library for creating and manipulating custom JAX Pytree classes Light-weight: has no dependencies other than jax. Compatible: Treeo Tree

Cristian Garcia 58 Nov 23, 2022
A curated list of awesome Active Learning

Awesome Active Learning 🤩 A curated list of awesome Active Learning ! 🤩 Background (image source: Settles, Burr) What is Active Learning? Active lea

BAI Fan 431 Jan 03, 2023
Riemannian Geometry for Molecular Surface Approximation (RGMolSA)

Riemannian Geometry for Molecular Surface Approximation (RGMolSA) Introduction Ligand-based virtual screening aims to reduce the cost and duration of

11 Nov 15, 2022
Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

TANG, shixiang 6 Nov 25, 2022
Pytorch implementation of the paper: "A Unified Framework for Separating Superimposed Images", in CVPR 2020.

Deep Adversarial Decomposition PDF | Supp | 1min-DemoVideo Pytorch implementation of the paper: "Deep Adversarial Decomposition: A Unified Framework f

Zhengxia Zou 72 Dec 18, 2022
[CVPR2021] DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

DoDNet This repo holds the pytorch implementation of DoDNet: DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datase

116 Dec 12, 2022