Azure MLOps (v2) solution accelerators.

Last update: Jan 01, 2023

Overview

Azure MLOps (v2) solution accelerator

Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting point for MLOps implementation in Azure.

MLOps is a set of repeatable, automated, and collaborative workflows with best practices that empower teams of ML professionals to quickly and easily get their machine learning models deployed into production.

Prerequisites

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.

Project overview

The solution accelerator provides a modular end-to-end approach for MLOps in Azure based on pattern architectures. As each organization is unique, solutions will often need to be customized to fit the organization's needs.

The solution accelerator goals are:

Simplicity
Modularity
Repeatability
Collaboration
Enterprise readiness

It accomplishes these goals with a template-based approach for end-to-end data science, driving operational efficiency at each stage. You should be able to get up and running with the solution accelerator in a few hours.

👤 Getting started: Azure Machine Learning - classical machine learning demo

The demo follows the classical machine learning pattern with Azure Machine Learning.

‼️ Please follow the instructions to execute the demo accordingly: Quickstart ‼️

‼️ Please submit any issues here: Issues ‼️

📐 Pattern Architectures: Key concepts

Link	AI Pattern
Pattern AzureML CML	Azure Machine Learning - Classical Machine Learning
Pattern AzureML CV	Azure Machine Learning - Computer Vision
[TBD]	Azure Machine Learning - Natural Language Processing
[TBD]	Azure Machine Learning / Azure Databricks - Classical Machine Learning
[TBD]	Azure Machine Learning / Azure Databricks - Computer Vision
[TBD]	Azure Machine Learning / Azure Databricks - Natural Language Processing
[TBD]	Azure Machine Learning - Edge AI

📯 (Coming Soon) One-click deployments

📯 MLOps infrastructure deployment

Name	Description	Try it out
Outer Loop	Default Azure Machine Learning outer infrastructure setup	[DEPLOY BUTTON]
[TBD]	Default Responsible AI for Classical Machine Learning	[DEPLOY BUTTON]
Feature Store FEAST	Default Feature Store using FEAST	[DEPLOY BUTTON]

📯 MLOps use case deployment

Name	AI Workload Type	Services	Try it out
classical-ml	Classical machine learning	Azure Machine Learning	[DEPLOY BUTTON]
[TBD]	Computer Vision	Azure Machine Learning	[DEPLOY BUTTON]
[TBD]	Natural Language Processing	Azure Machine Learning	[DEPLOY BUTTON]
[TBD]	Classical machine learning	Azure Machine Learning, Azure Databricks	[DEPLOY BUTTON]
[TBD]	Computer Vision	Azure Machine Learning, Azure Databricks	[DEPLOY BUTTON]
[TBD]	Natural Language Processing	Azure Machine Learning, Azure Databricks	[DEPLOY BUTTON]
[TBD]	Edge AI	Azure Machine Learning	[DEPLOY BUTTON]

Contributing

This project welcomes contributions and suggestions. See CONTRIBUTING.md for details.

Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Comments

Add azure devops repo init
This PR adds a .azuredevops directory that accelerates an MLOps project template implementation in Azure DevOps. More particularly, it:

Allows users to run a pipeline that creates a project structure according to their selection (equivalent to the 'sparse checkout') in an empty repo.

Creates new Azure DevOps pipelines for infrastructure and MLOps based on the Azure Pipelines yaml files in infrastructure/pipelines and mlops/devops-pipelines respectively.

💎 enhancement
opened by drosevear 9
Job naming issues with AML CLI training pipeline

Why?

Currently, when we deploy the training pipeline using the AML CLI, we run into the problem that job names cannot be validated.

How?

The screenshot above shows that hyphens aren't allowed in job names. So we should consider replacing hyphens with underscores in the pipeline.yml file.

I could also create a PR if you consider this as a valid way of solving this issue.
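
The rename proposed above can be sketched in Python. This is only an illustration, not the fix applied in the repository. It assumes the pipeline definition has already been loaded into a dictionary with a top-level `jobs` mapping, and that jobs refer to each other through `${{parent.jobs.<name>...}}` expressions. All function names are hypothetical.

```python
import re

# Matches the job name inside a "parent.jobs.<name>" reference.
_JOB_REF = re.compile(r"(parent\.jobs\.)([A-Za-z0-9_-]+)")


def sanitize_job_name(name):
    """Replace hyphens with underscores in a job name."""
    return name.replace("-", "_")


def _fix_references(value):
    """Recursively rewrite job references inside strings, dicts, and lists."""
    if isinstance(value, str):
        return _JOB_REF.sub(
            lambda m: m.group(1) + sanitize_job_name(m.group(2)), value
        )
    if isinstance(value, dict):
        return {key: _fix_references(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_fix_references(item) for item in value]
    return value


def sanitize_pipeline(pipeline):
    """Return a copy of the pipeline whose job names contain no hyphens."""
    fixed = _fix_references(pipeline)
    fixed["jobs"] = {
        sanitize_job_name(name): job
        for name, job in fixed.get("jobs", {}).items()
    }
    return fixed
```

Renaming only the keys would leave references such as `${{parent.jobs.prep-data.outputs.train_data}}` pointing at jobs that no longer exist, which is why the sketch rewrites both.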

Anything else?

opened by luhgit 9
Quickstart.md - Job: Create Bicep Deployment issue with the storage account name

Why?

While running the pipeline with the Create Bicep Deployment job, we get the error: {'code': 'StorageAccountAlreadyTaken', 'target': 'stmlopsv2819prod', 'message': 'The storage account named stmlopsv2819prod is already taken.'}

I believe storage account names must be globally unique, and since this deployment was already executed by someone else, I can't create a storage account with the same name.

How?

I believe this line needs to change in both config-infra-dev.yml and config-infra-prod.yml: st$(namespace)$(postfix)$(environment)

We might have to add some initials from our company, subscription, or even our own name.
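
One way to make the name unique, sketched here in Python, is to append a short hash of a value that is specific to you, such as your subscription ID. The function name and the five-character SHA-256 prefix are assumptions for illustration and are not part of the repository. The only rule relied on is that storage account names must be 3 to 24 characters long and contain only lowercase letters and digits.

```python
import hashlib
import re


def storage_account_name(namespace, postfix, environment, seed):
    """Build 'st<namespace><postfix><hash><environment>' from the given parts.

    seed is any string that differs between users, for example a
    subscription ID. The same seed always yields the same name.
    """
    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:5]
    raw = f"st{namespace}{postfix}{suffix}{environment}".lower()
    # Drop every character that a storage account name may not contain.
    name = re.sub(r"[^a-z0-9]", "", raw)
    if not 3 <= len(name) <= 24:
        raise ValueError(f"'{name}' must be 3 to 24 characters long")
    return name
```

Because the suffix is derived from the seed rather than generated randomly, rerunning the deployment produces the same name instead of creating a new account each time.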

Thank you, Carla
🏗️ infra

opened by carla-fiadeiro 8

Authorisation problem when deploying training pipeline

Hello,

I have been following your quick start guide, and have got to the stage where I need to deploy the pipeline "deploy-model-training-pipeline.yml" on Azure DevOps.

When I run this, it goes as far as the Run pipeline in AML step in DevOps, then I get this error:

If there is an Authorization error, check your Azure KeyVault secret named kvmonitoringspkey. Terraform might put single quotation marks around the secret. Remove the single quotes and the secret should work.
.create table mlmonitoring (['Sno']: int, ['Age']: int, ['Sex']: string, ['Job']: int, ['Housing']: string, ['Saving accounts']: string, ['Checking account']: string, ['Credit amount']: int, ['Duration']: int, ['Purpose']: string, ['Risk']: string, ['timestamp']: datetime)
Cleaning up all outstanding Run operations, waiting 300.0 seconds
2 items cleaning up...
Cleanup took 0.08368873596191406 seconds
Traceback (most recent call last):
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 68, in acquire_authorization_header
    return _get_header_from_dict(self.token_provider.get_token())
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 123, in get_token
    token = self._get_token_impl()
  File "/azureml-envs/XXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 554, in _get_token_impl
    return self._valid_token_or_throw(token)
  File "/azureml-envs/XXXXXXX/lib/python3.7/site-packages/azure/kusto/data/_token_providers.py", line 201, in _valid_token_or_throw
    raise KustoClientError(message)
azure.kusto.data.exceptions.KustoClientError: ApplicationKeyTokenProvider - failed to obtain a token. 
invalid_client
AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXX'.
Trace ID: XXXXXX
Correlation ID: XXXXXXX
Timestamp: 2022-11-01 15:46:31Z

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "prep.py", line 83, in <module>
    main()
  File "prep.py", line 80, in main
    log_training_data(df, args.table_name)
  File "prep.py", line 35, in log_training_data
    collector.batch_collect(df)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 158, in batch_collect
    self.create_table_and_mapping()
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/obs/collector.py", line 132, in create_table_and_mapping
    self.kusto_client.execute_mgmt(self.database_name, CREATE_TABLE_COMMAND)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 891, in execute_mgmt
    return self._execute(self._mgmt_endpoint, database, query, None, self._mgmt_default_timeout, properties)
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/client.py", line 959, in _execute
    request_headers["Authorization"] = self._aad_helper.acquire_authorization_header()
  File "/azureml-envs/XXXXX/lib/python3.7/site-packages/azure/kusto/data/security.py", line 72, in acquire_authorization_header
    raise KustoAuthenticationError(self.token_provider.name(), error, **kwargs)
azure.kusto.data.exceptions.KustoAuthenticationError: KustoAuthenticationError('ApplicationKeyTokenProvider', 'KustoClientError("ApplicationKeyTokenProvider - failed to obtain a token. \ninvalid_client\nAADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app 'XXXXXXX'.\r\nTrace ID: XXXXXXXX\r\nTimestamp: 2022-11-01 15:46:31Z")', '{'authority': 'XXXXXX', 'client_id': 'XXXXX', 'kusto_uri': 'https://adxmlopsv286309prod.uksouth.kusto.windows.net'}')

I have done some investigating:

It appears my Prod Service Principal is set up correctly. When I go to Project Settings > Service Connections > Azure-ARM-Prod > Edit, there is an option to verify the connection. It works here.
When I check my App Registrations and investigate the "Certificates and Secrets" tab of "Azure-ARM-Prod-mlops-sparse", there is indeed a secret in there. (The terraform pipeline to create the Prod infrastructure works, so I am led to believe that my SPs are working correctly.)
When I go to my Prod Key Vault, there is a secret called kvmonitoringspkey. Looking at it, though, the Secret Value is just $(CLIENT_SECRET). Is it meant to be this? If so, why? And where was it set to this?

Do you have any advice on how I can fix this error?
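
Two symptoms appear in this report: the log warns that Terraform may wrap the secret in single quotes, and the Key Vault value is the literal text $(CLIENT_SECRET), which looks like a pipeline variable that was never expanded. A small Python check for both cases could look like the sketch below. The function name is hypothetical, and how the raw value is read from Key Vault is left out on purpose.

```python
import re

# A value such as "$(CLIENT_SECRET)" means the variable was never expanded.
_UNEXPANDED = re.compile(r"^\$\([A-Za-z_][A-Za-z0-9_.]*\)$")


def clean_client_secret(raw):
    """Strip stray single quotes and reject values that cannot be real secrets."""
    value = raw.strip()
    # Remove one pair of surrounding single quotes, if present.
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        value = value[1:-1]
    if not value:
        raise ValueError("The secret is empty.")
    if _UNEXPANDED.match(value):
        raise ValueError(
            f"The secret is the unexpanded placeholder {value}; "
            "the pipeline variable was probably not set."
        )
    return value
```

Running such a check before the Kusto client is created would turn the long AADSTS7000215 traceback into a short message that names the likely cause.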

Bug 📑 documentation

opened by andrewblance 6

Enterprise readiness where GitHub is not the source control repository

The current Quickstart doesn't cater for organisations that don't use GitHub as their source control repository. There's a need for these organisations to get started on Azure quickly – for MLOps projects to be initialised in Azure DevOps itself based on 'MLOps version' and 'project type'.
💎 enhancement

opened by drosevear 4
How to achieve Continuous Integration

Hi,

As per the process, we are training and registering the model in the DEV workspace. But how do we bring/deploy that model to Stage?

It would be better to have the steps between registering the model in the dev workspace and deploying it in the Stage workspace.

opened by MurugeswariMuthurajan 4
[repo]

Why?

If you don't pin the version, an upgrade of the library may break the pipeline. Currently the install CLI action does not pin the version, which means the DevOps agent picks up the latest Azure ML CLI version while the job/pipeline definitions still follow the old version.
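
As a rough Python sketch of what pinning could look like, the helper below builds an `az extension add` command with an explicit `--version`. The function name is hypothetical and the version in the usage note is only an example. The sketch does not run the command; it only constructs the argument list so a pipeline script can pass it to `subprocess.run`.

```python
import re


def pinned_install_command(extension, version):
    """Return the az CLI arguments that install one exact extension version."""
    # Accept only plain "major.minor.patch" pins so that "latest" or an
    # empty string cannot slip through unnoticed.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"'{version}' is not an exact version")
    return ["az", "extension", "add", "--name", extension, "--version", version]
```

For example, `pinned_install_command("ml", "2.12.1")` yields the arguments for installing exactly that release of the ml extension, so a later release cannot change pipeline behavior until the pin is updated deliberately.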

How?

Anything else?

opened by james-tn 4
Could not get the latest source version for repository

@setuc /mlops/devops-pipelines/deploy-model-training-pipeline.yml: Could not get the latest source version for repository Azure/mlops-templates hosted on https://github.com/ using ref refs/heads/main. GitHub reported the error, "Bad credentials"

Thank you ..
✅ resolved

opened by yogidosalwar 3
[mlops-v2] Is the Quickstart tutorial limited to MS/Azure employees ?
Why?

This repo (mlops-v2) was recommended to us (MS/Azure Client) by our MS business contact to try out Azure MLOps. However, it looks like some steps require specific organization access.

How?

Doing the Quickstart

Creating the PAT in the Developer Settings of my GH account works fine

However, step 2.7 (pictured below) isn't available to us, for pretty obvious reasons (we are not part of Azure):

Is there a way around this? A public version of the tutorial/setup?

Looking forward to trying it out! Thanks in advance
Bug 📑 documentation
opened by gaspard-met 3
Hosted Agent ran longer than the maximum time of 60 minutes

@setuc @cindyweng While running the 3rd pipeline (Inner / Outer Loop: Moving to Production - Azure DevOps), I get this error during Test deployment: The job running on agent Hosted Agent ran longer than the maximum time of 60 minutes. What should I do? Any suggestions?

Thank you.
✅ resolved

opened by lokierao 2
[Quickstart.md] Deploying Infrastructure via Azure DevOps

Hi, I am facing an issue with the deployment via Azure Pipeline. When I run the pipeline, the following error is thrown.

It seemed like I had to change the resource repositories and connections. So, I tried changing it to the following and another error is thrown.

Can anyone help me with the issue?

Thanks, Saiham
✅ resolved

opened by saiham6 2
Main dec31
PR Summary into Azure/mlops-project-template

Checklist

I have:

[x] read and followed the contributing guidelines

[x] PR has a meaningful title

[x] Summarized the changes

[x] PR is ready to merge and NOT ** WORK IN PROGRESS **

Changes

fixes #
opened by setuc 0
When working with Conda, there are no checks on the environment packages
Describe the bug or the issue that you are facing

When creating the environments with conda, there are no checks performed when creating the environment to see if the packages exist or if the pip dependencies are also correctly installed.

Steps/Code to Reproduce

az ml environment create --file .\train-conda.yml

train-conda.yml

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
conda_file: environment.yml
description: Environment created from a Docker image plus Conda environment.

environment.yml

channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.7.5
  - pip
  - pip:
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - sklearn==0.24.1

Expected Output

No error is thrown even though there is no such package as sklearn.

There should be a check to make sure that the environment has been successfully created before moving on to the next steps.
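
A check of this kind could start with a simple lint of the pip section before the environment is created. The Python sketch below assumes environment.yml has already been parsed into a dictionary. It flags requirements whose name is a Python import name rather than a PyPI distribution name, as with sklearn and scikit-learn. The function names and the small lookup table are hypothetical, and the table is not exhaustive.

```python
import re

# Import names that are often written where the distribution name is needed.
KNOWN_MISNAMED = {
    "sklearn": "scikit-learn",
    "cv2": "opencv-python",
}


def pip_requirements(env):
    """Collect the entries of every 'pip:' block under 'dependencies'."""
    requirements = []
    for dependency in env.get("dependencies", []):
        if isinstance(dependency, dict):
            requirements.extend(dependency.get("pip", []))
    return requirements


def find_suspect_requirements(env):
    """Return (requirement, suggested distribution name) pairs."""
    suspects = []
    for requirement in pip_requirements(env):
        # Keep only the name in front of any version specifier or extras.
        name = re.split(r"[=<>!~\[ ]", requirement, maxsplit=1)[0].strip().lower()
        if name in KNOWN_MISNAMED:
            suspects.append((requirement, KNOWN_MISNAMED[name]))
    return suspects
```

This only catches naming mistakes that are listed in the table. Confirming that the environment image actually built still requires inspecting the build result in the workspace.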

Versions

running az -v

azure-cli 2.43.0
core 2.43.0
telemetry 1.0.8

Extensions:
account 0.2.5
ml 2.12.1

Dependencies:
msal 1.20.0
azure-mgmt-resource 21.1.0b1

Which platform are you using for deploying your infrastructure?

Azure DevOps (ADO)

If you mentioned Others, please mention which platform you are using.

No response

What are you using for deploying your infrastructure?

Bicep

Are you using Azure ML CLI v2 or Azure ML Python SDK v2

Azure ML CLI v2

Describe the example that you are trying to run?

Tabular, but this affects all the environments.
Bug Needs Triage
opened by setuc 1
error while running 3rd pipeline Inner / Outer Loop: Moving to Production - Azure DevOps

Hi @setuc @cindyweng, I am getting an error while running deploy-batch-endpoint-pipeline.yml, at the step that creates the deployment:

ERROR: (UserError) The specified resource was not found. Code: UserError Message: The specified resource was not found. Exception Details: (ModelNotFound) Model container with name: patient-model not found. Code: ModelNotFound Message: Model container with name: xyz-model not found.

Thank you
❓question

opened by yogidosalwar 3
"DeployTrainingPipeline" throws error at the "Connect to AML Workspace" step

To make the infrastructure, I ran an Azure Pipeline using infrastructure/pipelines/tf-ado-deploy-infra.yml, and it worked. Then, used mlops/devops-pipelines/deploy-model-training-pipeline.yml to run the (training) Pipeline on the sample data provided, but it stops at the "Connect to AML Workspace" step with the following error:

Then, checked the workspace that's created automatically by the Pipeline and noticed that I don't have access to the Jobs, while receiving a weird notification:

That is while I have a "Contributor" access to the subscription and also, all those secrets and App registrations, etc are made/done without any problem, based on the QuickStart tutorial.

opened by smortezah 1

Getting error while creating Responsible Ai dashboard

I am trying to use different algorithms for training. Model training and model registration complete successfully, but the RAI insights dashboard constructor fails. I am using an xgboost model for training.

Traceback (most recent call last):
  File "create_rai_insights.py", line 184, in <module>
    main(args)
  File "create_rai_insights.py", line 128, in main
    model_estimator = load_mlflow_model(my_run.experiment.workspace, model_id=model_id)
  File "/mnt/azureml/cr/j/exe/wd/rai_component_utilities.py", line 79, in load_mlflow_model
    return mlflow.pyfunc.load_model(model_uri)._model_impl
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 484, in load_model
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 494, in _load_pyfunc
    return _load_model_from_local_file(path=path, serialization_format=serialization_format)
  File "/azureml-envs/responsibleai-0.18/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 452, in _load_model_from_local_file
    return cloudpickle.load(f)
ModuleNotFoundError: No module named 'xgboost'
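
The traceback shows the pickled model being loaded in the responsibleai-0.18 environment, where xgboost cannot be imported. A pre-flight check along the following lines could report that before the model is unpickled. It is a sketch using only the Python standard library; the function name is hypothetical, and the list of modules to check has to be supplied by the caller.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when the parent package of a dotted name is absent
            # or when a module carries a broken __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Calling `missing_modules(["xgboost"])` at the start of the component would return a non-empty list in the failing environment, which can then be reported with a message that names the missing package.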

Thank You

opened by yogidosalwar 11

how can i create inference clusters ?

Hi @setuc , I want to create inference cluster, i.e. Deploy model to the Azure Kubernetes Service for large scale inferencing. How can I create a compute inference cluster?

Thank you

opened by yogidosalwar 1

Releases(v1.0.0)

v1.0.0(Sep 2, 2022)
The repository now contains the following patterns that have been implemented:

1. Classical / Tabular Machine Learning Model
   a. Supports Azure DevOps and GitHub Actions. Note that GitHub Actions only work with Azure ML CLI v2 (aml-cli-v2).
   b. Supports Azure Data Explorer based Monitoring, Data Drift and Anomaly Detection. It is enabled for Terraform and can be invoked via Python SDK v1 or Azure ML CLI v2 (aml-cli-v2).
   c. Can deploy both online and batch endpoints.

2. Computer Vision (CV) Model
   a. Supports Azure DevOps and GitHub Actions. Note that GitHub Actions only work with Azure ML CLI v2 (aml-cli-v2).
   b. Currently there is no monitoring support for this.
   c. Can deploy both online and batch endpoints.

3. Natural Language Processing (NLP) Model
   a. Only supports Azure DevOps via Azure ML CLI v2 (aml-cli-v2).
   b. Currently there is no monitoring support for this.
   c. Can deploy both online and batch endpoints.

Other Patterns included in the release

Templates for using individual / repeatable steps in your template. These templates are available for GitHub Actions and Azure DevOps (CLI and SDK). The template repo can be found here: https://github.com/Azure/mlops-templates

Registration of Multiple Datasets

Using 3rd party / external containers. Support for dependabot python package scans via pip install in docker container.

Full Changelog: https://github.com/Azure/mlops-project-template/commits/v1.0.0

Owner

Microsoft Azure

APIs, SDKs and open source projects from Microsoft Azure

