Safely roll out your machine learning models using Managed online endpoints in Azure Machine Learning

MLOps is a set of practices and tools that help organizations manage and deploy models in a scalable and reliable way. These practices include cross-functional collaboration, testing, and ensuring that the deployment environment is secure and compliant with relevant regulations. By adopting MLOps practices, organizations can improve collaboration between teams, better govern their models and comply with regulations, and deploy models safely and securely.

In this blog, we explore how Azure can help you adopt MLOps practices, with a special focus on model deployment and safe rollout. At Microsoft Build 2023, we also announced the general availability of Mirrored traffic, and we will see how it completes the story for the safe rollout of models.

What's covered:

  • Azure Machine Learning for model deployment
    • Model versioning and management
    • Deployment techniques for managing reliability
    • Model monitoring and logging
    • Network isolation
  • Safe rollout “in action” with Azure Machine Learning
    • Deploying a new model to production workspace
    • Choose existing endpoint with the old model
    • Enable telemetry logging and data exfiltration prevention
    • Setting up the Mirrored traffic for the new model
    • Finish the deployment and monitor the new model
    • Analyze costs and make the final call before initiating the rollout
    • Safely roll out the new model
  • Conclusion

Azure Machine Learning for model deployment

Azure Machine Learning (AzureML) is a cloud-based platform that provides a comprehensive set of tools and services for building, training, and deploying machine learning models at scale. With Azure Machine Learning, data scientists can work in a collaborative and flexible environment that supports a wide range of open-source frameworks and languages.

To achieve successful and reliable MLOps practices, it's important to understand the key concepts around model deployment and management. We also show how you can implement these concepts using the Azure Machine Learning platform and features such as Managed online endpoint.

 

1. Model versioning and management

As machine learning models evolve over time, it's important to keep track of different versions of the model and to manage those versions in a reliable and efficient way. Model versioning and management can help ensure that the correct version of the model is deployed and can be used for auditing and compliance purposes.

Azure Machine Learning workspaces support model registration, which lets you store and version your machine learning models in Azure. The model registry makes it easy to organize and keep track of trained models. Registered models are identified by name and version, so you can track the changes made to a model over time. You can also attach metadata tags during registration, which helps when searching for a specific model. Along with models, you can manage environment-related metadata (such as pip and conda dependencies) in Azure Machine Learning workspaces and associate it with both the training and the deployment of the models. See Work with models in Azure Machine Learning for more.
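
The workspace handles registration and versioning for you. The toy, in-memory class below is only a sketch of the bookkeeping involved: each model is identified by a name plus an auto-incremented version, and optional tags make it searchable. `ToyModelRegistry` and `RegisteredModel` are hypothetical names for illustration, not part of the Azure Machine Learning SDK, and integer versions are a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredModel:
    """Hypothetical record for one registered model version."""
    name: str
    version: int
    tags: dict = field(default_factory=dict)


class ToyModelRegistry:
    """In-memory illustration of name/version model registration (not the Azure ML API)."""

    def __init__(self):
        # Maps a model name to its versions, in registration order.
        self._models = {}

    def register(self, name, tags=None):
        """Register a new version of `name`; versions start at 1 and increment."""
        versions = self._models.setdefault(name, [])
        model = RegisteredModel(name, len(versions) + 1, dict(tags or {}))
        versions.append(model)
        return model

    def get(self, name, version=None):
        """Return a specific version, or the latest one when `version` is None."""
        versions = self._models[name]
        if version is None:
            return versions[-1]
        for model in versions:
            if model.version == version:
                return model
        raise KeyError(f"{name}:{version} is not registered")

    def search(self, **tags):
        """Return every registered version whose tags match all given key/value pairs."""
        return [
            model
            for versions in self._models.values()
            for model in versions
            if all(model.tags.get(key) == value for key, value in tags.items())
        ]


registry = ToyModelRegistry()
registry.register("fraud-detector", tags={"stage": "baseline"})
registry.register("fraud-detector", tags={"stage": "candidate"})
print(registry.get("fraud-detector").version)  # 2
print([m.version for m in registry.search(stage="candidate")])  # [2]
```

Asking for a model without a version returns the latest one, which mirrors how a registry lets you pin an exact version for auditing while still finding the newest candidate easily.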

Azure Machine Learning Registries take this one step further. Registries make model artifacts and dependencies available to all workspaces in an organization, and they enable versioning, artifact management, and deployment management for machine learning models. One supported scenario is to have separate workspaces for development and production: you iteratively develop a model in a development workspace, and once a good candidate model has been identified, you publish it to a registry. From the registry, the model can be deployed to endpoints in different production workspaces. See Create a model in registry and Announcing the general availability of Azure Machine Learning registries for more.

 

2. Deployment techniques for managing reliability

After you train a machine learning model or pipeline, you need to deploy it so that others can consume its predictions. This mode of execution is called inference. In general, techniques such as A/B testing, canary releases, and feature flags help you manage model deployment in a reliable and controlled manner. You can implement these practices using the Azure Machine Learning Managed online endpoint feature. Let's first quickly touch on the concepts of endpoint and deployment for machine learning inference.

Azure Machine Learning uses the concept of an “endpoint” to define the “interface” for inference. For instance, you can make an HTTP request to a URL with some credentials, provide a picture of a car, and get the type and color of the car back as string values. This is what an “endpoint” does as an interface. By contrast, Alice, a data scientist, can develop a model using the ResNet architecture with the TensorFlow framework and choose to run it on a CPU machine. This implementation is a “deployment”, and you can assign it to the endpoint mentioned earlier. A scoring request then goes through the endpoint to this deployment, which returns a prediction.

Bob, another data scientist, may decide to use the PyTorch framework with some data augmentation techniques and run it on a GPU machine. This can be a new “deployment”, which you can assign to the same endpoint (sharing the same interface).

Using the monitoring and logging features described later in this post, you can compare deployments and see whether the new model deployment performs better than the old one.

Azure Machine Learning provides a mechanism to control how scoring requests are routed to each of the deployments behind an endpoint. In a blue-green deployment scenario, a basic traffic split can be configured at the endpoint level so that, for example, 90% of the traffic goes to blue (the old model) while 10% goes to green (the new model).
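
To make the split concrete, the sketch below simulates weighted routing: each request is independently sent to a deployment with probability equal to its share. It illustrates the behavior only and is not how Azure implements routing; `pick_deployment` is a hypothetical helper. In the Azure Machine Learning Python SDK v2, the same split is expressed as a dictionary on the endpoint's `traffic` property, such as `{"blue": 90, "green": 10}`.

```python
import random
from collections import Counter


def pick_deployment(traffic, rng):
    """Pick a deployment with probability proportional to its traffic percentage."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Blue serves the old model, green the new one.
traffic = {"blue": 90, "green": 10}
rng = random.Random(42)  # fixed seed so the simulation is repeatable
counts = Counter(pick_deployment(traffic, rng) for _ in range(10_000))
print(counts["blue"] / 10_000, counts["green"] / 10_000)  # close to 0.9 and 0.1
```

Because every response in this split comes from whichever deployment was picked, 10% of clients actually receive predictions from the new model, which is the risk that mirrored traffic removes.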

We're excited to share that Mirrored traffic is now generally available! Mirrored traffic lets you copy a portion of the live traffic to a new model. For example, 100% of the traffic can go to blue (the old model), which ensures that all predictions come only from a previously approved model, while a copy of 10% of the production traffic is also sent to green (the new model) so that you can monitor its performance against production data.

That way, you can use all the monitoring and logging features to measure how the new model performs in a real environment, and control when it begins to handle production traffic, even in the most critical machine learning applications.

In general, testing the new deployment with traffic mirroring/shadowing is also known as shadow testing. The deployment receiving the mirrored traffic can also be called the shadow deployment.
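
The sketch below models the shadow-testing behavior described above: every response comes from blue, while roughly `mirror_percent` percent of requests are also copied to green, and green's output is only logged. It is a simplified, synchronous illustration with hypothetical function names; on a managed online endpoint the mirroring happens inside the platform, and you never call the shadow deployment yourself.

```python
import random


def handle_request(payload, live, shadow, mirror_percent, rng, shadow_log):
    """Serve a request from the live deployment and maybe mirror a copy to the shadow.

    The client only ever receives the live deployment's response. The shadow's
    result (or error) is recorded for later analysis and otherwise discarded.
    """
    response = live(payload)
    if rng.random() * 100 < mirror_percent:
        try:
            shadow_log.append((payload, shadow(payload)))
        except Exception as exc:  # a failing shadow must never affect the client
            shadow_log.append((payload, exc))
    return response


def blue(x):
    return f"blue:{x}"  # previously approved model


def green(x):
    return f"green:{x}"  # new model under shadow test


rng = random.Random(7)
mirrored = []
responses = [handle_request(i, blue, green, 10, rng, mirrored) for i in range(1_000)]
print(all(r.startswith("blue:") for r in responses))  # True
print(len(mirrored))  # roughly 100
```

The key design point is that the shadow path is isolated: its output and its failures only feed monitoring, never the response.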

In short, the concepts of endpoint and deployment, traffic control mechanisms including mirrored traffic, and the monitoring features simplify the safe rollout of new models and improve reliability in production scenarios.

 

3. Model monitoring and logging

When machine learning models are deployed to production, it's important to monitor them closely to detect any potential issues. Model monitoring and logging can help identify anomalous behavior or unexpected results, which can be a sign of degraded performance or other issues that need to be addressed.

Azure Machine Learning provides several ways to track and monitor metrics and logs for Azure Machine Learning online endpoints. Through the integration with Azure Monitor, you can view metrics in charts, compare endpoints and deployments, pin charts to Azure portal dashboards, configure alerts, query log tables, and push logs to supported targets. You can also use Application Insights to analyze events from user containers.

 

Metrics

Endpoint-level metrics such as request latency, requests per minute, new connections per second, and network bytes can be drilled down to the deployment or status code level. Deployment-level metrics such as CPU/GPU utilization and memory or disk utilization can be drilled down to the instance level. Azure Monitor lets you track these metrics in charts and set up dashboards and alerts for further analysis.
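
Azure Monitor computes and charts these metrics for you. Purely to illustrate what drilling down means, the sketch below groups hypothetical per-request records by deployment (or by instance) and summarizes latency with a nearest-rank percentile. The record fields and helper names are assumptions, not an Azure Monitor schema.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def drill_down(records, key):
    """Group request records by `key` and summarize their latency."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["latency_ms"])
    return {
        name: {"requests": len(lat), "p50_ms": percentile(lat, 50), "p95_ms": percentile(lat, 95)}
        for name, lat in groups.items()
    }


# Hypothetical request records; real values would come from Azure Monitor.
records = [
    {"deployment": "blue", "instance": "blue-0", "latency_ms": 40},
    {"deployment": "blue", "instance": "blue-1", "latency_ms": 45},
    {"deployment": "blue", "instance": "blue-0", "latency_ms": 50},
    {"deployment": "green", "instance": "green-0", "latency_ms": 60},
    {"deployment": "green", "instance": "green-0", "latency_ms": 80},
]
print(drill_down(records, "deployment"))
# {'blue': {'requests': 3, 'p50_ms': 45, 'p95_ms': 50}, 'green': {'requests': 2, 'p50_ms': 60, 'p95_ms': 80}}
```

Passing `"instance"` instead of `"deployment"` gives the next level down, the same way a chart can be split from endpoint to deployment to instance.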

 

Logs

You can send metrics to a Log Analytics workspace, where you can query the logs using the rich Kusto query syntax. You can also send metrics to a storage account and/or Event Hubs for further processing. In addition, you can use dedicated log tables for online endpoint events, traffic, and console (container) logs, and Kusto queries allow complex analyses that join multiple tables.

Application Insights

Curated environments include integration with Application Insights, which you can enable or disable when you create an online deployment. Built-in metrics and logs are sent to Application Insights, where you can use features such as Live Metrics, Transaction search, Failures, and Performance for further analysis.

In addition, you can break down actual costs by endpoint and deployment. For example, after you deploy a new model to an endpoint, you can compare the costs of the old and new models and confirm the cost implications of the changes the new model brings.

4. Network isolation

Network isolation can be crucial for ensuring the privacy, security, and compliance of your machine learning models. Private endpoints, which provide a secure way to access resources within a virtual network, can be used to protect your data and services from unauthorized access.

This involves both inbound and outbound security threats. The inbound threat is unauthorized access to the endpoints that serve your machine learning models. Endpoints already have authentication and authorization mechanisms, but you may want to secure network access to them as well. The outbound threat is data exfiltration from your own model deployment. You may want to block outbound access so that model deployments can only access resources secured within the virtual network, with no external access.

Both inbound and outbound network access controls are easy to configure with Azure Machine Learning Managed online endpoint. When you deploy your model, you can simply indicate that you want to secure ingress for it. In the backend, the complex configuration is set up automatically so that the model serving endpoint (scoring URI) is only accessible from a private IP in your virtual network, through the workspace private endpoint (PE). Similarly, when you deploy your model, you can indicate that you want to restrict its egress to workspace resources only. In this case, the configuration is set up automatically so that egress from the model scoring container is restricted to specific resources over secure connectivity through PEs, and internet access is disabled.

Safe rollout “in action” with Azure Machine Learning

Here we walk through the actual steps you can follow using an Azure Machine Learning managed online endpoint. Let's say you are responsible for deploying a recently developed credit card fraud detection model to production. You would have trained the model in a development Azure Machine Learning workspace. After the new model has been validated, it can be promoted and registered to an Azure Machine Learning registry. Your task is to deploy the new model to a production Azure Machine Learning workspace using a safe rollout strategy.

 

1. Deploying a new model to production workspace

The production Azure Machine Learning workspace would already be configured with Private Link and ready to serve models in a virtual network. Go to an Azure Machine Learning registry you have access to, find the new model, select Deploy, then Real-time endpoint, and choose the production workspace as the target workspace.
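
If you script this step instead of using the studio, a deployment in the production workspace can refer to the registry model by an asset ID of the form `azureml://registries/<registry>/models/<name>/versions/<version>`. The small helper below only assembles that string. The helper, registry name, and model name are hypothetical; check the registries documentation for the exact naming rules.

```python
def registry_model_uri(registry_name, model_name, version):
    """Build the asset ID that refers to a model version stored in a registry."""
    parts = (registry_name, model_name, str(version))
    if any(not part or "/" in part for part in parts):
        raise ValueError("registry, model name, and version must be non-empty and contain no '/'")
    return f"azureml://registries/{registry_name}/models/{model_name}/versions/{version}"


# Hypothetical registry and model names for the fraud detection scenario.
print(registry_model_uri("contoso-ml-registry", "fraud-detector", 2))
# azureml://registries/contoso-ml-registry/models/fraud-detector/versions/2
```

Pinning an explicit version in the ID keeps the production deployment tied to the exact model that was validated and promoted.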

2. Choose existing endpoint with the old model

In general, you could use the quick deployment wizard, which lets you deploy the model with just one click. In this scenario, however, you want advanced options such as mirrored traffic, so click “More options”.

One of the configurations you can set for your endpoint is Public network access, a network isolation feature: disabling it blocks inbound traffic from the internet. In this scenario, you choose an existing endpoint that is already running the old model and already blocks inbound internet traffic.

3. Enable telemetry logging and data exfiltration prevention

When you continue with the wizard, you will arrive at the Deployment step. Here you can set these options:

  • Application Insights diagnostics: Sends application-related telemetry to the Application Insights resource associated with the Azure Machine Learning workspace. You'll have access to common telemetry analysis features such as live metrics and transaction search.
  • Egress public network access: Disabling this blocks outbound internet access from the deployment and only allows outbound traffic to secured resources within the virtual network. Behind the scenes, this involves creating managed virtual network configurations, including private links and network security groups, but for you it's as simple as toggling this option.

4. Setting up the Mirrored traffic for the new model

Now the fun part! You can enable mirrored traffic for your new model by turning on the feature and assigning 20% of the traffic. This means that of all the live traffic the old model is handling, 20% is copied, or mirrored, and sent to the new model. All predictions the client application receives still come from the old model, but you can use the built-in monitoring and logging features to debug the new model with real-world data, without risking customer impact while testing it.
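
The setting from this step can be written down as two dictionaries: the live split (`{"blue": 100}`) and the mirrored share (`{"green": 20}`). In the Python SDK v2 these correspond to the endpoint's `traffic` and `mirror_traffic` properties. The hypothetical `validate_traffic` helper below checks such a pair before it is applied. The mirroring rules it enforces (at most one mirrored deployment, at most 50%, and no deployment receiving both live and mirrored traffic) reflect the limitations listed in the Azure Machine Learning safe rollout documentation at the time of writing; treat them as assumptions and verify them against the current docs. Requiring the live split to add up to 100% is a simplifying rule of this sketch.

```python
# Assumption: limit as described in the Azure ML safe rollout docs at the time of writing.
MAX_MIRROR_PERCENT = 50


def validate_traffic(live, mirror):
    """Return a list of problems with a live split and a mirror setting (empty if OK)."""
    problems = []
    total = sum(live.values())
    if total != 100:
        problems.append(f"live traffic adds up to {total}%, expected 100%")
    if len(mirror) > 1:
        problems.append("traffic can be mirrored to only one deployment")
    for name, pct in mirror.items():
        if not 0 <= pct <= MAX_MIRROR_PERCENT:
            problems.append(f"mirror percentage for {name!r} must be between 0 and {MAX_MIRROR_PERCENT}")
        if live.get(name, 0) > 0:
            problems.append(f"{name!r} cannot receive both live and mirrored traffic")
    return problems


print(validate_traffic({"blue": 100}, {"green": 20}))  # []
print(validate_traffic({"blue": 90, "green": 10}, {"green": 20}))
# ["'green' cannot receive both live and mirrored traffic"]
```

Running a check like this in a release pipeline catches a misconfigured endpoint before the platform rejects the update or, worse, routes live traffic to an unvalidated model.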

5. Finish the deployment and monitor the new model

Now that both the old and new models are running (although production applications use only the predictions from the old model), you can look at different metrics from both deployments. For example, you can look at latency, throughput, or CPU/GPU utilization metrics and verify whether the new model performs as expected. You can also use dashboards or Application Insights to drill deeper, or run Kusto queries on the log tables to analyze the data in more detail.
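
One way to turn "performs as expected" into a repeatable check is to compare the new deployment's numbers against the old one's with an agreed tolerance. The sketch below does that for "lower is better" metrics; the helper name, metric names, values, and tolerances are all hypothetical. Keep in mind that green only receives the mirrored share of traffic, so utilization figures may need scaling before they are comparable.

```python
def find_regressions(baseline, candidate, allowed_increase):
    """Return metrics where the candidate exceeds the baseline by more than allowed.

    All metrics are assumed to be "lower is better" (latency, utilization), and
    `allowed_increase` maps each metric to a tolerated relative increase (0.10 = 10%).
    """
    regressions = {}
    for metric, tolerance in allowed_increase.items():
        base, cand = baseline[metric], candidate[metric]
        if cand > base * (1 + tolerance):
            regressions[metric] = (base, cand)
    return regressions


# Hypothetical numbers read off the metrics charts for blue (old) and green (new).
blue_metrics = {"p95_latency_ms": 120, "cpu_percent": 55}
green_metrics = {"p95_latency_ms": 130, "cpu_percent": 70}
print(find_regressions(blue_metrics, green_metrics, {"p95_latency_ms": 0.10, "cpu_percent": 0.20}))
# {'cpu_percent': (55, 70)}
```

Here latency stays within its 10% tolerance (130 ms against a limit of 132 ms), while CPU utilization exceeds its 20% tolerance, so this candidate would need a closer look before rollout.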

These metrics are useful for evaluating the operational performance of endpoints and deployments. If you are interested in monitoring the model quality performance, such as data drift, prediction drift, and data quality, you can read more here: Continuously Monitor the Performance of your AzureML Models in Production.

6. Analyze costs and make the final call before initiating the rollout

If the performance of your new model is within your target thresholds (for example, it doesn't use too much compute such as CPU and memory, and it shows the desired latency and throughput), you can go ahead and analyze the cost of serving the new model. Again, you can check costs per service and break them down to the deployment level. That way, you can ensure that your new model operates within budget.
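
Azure Cost Management performs the actual breakdown. The sketch below only illustrates the final check: given cost line items that have already been broken down per deployment, sum them and flag any deployment over its budget. The line-item shape, amounts, and budgets are hypothetical. While it only receives mirrored traffic, green's cost may not reflect what it would cost at full production load.

```python
from collections import defaultdict


def cost_by_deployment(line_items):
    """Sum cost line items per deployment."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["deployment"]] += item["cost"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return the deployments whose total cost exceeds their budget, sorted by name."""
    return sorted(name for name, cost in totals.items() if cost > budgets.get(name, float("inf")))


# Hypothetical daily cost line items, already broken down to the deployment level.
items = [
    {"deployment": "blue", "cost": 10.0},
    {"deployment": "blue", "cost": 10.0},
    {"deployment": "green", "cost": 4.5},
    {"deployment": "green", "cost": 5.5},
]
totals = cost_by_deployment(items)
print(totals)  # {'blue': 20.0, 'green': 10.0}
print(over_budget(totals, {"blue": 25.0, "green": 8.0}))  # ['green']
```

Deployments without a budget entry are never flagged, so in practice you would want to set a budget for every deployment you care about.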

7. Safely roll out the new model

If the new deployment looks good in every aspect, you can now remove the mirrored traffic and start gradually sending live traffic to the new deployment. For example, you can split the traffic so that only 10% goes to the new model while 90% is handled by the old model. With cool-down periods and an approval policy in place, you can integrate this safe rollout into your release pipeline and gradually increase the new model's share of traffic. Once the new model is taking 100% of live traffic, you can decide when to remove the old deployment. This way, you can safely roll out your new models and ensure they meet both business and technical needs.
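
The gradual rollout can be sketched as a loop over increasing traffic percentages, with a gate between steps. This is a hypothetical outline, not an Azure Machine Learning API: `apply_split` stands in for whatever updates the endpoint's live traffic (for example, setting its `traffic` property with the Python SDK v2), and `approve` stands in for your cool-down and approval policy. It assumes the mirrored traffic has already been removed, as described above.

```python
def rollout_steps(old, new, percentages):
    """Yield live-traffic splits that shift traffic from `old` to `new` step by step."""
    previous = 0
    for pct in percentages:
        if not previous < pct <= 100:
            raise ValueError("percentages must strictly increase and not exceed 100")
        previous = pct
        yield {old: 100 - pct, new: pct}


def run_rollout(old, new, percentages, apply_split, approve):
    """Apply each split; roll back to the old deployment if a step is not approved.

    `apply_split` and `approve` are hypothetical hooks, e.g. a call that updates the
    endpoint's traffic and a release-pipeline gate that waits out a cool-down period.
    """
    split = {old: 100, new: 0}
    for split in rollout_steps(old, new, percentages):
        apply_split(split)
        if not approve(split):
            rollback = {old: 100, new: 0}
            apply_split(rollback)
            return rollback
    return split


applied = []
final = run_rollout("blue", "green", [10, 25, 50, 100], applied.append, lambda split: True)
print(final)  # {'blue': 0, 'green': 100}
print(len(applied))  # 4
```

If a gate rejects a step (for example, `approve` returning False at 50%), the loop sends all traffic back to the old deployment, which is still running at that point.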

Conclusion

We have explored how you can approach the safe rollout problem in a production setup with Azure Machine Learning. Deploying machine learning models is becoming easier with Azure Machine Learning Managed online endpoint. Network isolation helps secure access to models and prevent data exfiltration from them. Mirrored traffic adds a preventive layer that reduces risk while you test new models with real-world data.

Get started today with Azure Machine Learning Managed online endpoint!

To learn more about Azure Machine Learning Managed online endpoint, watch these Microsoft Build 2023 breakout sessions: 

 

This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.