Secure Model Deployments with Microsoft Entra and Managed Online Endpoints

Microsoft Entra token-based auth mode for managed online endpoints in Azure is now generally available. This new auth mode makes identity and access management easier when using models hosted on Azure.

Plus, to deploy models securely and efficiently, Azure offers another great feature: managed online endpoints. In this blog, we'll see how Microsoft Entra ID helps with endpoint authentication and authorization, and learn some important concepts about Azure managed online endpoints and how to use them with Microsoft Entra ID.

Managed Online Endpoints: A Turnkey Solution to Simplify Deployments

Managed online endpoints are designed to simplify machine learning model deployment. Here are some benefits:

  1. Flexibility: Deploy models with no-code, low-code, or full code options
  2. Scalability: Handle high request volumes and expand to hundreds of nodes
  3. Cost Optimization: Autoscaling and cost monitoring for individual models
  4. Efficiency: GitOps-friendly interfaces, local testing, and safe model rollouts
  5. Security Compliance: Secure authentication, isolation, and managed identity
  6. Infrastructure Management: Reduced infrastructure complexity

Both Azure AI Studio and Azure Machine Learning use managed online endpoints to improve model deployment experiences.

Key concepts: Endpoints and deployments

Managed online endpoints decouple the concepts of endpoints and deployments.

  1. Endpoint: A logical entity that represents a service or API where you can send requests for inferencing. It acts as the entry point for your machine learning models.
  2. Deployment: A set of resources and logic required for hosting the actual model that performs inferencing. It includes the necessary infrastructure, such as CPU or GPU machines, the environment, and the inferencing logic to consume the models.

When you deploy a model, you create a deployment behind an endpoint. This deployment contains the model, any custom code, and other dependencies.

A key benefit of this separation is that it decouples the interface presented to clients (the endpoint) from the implementation details (the deployment). This separation allows you to manage deployments independently without affecting the overall endpoint. For example, you can use this to maintain multiple deployments for multiple model versions under a single endpoint, or to safely roll out new model versions without downtime.
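As a sketch of the safe-rollout scenario, you can shift traffic gradually between two deployments under one endpoint with the Azure CLI. The endpoint name `my-endp1` and deployment names `blue` and `green` below are illustrative and assume both deployments already exist:

```shell
# Route 90% of traffic to the existing deployment and 10% to the new one
az ml online-endpoint update --name my-endp1 --traffic "blue=90 green=10"

# After validating the new deployment, shift all traffic to it
az ml online-endpoint update --name my-endp1 --traffic "blue=0 green=100"
```

Because clients only ever see the endpoint's URI, this traffic shift is invisible to them and requires no client-side changes.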

Key concepts: Control plane operation vs data plane operation

Control plane operations

These operations manage and modify the online endpoints. They involve creating, reading, updating, and deleting online endpoints and deployments. They send requests to the Azure Machine Learning workspace.

To authenticate, you need a Microsoft Entra token. To be authorized, your Azure identity needs the Azure RBAC `write`, `delete`, and `read` actions under `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/`. We'll see what this means later.
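For instance, a custom Azure role granting only these control plane actions might be defined like this (an illustrative role definition; the role name and subscription ID are placeholders):

```json
{
  "Name": "Online Endpoint Operator (example)",
  "IsCustom": true,
  "Description": "Create, read, update, and delete online endpoints and deployments",
  "Actions": [
    "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/write",
    "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/delete",
    "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/read"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/<subscription-id>"]
}
```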

Data plane operations

These operations don't modify the online endpoints; they use the endpoints to process data. For example, sending a scoring request to an online endpoint and getting a response. These requests go to the endpoint's scoring URI.

To authenticate, you can use a key, a Microsoft Entra token, or an Azure Machine Learning token. You specify which mechanism to use by choosing `auth_mode` when creating a managed online endpoint. If you choose Microsoft Entra token auth, authorization requires the Azure RBAC action `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/score/action`.

What's new about this? Data plane operations for online endpoints now support Microsoft Entra tokens with full RBAC authorization.

Why Microsoft Entra ID is useful


Consume multiple endpoints using a single token with full RBAC support

With key-based auth, each endpoint has its own pair of keys, which can complicate integration with applications when you have many endpoints. Microsoft Entra ID lets you use one token for many endpoints by assigning the right role at a broader scope of your choice. For example, if an IT admin grants a group of Azure identities the right role for data plane actions over a resource group, those identities can invoke all the endpoints in the resource group.
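As a sketch, the role assignment for that scenario might look like the following. The group object ID, role name, and scope values are placeholders; the role just needs to include the `onlineEndpoints/score/action` data plane action, whether built-in or custom:

```shell
# Assign a role that includes the score action to a group, scoped to a resource group
az role assignment create \
  --assignee "<group-object-id>" \
  --role "<role-with-onlineEndpoints-score-action>" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
```

Every identity in the group can then call any endpoint in that resource group with a single Microsoft Entra token.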

Streamline control plane and data plane operations

Expanding on the above scenario, if the group of Azure identities is assigned the proper roles for both control plane and data plane operations over the scope of a subscription, those identities can not only create, update, delete, and read the endpoints and deployments, but also invoke the endpoints in the subscription. You can adjust the RBAC actions and the scope as needed.


Seeing it in action

You don't need to handle tokens yourself if you sign in and use the CLI, SDK, or UI; they manage tokens for you. But you can also choose to do it yourself if you call the REST APIs directly. We use the CLI in this blog to keep it simple. We assume you have set up:

  • Azure AI hub / project or Azure Machine Learning workspace
  • Development environment where you have set up the CLI and the `ml` extension, or a cloud-based environment that is already prepared for you, such as a compute instance, VS Code for the Web, or Azure Cloud Shell

1. Create the endpoint and the deployment

Suppose you want to deploy an LLM for a chat completion scenario. You'll want to create an endpoint and a deployment. As we saw earlier, this is a control plane operation. First things first, sign in to Azure:

az login

Once signed in, authentication is done and the CLI session runs under your Azure identity. When you perform an operation, the CLI interacts with the backend services (via REST APIs), and the backend services check whether the operation is authorized for your Azure identity.

Note that an IT admin with the `Microsoft.Authorization/roleAssignments/write` permission can assign the proper role for an operation to your Azure identity, if it isn't already assigned such a role.

Now suppose the endpoint and deployment for your LLM are defined as YAML files. In particular, your endpoint definition YAML will look like:

name: my-endp1
auth_mode: aad_token

You can run the following commands to create the endpoint and the deployment:

az ml online-endpoint create -n my-endp1 -f ./endpoint.yaml
az ml online-deployment create -n blue -e my-endp1 -f ./deployment.yaml

The above two CLI commands send the requests to the Azure AI Studio hub or Azure Machine Learning workspace, and if your Azure identity has the permission to create them (more specifically, if your Azure identity is assigned an Azure role with the `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/write` action over the scope of the Azure AI hub or Azure Machine Learning workspace resource), the creation requests will be processed.


If you're using REST API to create endpoints and deployments, you'll need to get the Microsoft Entra token from the resource endpoint and pass it in the header. More on this at Get the Microsoft Entra token for control plane operations.
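A sketch of that flow with the CLI and curl follows. It assumes control plane requests go to Azure Resource Manager, so the token is requested for the `https://management.azure.com` resource; the subscription, resource group, workspace, and API version values are placeholders:

```shell
# Get a Microsoft Entra token for control plane (ARM) operations
TOKEN=$(az account get-access-token --resource https://management.azure.com \
  --query accessToken -o tsv)

# Pass it as a bearer token in the Authorization header of the REST request
curl -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     "https://management.azure.com/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.MachineLearningServices/workspaces/<workspace>/onlineEndpoints/my-endp1?api-version=<api-version>"
```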

2. Invoke the endpoint

Now it's time to consume the model. As mentioned earlier, this is a data plane operation.

You can formulate a request file containing the payload the model expects, for example a JSON input including the input text, chat history, and/or parameters such as max response length and temperature. Then you can run the following command to invoke the endpoint:

az ml online-endpoint invoke -n my-endp1 -r ./request.json
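The request file above could look like the following (an illustrative chat-completion payload; the actual schema depends on the model you deployed):

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a managed online endpoint?"}
  ],
  "max_tokens": 256,
  "temperature": 0.7
}
```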

The CLI command will send the request to the scoring URI of the endpoint, and if your Azure identity has the permission to invoke the endpoint (more specifically, if your Azure identity is assigned an Azure role with the `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/score/action` action over the scope of the managed online endpoint resource), the invoke request will be processed.


If you're using the REST API to invoke endpoints, you'll need to get a Microsoft Entra token from the resource endpoint and pass it in the header. More on this at Get the key or token for data plane operations.
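As a sketch, for an endpoint created with `auth_mode: aad_token`, that could look like the commands below. The endpoint name and region in the scoring URI are placeholders, and the example assumes the token is requested for the `https://ml.azure.com` resource, which is the Azure Machine Learning data plane audience; check the linked documentation for your scenario:

```shell
# Get a Microsoft Entra token for data plane (scoring) operations
TOKEN=$(az account get-access-token --resource https://ml.azure.com \
  --query accessToken -o tsv)

# Invoke the endpoint's scoring URI with the bearer token
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
```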


Azure Machine Learning managed online endpoints, combined with Microsoft Entra ID, provide a seamless and secure way to deploy and consume your AI/ML models. By leveraging these features, you can focus on delivering value to your organization without worrying about infrastructure complexities.

Learn more


This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.