A Solution for ML Pipeline in Multi-tenancy Manner

Solution Providers often face enterprise scenarios in which ML pipelines must be deployed for multiple tenants, each of which may have its own Azure subscription.

Several requirements commonly arise when designing such an enterprise solution, for example:

  1. Each tenant may want to keep their own data non-shared.
  2. Each tenant has their own computing and hosting environment for ML training, retraining, and inferencing.
  3. Each tenant may have different needs for retraining, with different scheduling, using different data sets.
  4. Even for the same ML algorithm, each tenant may use different parameters.
  5. The Solution Provider wants to maintain a centralized repository for all tenants' data, models, environments, components, etc.
  6. In addition, the Solution Provider wants the flexibility to manage all tenants and to decide when and how tenants share data, models, environments, and pipeline environments. Each tenant also has the flexibility to share those with other tenants.

We provide this multi-tenancy solution for Solution Providers to deploy and manage the ML pipelines across multiple workspaces, where each workspace may belong to a different Azure subscription.

The key element here is the Azure Machine Learning (AzureML) registry. It acts as a middleman through which tenants share data/models/environments/components. When creating an AzureML registry, it is essential to make it available in every region where the tenants reside. Tenants outside the Primary and Additional regions covered by the registry cannot share data/models/environments/components through it. In addition, the registry must add the workspaces as users, and each workspace must assign an appropriate role (i.e., Contributor) to the ML registry owner at both the subscription level and the workspace level.
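The region requirement can be checked before onboarding a tenant. The snippet below is a minimal sketch in plain Python, not an AzureML API: the function name `uncovered_workspaces` and the sample workspace and region names are hypothetical, and it only assumes that region names can be compared after removing spaces and case.

```python
def uncovered_workspaces(registry_regions, workspace_regions):
    """Return the workspaces whose region is not covered by the registry.

    registry_regions: iterable of region names (Primary plus Additional).
    workspace_regions: mapping of workspace name -> region name.
    """

    def normalise(region):
        # Treat "West Europe" and "westeurope" as the same region.
        return region.replace(" ", "").lower()

    covered = {normalise(region) for region in registry_regions}
    return sorted(
        name
        for name, region in workspace_regions.items()
        if normalise(region) not in covered
    )


# Hypothetical layout: the registry covers two regions, one tenant sits outside them.
registry_regions = ["eastus", "West Europe"]
workspaces = {
    "workspace1": "eastus",
    "workspace2": "westeurope",
    "workspace3": "japaneast",
}
print(uncovered_workspaces(registry_regions, workspaces))
```

Any workspace the check reports would need its region added to the registry before it can take part in sharing.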

Helenzeng_0-1714164956693.png

The solution works as follows. Each tenant has its own workspace under its own subscription. Within the workspace, the tenant is self-sufficient in compute and resources, and it can build ML pipelines using its own data/models/environments/components. If a tenant wants to share data/models/environments/components with another tenant, it first sends them to the registry, which we call 'share' in the picture below; the registry then sends them to the other tenants, which we call 'push'. Tenants can also retrieve data/models/environments/components from the registry, which we call 'pull'.

  • Share – the tenant (workspace) sends an asset to the registry.
  • Push – the registry sends an asset to a tenant (workspace).
  • Pull – a tenant (workspace) retrieves an asset from the registry.

Helenzeng_1-1714164956700.png

This solution design supports several model-sharing scenarios:

  1. Each tenant can have their specific models without sharing with others.
  2. All tenants can share their models if they want.
  3. All tenants can pull the shared models and retrain or fine-tune, then share back the retrained or fine-tuned models.

Similar scenarios apply to data, environments, and components.

Notes:

  1. If a tenant has multiple subscriptions, the sharing is done first at the subscription level and then at the workspace level.
  2. This solution doesn't apply if the sharing has to be done above the subscription level. That is, if the ML registry can't access the tenant's subscription directly, the ML registry can't share/push entities to that tenant.

Below is an example of model sharing among four workspaces, where three of them belong to one subscription and the fourth belongs to a different subscription.

Helenzeng_2-1714164956706.png

In this example, workspace1 shares its model credit-card-default to the registry, and the registry then pushes it to workspace3; workspace3 shares its model bert-case-uncased_fine-tuned to the registry, and the registry then pushes it to workspace1, workspace2, and workspace4. The registry holds both models.
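The flow in this example can be traced with a small in-memory model. This is purely illustrative and does not call AzureML: the `Workspace` and `Registry` classes are hypothetical stand-ins that only track which model names each party holds, following the share/push/pull terms defined earlier.

```python
class Workspace:
    """A tenant workspace holding its own named models."""

    def __init__(self, name, models=()):
        self.name = name
        self.models = set(models)


class Registry:
    """A central registry that relays models between workspaces."""

    def __init__(self):
        self.models = set()

    def share(self, workspace, model):
        # Share: the workspace sends one of its own models to the registry.
        if model not in workspace.models:
            raise ValueError(f"{workspace.name} does not hold {model}")
        self.models.add(model)

    def push(self, model, *workspaces):
        # Push: the registry sends a model it holds to one or more workspaces.
        if model not in self.models:
            raise ValueError(f"registry does not hold {model}")
        for workspace in workspaces:
            workspace.models.add(model)

    def pull(self, workspace, model):
        # Pull: same transfer as push, but initiated by the workspace.
        self.push(model, workspace)


registry = Registry()
ws1 = Workspace("workspace1", ["credit-card-default"])
ws2 = Workspace("workspace2")
ws3 = Workspace("workspace3", ["bert-case-uncased_fine-tuned"])
ws4 = Workspace("workspace4")

registry.share(ws1, "credit-card-default")
registry.push("credit-card-default", ws3)
registry.share(ws3, "bert-case-uncased_fine-tuned")
registry.push("bert-case-uncased_fine-tuned", ws1, ws2, ws4)

for ws in (ws1, ws2, ws3, ws4):
    print(ws.name, sorted(ws.models))
print("registry", sorted(registry.models))
```

After the four steps, workspace1 and workspace3 hold both models, workspace2 and workspace4 hold only the BERT model, and the registry holds both.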

The way share/push/pull works between a tenant and the registry differs slightly for data, models, environments, and components. 'Share' and 'push' can be done in both the AzureML UI and the SDK; 'pull' can be done in the SDK.

Below is an example of sharing a model from workspace1 to the registry using the AzureML UI.

Helenzeng_3-1714164956718.png
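The same 'share' step can also be scripted. The following is a hedged sketch using the AzureML Python SDK v2 (`azure-ai-ml`), where `models.share` is marked experimental at the time of writing; the helper names `connect_to_workspace` and `share_model_to_registry` are hypothetical, and keeping the same name and version in the registry is an assumption, not a requirement.

```python
def connect_to_workspace(subscription_id, resource_group, workspace_name):
    """Create a client scoped to one tenant workspace."""
    # Imported lazily so the helper below can be used without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )


def share_model_to_registry(workspace_client, registry_name, model_name, model_version):
    """Copy a workspace model into a registry under the same name and version."""
    return workspace_client.models.share(
        name=model_name,
        version=model_version,
        share_with_name=model_name,        # assumption: keep the same name
        share_with_version=model_version,  # assumption: keep the same version
        registry_name=registry_name,
    )
```

For the example above, workspace1 would call `share_model_to_registry(client, "<registry-name>", "credit-card-default", "<version>")` with a client obtained from `connect_to_workspace`.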

Below are some SDK examples for getting ('pulling') data/model/environment/component from the registry:

Helenzeng_4-1714164956719.png

Helenzeng_5-1714164956720.png

Helenzeng_6-1714164956721.png

Helenzeng_7-1714164956722.png
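For readers without access to the screenshots, a 'pull' can be sketched with the AzureML Python SDK v2 roughly as follows. This is an approximation under stated assumptions rather than the exact code in the images: the helper names `connect_to_registry` and `pull_from_registry` are hypothetical, while `MLClient` with `registry_name` and `registry_location`, and the `get(name=..., version=...)` calls, come from `azure-ai-ml`.

```python
# MLClient attribute that serves each asset type.
_OPERATIONS = {
    "data": "data",
    "model": "models",
    "environment": "environments",
    "component": "components",
}


def connect_to_registry(registry_name, registry_location):
    """Create a client scoped to a registry instead of a workspace."""
    # Imported lazily so the helper below can be used without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        registry_name=registry_name,
        registry_location=registry_location,
    )


def pull_from_registry(registry_client, asset_type, name, version):
    """Fetch one asset (data, model, environment or component) from the registry."""
    operations = getattr(registry_client, _OPERATIONS[asset_type])
    return operations.get(name=name, version=version)
```

A tenant would typically create the registry client once and then call `pull_from_registry(client, "model", "<model-name>", "<version>")` for each asset it needs.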

The retrieved asset can then be registered into the workspace. Below is an example for an environment:

Helenzeng_8-1714164956725.png

The cool thing about this solution is that once a tenant gets data/model/environment/component from the registry, originally shared by another tenant, it can build its own ML pipeline and run it using its own resources. Besides the SDK approach described above, the tenant can retrieve those assets from AzureML Designer. By filtering on the registry creator, the workspace can see the data, model, and component, and then create a pipeline on the canvas.

Helenzeng_9-1714164956729.png
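When a pipeline is defined in code rather than on the canvas, AzureML also lets a job reference a registry asset by URI of the form `azureml://registries/<registry>/<collection>/<name>/versions/<version>`. The helper below is a small sketch that builds such URIs; the function name `registry_asset_uri` and the registry name `my-registry` are hypothetical.

```python
# URI path segment used for each asset type.
ASSET_COLLECTIONS = {
    "data": "data",
    "model": "models",
    "environment": "environments",
    "component": "components",
}


def registry_asset_uri(registry_name, asset_type, name, version):
    """Build the azureml:// URI that points at a versioned registry asset."""
    collection = ASSET_COLLECTIONS[asset_type]
    return f"azureml://registries/{registry_name}/{collection}/{name}/versions/{version}"


print(registry_asset_uri("my-registry", "model", "bert-case-uncased_fine-tuned", 1))
```

The resulting string can be passed wherever a job or deployment definition expects a model, environment, or component reference.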

Below is an example in workspace4. It gets the model from the registry, which originally came from workspace3, then uses its own data set and an AzureML pre-built component to create a fine-tuning pipeline. After it runs the pipeline and fine-tunes the model, it can share a new version of the fine-tuned model back to the registry. Remember, workspace4 belongs to a different subscription from workspace3, which is really cool!

Helenzeng_10-1714164956733.png

The registry 'shares/pushes' a model to a workspace by deploying a real-time or batch endpoint to that workspace. The workspace can then run inference using the endpoint. Below is an example in which workspace4 gets the model from the registry, originally from workspace3, and then performs a test using input data.

Helenzeng_11-1714164956741.png
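A test call like this can also be made from the SDK. The sketch below assumes a real-time (online) endpoint and uses `online_endpoints.invoke` from `azure-ai-ml`, which reads the request body from a file; the helper name `score_with_endpoint` is hypothetical, and the payload shape depends entirely on the deployed model's scoring script.

```python
import json
import os
import tempfile


def score_with_endpoint(workspace_client, endpoint_name, payload, deployment_name=None):
    """Send a JSON payload to an online endpoint and return the raw response."""
    # invoke() takes a request file, so write the payload to a temporary file first.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(payload, handle)
        request_path = handle.name
    try:
        return workspace_client.online_endpoints.invoke(
            endpoint_name=endpoint_name,
            deployment_name=deployment_name,  # None lets the endpoint route by traffic rules
            request_file=request_path,
        )
    finally:
        # Clean up the temporary request file whether or not the call succeeded.
        os.remove(request_path)
```

Here `workspace_client` is an `MLClient` scoped to the tenant's workspace (workspace4 in the example); a batch endpoint would go through `batch_endpoints` instead.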

If the workspace has deployed an endpoint from the shared model and has also 'pulled' the model from the registry, it can add a deployment to the endpoint, as shown below.

Helenzeng_12-1714164956745.png

References:

  1. Create and manage registries – Azure Machine Learning | Microsoft Learn
  2. Share data across workspaces with registries (preview) – Azure Machine Learning | Microsoft Learn
  3. Share models, components, and environments across workspaces with registries – Azure Machine Learnin…
  4. azure-docs/articles/machine-learning/how-to-share-models-pipelines-across-workspaces-with-registries…

Acknowledgement:

Thanks to Daniel Scott-Raynsford and Facundo Santiago for encouraging me to write this article. We are glad to share this solution implementation broadly to help our customers.

Reviewers:

Daniel Scott-Raynsford, Takuto Higuchi, Alex Zeltov

 

This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.