Bringing ML assets to the Microsoft Purview Data Map

We are delighted to announce the public preview of Azure Machine Learning assets in Microsoft Purview. This enables ML practitioners, ML risk professionals, and data professionals working with Purview to benefit from discovery, lineage tracking, and responsible governance throughout the MLOps lifecycle.

In today's data-driven world, effective data security, governance, and privacy require a comprehensive understanding of an organization's data and systems, including ML assets. However, most organizations lack this holistic view. Enter Microsoft Purview, a family of data governance, risk, and compliance solutions designed to help organizations discover, govern, protect, and manage their entire data estate, now including the AI estate. With integrated coverage, Microsoft Purview addresses the challenges posed by remote user connectivity, data fragmentation, and the evolving roles in traditional IT management.

Azure Machine Learning (AzureML) now introduces ML assets as a new object in Purview. This integration allows for associating ML models with the data used for training and facilitates emerging ML and AI risk and governance scenarios.

In many ways, ML models are representations of data. The MLOps lifecycle begins with data exploration to understand the business problem, identify potential features for machine learning, or obtain training data for fine-tuning AI. Responsible AI practices also hinge upon understanding the shape and balance of the data. Additionally, models may need to comply with privacy regulations, data sovereignty requirements, or other data usage mandates.

When you connect your Azure Machine Learning workspace to Microsoft Purview, metadata on AI assets, including models, datasets, and jobs, is automatically published to the Purview Data Map. This enables ML Pros and data engineers to observe how components are shared and reused across the enterprise. They can examine the lineage and transformations of training data and understand the impact of any issues in dependencies. For example, changes in dataset features may necessitate an update to the model.
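To make the dependency-impact idea concrete, here is a minimal sketch of how lineage metadata could be walked to find the models affected by a change in a dataset. It does not call Purview or AzureML; the `LINEAGE` dictionary, the asset names, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds into the assets listed as its value.
# The asset names are invented for illustration; they are not read from Purview.
LINEAGE = {
    "source:sales_db.orders": ["dataset:orders_clean"],
    "dataset:orders_clean": ["job:train_churn_v3", "job:train_forecast_v1"],
    "job:train_churn_v3": ["model:churn_classifier"],
    "job:train_forecast_v1": ["model:demand_forecast"],
}


def downstream_assets(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def impacted_models(graph, changed_asset):
    """Return the models that may need an update when `changed_asset` changes."""
    return [
        asset
        for asset in downstream_assets(graph, changed_asset)
        if asset.startswith("model:")
    ]
```

Calling `impacted_models(LINEAGE, "dataset:orders_clean")` on this toy graph returns both models, because each is trained by a job that consumes the dataset.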

Risk and compliance professionals, such as Model Risk Officers and Data Officers, can gain valuable insights into how data is used to train AI models, how base models are fine-tuned or extended, and where models are employed across different production applications. This information is crucial for supporting Responsible AI practices and providing evidence for compliance reports and audits.

At Microsoft, we will be relying on the integration of Microsoft Purview and Azure Machine Learning to support our Responsible AI and Secure Software Supply Chain initiatives. Purview acts as a single pane of glass for teams and leadership, providing visibility into all AI and ML models in production, their training process, and the source datasets. This reporting helps our product teams ensure that production AI models undergo Responsible AI impact assessments and reviews, and that they have a software bill of materials (SBOM) documenting the packages used during training. With this centralized information in Microsoft Purview, we establish a single source of truth for models used in production and their Responsible AI compliance status. Purview also facilitates the creation of custom reports and the development of workflows to manage compliance for different groups.

Collaboration is key in data science, and Microsoft Purview makes it easier to discover data and ML assets within an organization, such as models, components, and datasets. This discovery helps new projects get started faster. With Microsoft Purview at the core, it also becomes much easier to track how assets are used or customized. Policies in MLOps can be established to ensure that models are registered before deployment and that Responsible AI practices are upheld.

[Screenshot: lineage from source data to an AzureML dataset, with a data transformation step before training a model]
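The MLOps policy described earlier, that models be registered before deployment and that Responsible AI practices be upheld, could be expressed as a simple pre-deployment gate. The sketch below is only an illustration of that idea: `ModelRecord`, its fields, and both functions are hypothetical and are not part of Purview or AzureML.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """Hypothetical governance metadata for one model; field names are illustrative."""

    name: str
    registered: bool = False
    rai_assessment_done: bool = False


def deployment_blockers(record):
    """List the policy violations that should block deployment (empty list means none)."""
    blockers = []
    if not record.registered:
        blockers.append("model is not registered")
    if not record.rai_assessment_done:
        blockers.append("Responsible AI assessment is missing")
    return blockers


def can_deploy(record):
    """A model may be deployed only when no policy violation is found."""
    return not deployment_blockers(record)
```

A release pipeline could call `can_deploy` before promoting a model and surface the output of `deployment_blockers` to the owning team when the check fails.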

Models trained in AzureML can be automatically synced to Microsoft Purview when they are added to the workspace model registry. Models can also be registered through MLflow.
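As a rough sketch of the MLflow route, the snippet below registers a model that a training run has already logged. It assumes the `mlflow` package is installed, that the MLflow tracking URI already points at your AzureML workspace, and that the run logged its model under the artifact path `model`; the helper names and the default path are illustrative assumptions, not part of the announcement.

```python
def build_model_uri(run_id, artifact_path="model"):
    """Build the MLflow URI for a model artifact logged under a run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_from_run(run_id, model_name, artifact_path="model"):
    """Register a model logged by an MLflow run and return the new model version.

    Assumption: the MLflow tracking URI is already configured for the
    AzureML workspace, so the model lands in the workspace model registry.
    """
    # Imported lazily so build_model_uri stays usable without MLflow installed.
    import mlflow

    return mlflow.register_model(
        model_uri=build_model_uri(run_id, artifact_path),
        name=model_name,
    )
```

Here `register_from_run("<run-id>", "churn_classifier")` would create a new version of a model named `churn_classifier`; both arguments are placeholders to replace with your own values.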

When using an AzureML dataset, the dataset-to-job-to-model lineage is also stored. If the dataset's source is in the Purview catalog, you can trace the lineage from the data source to the dataset and onward to the model. This comprehensive lineage covers the data's ingestion, the training in AzureML, and the final results generated from model inference in a report. In short, you gain end-to-end visibility into both your data estate and your AI estate.

Bringing data assets from Microsoft Purview into Azure Machine Learning benefits the ML lifecycle in several ways. ML Pros can more effectively find suitable data that conforms to privacy or data residency requirements, and they can stay informed about changes in source data that could affect feature availability or require retraining.

To learn more about the integration of Azure ML with Microsoft Purview and to set up the preview for your workspaces, please refer to our documentation. We also encourage you to join the Practical Deep Dive into Machine Learning Techniques and MLOps session at Microsoft Build to see this integration in action.


This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.