Federated Learning with Azure Machine Learning: Powering Privacy-Preserving Innovation in AI

Federated learning is an innovative approach to training machine learning models while meeting compliance requirements. It enables multiple organizations to come together and train better-quality models, while helping them achieve their respective data privacy and security standards. In a nutshell, federated learning consists of training a model partially within distinct trust boundaries (countries, institutions, companies, tenants), also called silos; the partial models are then aggregated centrally by an orchestrator. This process is repeated between the silos and the orchestrator until convergence and generalization are achieved.
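The central aggregation step can be illustrated with a minimal sketch of federated averaging (FedAvg), assuming each silo returns its locally trained weights along with the number of samples it trained on. The names and data structures below are illustrative stand-ins, not part of any Azure API:

```python
def fed_avg(silo_updates):
    """Aggregate locally trained model weights into one global model.

    silo_updates: list of (weights, n_samples) pairs, one per silo,
    where weights is a list of floats. Each silo's contribution is
    weighted by the amount of data it trained on (FedAvg).
    """
    total = sum(n for _, n in silo_updates)
    dim = len(silo_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in silo_updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * (n / total)
    return global_weights

# Example: three silos with different data volumes; the silo with
# 200 samples contributes twice as much as each 100-sample silo.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 100), ([5.0, 6.0], 200)]
global_model = fed_avg(updates)  # [3.5, 4.5]
```

Weighting by sample count is what distinguishes FedAvg from a plain average: silos with more data pull the global model further toward their local solution.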

Our solutions harness the potential of federated learning by combining the advanced capabilities of Azure for provisioning flexible infrastructure for the silos with Azure Machine Learning for orchestrating training at scale. This also integrates with important features such as Azure confidential computing, encryption at rest, and differential privacy, raising the standards for confidential ML.

Federated learning unblocks complex industry scenarios

The federated learning paradigm is flexible and can tackle different organizational scenarios where traditional ML would be blocked. Let's take two common use cases.

Use case 1 – One company, multiple trust boundaries – A company has data in distinct regions across the globe, each with their own regulation restricting the circulation of data. Up to now, they had to train local models only, but this shows some limits to generalization. This company wants to harness all this data in its original location, but still achieve better results than with training local models.

Solution: They create an AzureML workspace for orchestration. This workspace has multiple compute resources and datastores, each located in a distinct region, within a given trust boundary. Data scientists use this workspace to run federated learning experiments: model training happens in parallel, on data from region A on a compute in region A, and so on for every region they have data in. Each partial model is transferred back to the orchestrator's region for aggregation. This happens iteratively through multiple cycles of training and aggregation until the model converges, ultimately performing better than a model trained in any single region alone.

Use case 2 – Multiple organizations, each with their own trust boundary, whether in the cloud or on-premises – Multiple organizations (hospitals, banks) come together as a federation to tackle a common problem (e.g., genomics, fraud detection). They hope that by enabling ML training to happen on all their data combined, they'll achieve better models and bring more innovative solutions to their industry.

Solution: The federation creates an AzureML workspace where the data scientists from those organizations will be able to run their jobs as a collaborative team. The workspace will connect with computes hosted and maintained by each organization, some in their respective tenant, some on their own HPC on-premises. The ML training will happen locally within each organization, the model will be sent back to the federation for aggregation and iteration, metrics will be displayed in the workspace for the team to collaborate on.

AzureML SDK v2 to easily write federated learning pipelines

The AzureML SDK v2 provides the foundation for implementing a federated learning pipeline. In the example below, the pipeline first trains a model independently on 3 distinct computes and datasets (three trust boundaries). The resulting partial models are then aggregated into a single model on the orchestrator compute. This process repeats multiple times until convergence.
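Conceptually, the data flow such a pipeline expresses can be sketched in plain Python. This is not the AzureML SDK API, just the train/aggregate loop it orchestrates; the silo data and the toy "training" function (which nudges the shared model toward each silo's local mean) are illustrative stand-ins:

```python
def local_train(global_w, silo_data, lr=0.5):
    """One local 'training' pass inside a silo's trust boundary.

    Toy stand-in: move the shared model toward the mean of the silo's
    local data. A real silo would run a full training job here.
    """
    local_mean = sum(silo_data) / len(silo_data)
    return global_w + lr * (local_mean - global_w)

def aggregate(partial_models):
    """Orchestrator-side aggregation: plain average of partial models."""
    return sum(partial_models) / len(partial_models)

def run_federated_training(silos, rounds=20):
    global_w = 0.0
    for _ in range(rounds):
        # Step 1: train in parallel inside each silo (here, sequentially).
        partials = [local_train(global_w, data) for data in silos]
        # Step 2: aggregate centrally, then start the next round.
        global_w = aggregate(partials)
    return global_w

silos = [[1.0, 2.0], [3.0, 5.0], [8.0, 9.0]]  # three trust boundaries
model = run_federated_training(silos)
```

In the AzureML version of this loop, each `local_train` becomes a pipeline step pinned to a silo's compute and datastore, and `aggregate` runs on the orchestrator compute; only models cross trust boundaries, never raw data.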

[Figure: example federated learning pipeline graph in AzureML, with three parallel training steps feeding a central aggregation step]

Because the entire FL pipeline is a regular AzureML experiment, it also integrates with MLflow for metrics reporting (see below), and unlocks all the usual benefits of AzureML as a platform: experiment management, model deployment, monitoring, etc.

[Figure: training metrics reported through MLflow in the AzureML workspace]

Our FL accelerator repository provides multiple examples of pipeline and training code you can use as a starting point for developing your own:

  • Real training examples for tasks such as medical imaging classification, named entity recognition, credit card fraud detection, marketing, etc.,
  • Example implementations for both homogeneous (horizontal) and heterogeneous (vertical) federated learning,
  • Example support for common FL frameworks such as NVFlare,
  • Introduction to Differential Privacy as a technique addressing issues such as data leakage through the model itself.
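As a taste of the differential privacy technique mentioned above, one common building block is adding calibrated Gaussian noise to an aggregated update before it is released. The sketch below is illustrative only: in practice the noise scale is derived from a gradient clipping norm and an (epsilon, delta) privacy budget, which are hardcoded stand-ins here:

```python
import random

def dp_noise_update(aggregated_weights, noise_std=0.1, seed=None):
    """Add Gaussian noise to an aggregated model update.

    Differential privacy mechanisms (e.g., the Gaussian mechanism)
    perturb the released update so that the presence of any individual
    training example cannot be reliably inferred from the model.
    noise_std is a placeholder; real deployments calibrate it from a
    clipping norm and a privacy budget.
    """
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, noise_std) for w in aggregated_weights]

noisy_model = dp_noise_update([3.5, 4.5], noise_std=0.1, seed=42)
```

The privacy/utility trade-off lives in `noise_std`: more noise means stronger privacy guarantees but a less accurate released model.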

Heterogeneous infrastructure needs can be met within a single experience

Different scenarios will require different provisioning strategies, but Azure provides enough flexibility to cover many federated learning use cases: silo compute and data can live in a single Azure tenant, in different tenants through AKS computes, or entirely outside Azure, on-premises, through Azure Arc. All of these can be attached to a single AzureML workspace used as one entry point by a team of data scientists. Independently of the provisioning setup, the data science experience remains the same: the team uses the AzureML SDK to create and run their experiments.

The example infrastructure schema below shows a simple use case where everything fits within a single tenant. The resources for each trust boundary (orchestrator, silos) are kept independent and isolated from one another by provisioning them with distinct identities, virtual networks, private links, etc.

[Figure: example single-tenant infrastructure, with the orchestrator and each silo isolated through distinct identities, virtual networks, and private links]

Our FL accelerator repository provides ready-to-deploy sandboxes for your team to get started and evaluate applicability to your specific setup. Our provisioning guide will provide a pick-and-choose approach to designing an infrastructure tailored to your needs.

Your provisioning strategy can also leverage complementary Azure capabilities such as confidential computing, where each orchestrator/silo compute is based on confidential virtual machines, with encryption-at-rest and Managed HSM for key management.

Learn more

To stay updated on Azure Machine Learning announcements, watch our breakout sessions from Microsoft Build.

 

This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.