How Azure is ensuring the future of GPUs is confidential

In Microsoft Azure, we are continually innovating to enhance security. One such pioneering effort is our collaboration with our hardware partners to create a new foundation based on silicon, that enables new levels of data protection through the protection of data in memory using confidential computing.

a man using a laptop computer

Azure confidential computing

Increase data privacy by protecting data in use.

Data exists in three stages in its lifecycle: in use (when it is created and computed upon), at rest (when stored), and in transit (when moved). Customers today already take measures to protect their data at rest and in transit with existing encryption technologies. However, they have not had the means to protect their data in use at scale. Confidential computing is the missing third stage in protecting data when in use via hardware-based trusted execution environments (TEEs) that can now provide assurance that the data is protected during its entire lifecycle.

The Confidential Computing Consortium (CCC), which Microsoft co-founded in September 2019, defines confidential computing as the protection of data in use via hardware-based TEEs. These TEEs prevent unauthorized access or modification of applications and data during computation, thereby always protecting data. The TEEs are a trusted environment providing assurance of data integrity, data confidentiality, and code integrity. Attestation and a hardware-based root of trust are key components of this technology, providing evidence of the system's integrity and protecting against unauthorized access, including from administrators, operators, and hackers.

Confidential computing can be seen as a foundational defense in-depth capability for workloads who prefer an extra level of assurance for their cloud workloads. Confidential computing can also aid in enabling new scenarios such as verifiable cloud computing, secure multi-party computation, or running data analytics on sensitive data sets.

While confidential computing has recently been available for central processing units (CPUs), it has also been needed for graphics processing units (GPU)-based scenarios that require high-performance computing and parallel processing, such as 3D graphics and visualization, scientific simulation and modeling, and and machine learning. Confidential computing can be applied to the GPU scenarios above for use cases that involve processing sensitive data and code on the cloud, such as healthcare, finance, government, and education. Azure has been working closely with NVIDIA® for several years to bring confidential to GPUs. And this is why, at Microsoft Ignite 2023, we announced Azure confidential VMs with NVIDIA H100-PCIe Tensor Core GPUs in preview. These Virtual Machines, along with the increasing number of Azure confidential computing (ACC) services, will allow more innovations that use sensitive and restricted data in the public cloud.

Potential use cases

Confidential computing on GPUs can unlock use cases that deal with highly restricted datasets and where there is a need to protect the model. An example use case can be seen with scientific simulation and modeling where confidential computing can enable researchers to run simulations and models on sensitive data, such as genomic data, climate data, or nuclear data, without exposing the data or the code (including model weights) to unauthorized parties. This can facilitate scientific collaboration and innovation while preserving data privacy and security.

Another possible use case for confidential computing applied to image generation is medical image analysis. Confidential computing can enable healthcare professionals to use advanced image processing techniques, such as deep learning, to analyze medical images, such as X-rays, CT scans, or MRI scans, without exposing the sensitive patient data or the proprietary algorithms to unauthorized parties. This can improve the accuracy and efficiency of diagnosis and treatment, while preserving data privacy and security. For example, confidential computing can help detect tumors, fractures, or anomalies in medical images.

Given the massive potential of AI, confidential AI is the term we use to represent a set of hardware-based technologies that provide cryptographically verifiable protection of data and models throughout their lifecycle, including when data and models are in use. Confidential AI addresses several scenarios spanning the AI lifecycle.

  • Confidential inferencing. Enables verifiable protection of model IP while simultaneously protecting inferencing requests and responses from the model developer, service operations and the cloud provider.
  • Confidential multi-party computation. Organizations can collaborate to train and run inferences on models without ever exposing their models or data to each other, and enforcing policies on how the outcomes are shared between the participants.
  • Confidential training. With confidential training, models builders can ensure that model weights and intermediate data such as checkpoints and gradient updates exchanged between nodes during training aren't visible outside of TEEs. Confidential AI can enhance the security and privacy of AI inferencing by allowing data and models to be processed in an state, preventing unauthorized access or leakage of sensitive information.

Confidential computing building blocks

In response to growing global demands for data security and privacy, a robust platform with confidential computing capabilities is essential. It begins with innovative hardware as part of its core foundation and incorporating core infrastructure service layers with Virtual Machines and . This is a crucial step towards allowing services to transition to confidential AI. Over the next few years, these building blocks will enable a confidential GPU ecosystem of applications and AI models.

Confidential Virtual Machines

Confidential Virtual Machines are a type of virtual machine that provides robust security by encrypting data in use, ensuring that your sensitive data remains private and secure even while being processed. Azure was the first major cloud to offer confidential Virtual Machines powered by AMD SEV-SNP based CPUs with memory encryption that protects data while processing and meets the Confidential Computing Consortium (CCC) standard for data protection at the level.

Confidential Virtual Machines powered by Intel® TDX offer foundational virtual machines-level protection of data in use and are now broadly available through the DCe and ECe virtual machines. These virtual machines enable seamless onboarding of applications with no code changes required and come with the added benefit of increased performance due to the 4th Gen Intel® Xeon® Scalable processors they run on. 

Confidential GPUs are an extension of confidential virtual machines, which are already available in Azure. Azure is the first and only cloud provider offering confidential virtual machines with 4th Gen AMD EPYC™ processors with SEV-SNP technology and NVIDIA H100 Tensor Core GPUs in our NCC H100 v5 series virtual machines. Data is protected throughout its processing due to the and verifiable connection between the CPU and the GPU, coupled with memory protection mechanism for both the CPU and GPU. This ensures that the data is protected throughout processing and only seen as cipher text from outside the CPU and GPU memory.

Confidential containers

Container support for confidential AI scenarios is crucial as provide modularity, accelerate the development/deployment cycle, and offer a lightweight and portable solution that minimizes virtualization overhead, making it easier to deploy and manage AI/machine learning workloads.

Azure has made innovations to bring confidential containers for CPU-based workloads:

  • To reduce the infrastructure management on organizations, Azure offers serverless confidential containers in Azure Container Instances (ACI). By managing the infrastructure on behalf of organizations, serverless containers provide a low barrier to entry for burstable CPU-based AI workloads combined with strong data privacy-protective assurances, including container group-level isolation and the same memory powered by AMD SEV-SNP technology. 
  • To meet various customer needs, Azure now also has confidential containers in Azure Kubernetes Service (AKS), where organizations can leverage pod-level isolation and security policies to protect their container workloads, while also benefiting from the cloud-native standards built within the Kubernetes community. Specifically, this solution leverages investment in the open source Kata Confidential Containers project, a growing community with investments from all of our hardware partners including AMD, Intel, and now NVIDIA, too.

These innovations will need to be extended to confidential AI scenarios on GPUs over time.

The road ahead

Innovation in hardware takes time to mature and replace existing infrastructure. We're dedicated to integrating confidential computing capabilities across Azure, including all shop keeping units (SKUs) and container services, aiming for a seamless experience. This includes data-in-use protection for confidential GPU workloads extending to more of our data and AI services.

Eventually confidential computing will become the norm, with pervasive memory encryption across Azure's infrastructure, enabling organizations to verify data protection in the cloud throughout the entire data lifecycle.

Learn about all of the Azure confidential computing updates from Microsoft Ignite 2023.

 

This article was originally published by Microsoft's Azure Blog. You can find the original article here.