Model Training and Fine-Tuning with Serverless Compute

We are happy to announce the General Availability of Model Training with Serverless Compute.

Serverless compute is a fully managed, on-demand compute target that simplifies running training jobs in Azure Machine Learning. With serverless compute, machine learning (ML) professionals can focus on their expertise in building ML models rather than learning about compute infrastructure. Serverless compute also reduces the management burden on IT admins by managing the compute infrastructure and providing managed network isolation, while still meeting the most stringent enterprise security requirements. All Azure Machine Learning job types are supported, including generative AI scenarios such as fine-tuning, evaluation, and retrieval-augmented generation (RAG) for large language models.

Advantages of serverless compute

  • Azure Machine Learning creates, sets up, scales, deletes, and patches the compute infrastructure, reducing management overhead for IT admins
  • No need for enterprises to perform repetitive processes to create compute using the same settings for each workspace
  • Simplifies the job submission experience by reducing the steps involved to run a job
  • ML professionals don't need to learn about compute concepts, various compute types, or related properties and instead can just focus on the job specification
  • The VM size needed to run the training job is selected dynamically by default
  • Meets the most stringent enterprise security requirements by supporting no-public-IP compute, private link workspaces, customer virtual networks, managed virtual networks, managed identity, and user identity. Admins retain control through quota and Azure policies.
  • Enterprises can optimize costs by specifying the exact resources each job needs at runtime. Job utilization metrics can be monitored to tune the resources a job requests. Low-priority VMs are also supported.
  • Elastic training support for quota-constrained, low-priority, and fault-tolerance scenarios
  • Reduced wait times before jobs start executing in some cases
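
To make the points about simplified job submission and per-job resources more concrete, here is a minimal Python sketch that builds a job specification as a plain dictionary. It does not call any Azure SDK. The helper name build_serverless_job_spec is hypothetical, and the key names (command, environment, resources, instance_type, instance_count, queue_settings, job_tier) are assumptions modeled on how Azure Machine Learning job specifications are commonly written; check the official documentation before relying on them. The sketch also assumes that serverless compute is selected by leaving out a compute target, which this article does not state explicitly.

```python
import json


def build_serverless_job_spec(command, image, instance_type=None,
                              instance_count=None, job_tier=None):
    """Return a command-job specification as a plain dict.

    Assumption: no `compute` key is set, on the understanding that omitting
    a compute target is what requests serverless compute. All key names are
    illustrative and not confirmed by the article.
    """
    spec = {"command": command, "environment": {"image": image}}

    # Optional resources: when omitted, the service is expected to
    # choose a default VM size for the job.
    resources = {}
    if instance_type is not None:
        resources["instance_type"] = instance_type
    if instance_count is not None:
        resources["instance_count"] = instance_count
    if resources:
        spec["resources"] = resources

    # Optional tier, for example a low-priority tier to reduce cost.
    if job_tier is not None:
        spec["queue_settings"] = {"job_tier": job_tier}
    return spec


# Minimal job: only the command and environment, no compute details at all.
minimal_spec = build_serverless_job_spec("python train.py", "library/python:latest")

# Cost-tuned job: explicit VM size, node count, and a low-priority tier.
tuned_spec = build_serverless_job_spec(
    "python train.py", "library/python:latest",
    instance_type="Standard_NC24", instance_count=4, job_tier="Spot",
)

print(json.dumps(minimal_spec, indent=2))
print(json.dumps(tuned_spec, indent=2))
```

The minimal specification reflects the idea that ML professionals can focus on the job itself, while the second one shows how exact resources might be requested per job at runtime. The VM size, node count, and tier values are placeholders.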

This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.