Fine Tuning: now available with Azure OpenAI Service

Get excited: fine tuning is now available for GPT-3.5-Turbo, Babbage-002, and Davinci-002 in Public Preview! This update lets developers customize their favorite OpenAI models with their own data and easily deploy their new custom models, all within a super easy to use managed service. We launched Azure OpenAI Service in January, and it's been amazing to watch developers bring the power of generative to their applications; today marks a new chapter in this journey – we're making it possible to customize models using your data, to solve your problems.

In this blog we'll talk about…

  • What's new with Azure OpenAI Service  – new models and capabilities.
  • Why developers like you are fine tuning their models, and some tips and tricks for success along the way.
  • How you can get started today with Azure OpenAI Service or Azure Machine Learning
     

What's new with Azure OpenAI Service?

Today, we're launching two new base inference models (Babbage-002 and Davinci-002) and fine-tuning capabilities for three models (Babbage-002, Davinci-002, and GPT-3.5-Turbo).

New models: Babbage-002 and Davinci-002 are GPT3 base models, intended for completion use cases. They can generate natural language or code, but they're not trained for instruction following. Babbage-002 replaces the deprecated Ada and Babbage models, while Davinci-002 replaces Curie and Davinci. These models support the completion API.

Fine tuning: You'll now be able to use Azure OpenAI Service, or Azure Machine Learning, to fine tune Babbage/Davinci-002 and GPT-3.5-Turbo. Babbage-002 and Davinci-002 support completion, while Turbo supports conversational interactions. You'll be able to specify your base model, provide your data, train, and deploy – all with a few commands.

Fine-tuning-infographic.jpg

Tell me more about fine tuning!

Fine tuning is one of the methods available to developers and data scientists looking to customize large language models for specific tasks. While approaches like Retrieval Augmented Generation (RAG) and prompt engineering work by injecting the right information and instructions into your prompt, fine tuning operates by customizing the large language model itself.

 Azure OpenAI Service & Azure Machine Learning offer Supervised Fine Tuning, which allows you to provide custom data (prompt/completion or conversational chat, depending on the model) to teach the base model new skills.

Think of fine tuning as an “expert mode” feature: super powerful, but requiring a solid foundation built on the basics. Fine tuning can make good models better , but you need appropriate use case, high quality data, and the right models and prompts to succeed.

What fine tuning means for developers like you

Before you start fine tuning, we recommend starting with prompt engineering or RAG (Retrieval Augmented Generation) to develop a baseline – it's the fastest way to get started, and we make it easy with tools like Prompt Flow or On Your Data.  Starting with prompt engineering and RAG will provide a baseline you can compare against in scenarios where you do need to fine tune a model. Most fine-tuned models in production will incorporate both prompt engineering and fine tuning so no effort is wasted!

Need help deciding when (or if) you should be fine tuning? A few ground rules can help guide you!

Don't start with fine tuning if:

  • You want a simple and fast result: fine tuning is going to take a lot of data and time to train and evaluate your new model. If you're short on time, you can usually get pretty far with just prompt engineering!
  • You need up-to-date or out of domain data: this is a perfect use case for RAG and Prompt Engineering!
  • You want to make sure your model is well grounded and avoid hallucinations:  this is another area where RAG shines!

Consider fine tuning if:

  • You want to teach the model a new skill so it's good at one specific task like classification, summarization, or always responding in a specific format or tone. Sometimes you can fine tune a smaller model to perform just as well at a specific task as a bigger model!
  • You want to show the model do something with examples, where it's too hard to explain in the prompt – or there are too many examples to fit in the context window. These are scenarios with lots of edge cases, like natural language to query, or teaching a model to speak in a specific voice or tone.
  • You want to reduce latency. Long prompts can take longer to process, and fine tuning lets you move those long prompts into the model itself.
     

Getting started with fine tuning on Azure

Fine Tuning with Azure Open Service gets you the best of both worlds: the ability to customize advanced OpenAI LLMs, while deploying on Azure's secure, enterprise ready cloud services. One of the risks of fine tuning is inadvertently introducing harmful data into your model; our content moderation allows you to fine tune with the data you need, while still filtering out any harmful responses.

If you're new to Azure OpenAI Service and LLMs: welcome!  We offer a super simple API to train and deploy your models – or if you're more comfortable with a GUI, try out Azure OpenAI Studio. If you're migrating to Azure from OpenAI, our APIs are compatible!

There are two parts to fine tuning: training your fine-tuned model and using your newly customized model for inference.

Training:  Specify your base model, your training and validation data, and set any hyperparameters – and you're ready to go! You can use the Azure OpenAI Studio for a simple GUI, or more advanced users can use our REST APIs or the OpenAI Python SDK.

FT_deploy_AOAI_Studio_small.gif

When you've finished fine tuning, your completed job will return evaluation metrics like training and validation loss.

We offer fine tuning as a managed service, so you don't have to worry about managing compute resources or capacity. When you submit a job, all you'll be paying for is the active training time, billed in 15 minute intervals, for successful fine-tuning runs. The price depends on the base model you've selected; Babbage-002 is $34/hour, Davinci-002 is $68/hour, and Turbo is $102/hour.

Inference in Azure OpenAI Service: When the training job has succeeded, your new model will be available within your resource. When you're ready to start using your model for inferencing, your customized model can be deployed just like any other OpenAI LLM!

FT_deploy_AOAI_Studio.gif

Fine tuned models are subject to an hourly hosting charge, as well as token based pricing for input and output data:

Model Hourly Hosting Input tokens Output Tokens
Babbage-002 $1.70 $0.0004 / 1k $0.0004 / 1k
Davinci-002 $3.00 $0.0020 / 1k $0.0020 / 1k
GPT-35-Turbo $7.00 $0.0015 / 1k $0.0020 / 1k

If you don't need to use your model right away, there's no charge for storing trained models.

Fine tuning is also available in Azure Machine Learning!

If you're already familiar with Azure Machine Learning Studio for developing, monitoring and deploying models, you can integrate fine tuning into your existing models thereAML workflows!  You can learn more about that experience in this blog. OpenAI models are available in the model catalog, as part of our comprehensive portfolio.  Along with OpenAI models, Azure Machine Learning also supports fine tuning OSS models, like LLaMa.

Ready to get started?

Want to learn more?

 

This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.