Azure OpenAI Service announces Assistants API, New Models for Finetuning, Text-to-Speech and more


Developers across the world have been building innovative generative AI solutions since the launch of Azure OpenAI Service in January 2023. Over 53,000 customers globally harness the capabilities of expansive generative models, supported by the robust commitments of Azure's cloud and computing infrastructure and backed by enterprise-grade security.

Today, we are thrilled to announce many new capabilities, models, and pricing improvements within the service. We are launching Assistants API in public preview, new text-to-speech capabilities, upcoming updated models for GPT-4 Turbo preview and GPT-3.5 Turbo, new embeddings models and updates to the fine-tuning API, including a new model, support for continuous fine-tuning, and better pricing. Let's explore our new offerings in detail.

Build sophisticated copilot experiences in your apps with Assistants API

We are excited to announce that Assistants, a new feature of Azure OpenAI Service, is now available in public preview. The Assistants API makes it simple for developers to create high-quality copilot-like experiences within their own applications. Previously, building custom assistants required heavy lifting even for experienced developers. While the chat completions API is lightweight and powerful, it is inherently stateless, which means developers had to manage conversation state and chat threads, tool integrations, retrieval documents and indexes, and code execution manually. The Assistants API, as the stateful evolution of the chat completions API, provides a solution for these challenges.
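To make the contrast concrete, here is a minimal sketch of the bookkeeping the stateless chat completions API pushes onto the caller: the full message history must be tracked client-side and resent with every request. The helper and names below are illustrative, not part of any SDK.

```python
# Sketch of client-side conversation-state management required by the
# stateless chat completions API: every request must carry the full
# history, which grows with the conversation and must be trimmed to fit
# the model's context window. The Assistants API keeps this state server-side.

def append_turn(history, role, content):
    """Append one conversation turn; the whole list is resent per request."""
    history.append({"role": role, "content": content})
    return history

history = [{"role": "system", "content": "You are a helpful assistant."}]
append_turn(history, "user", "Summarize my sales data.")
append_turn(history, "assistant", "Sure - please upload the file.")
append_turn(history, "user", "Here it is.")

print(len(history))  # 4 turns accumulated client-side so far
```

With Assistants, the equivalent pattern is simply appending a new message to a persistent Thread and letting the service manage history and context-window fitting.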

Building customizable, purpose-built AI that can sift through data, suggest solutions, and automate tasks just got easier. The Assistants API supports persistent and infinitely long threads. This means that as a developer you no longer need to develop thread state management systems and work around a model's context window constraints. Once you create a Thread, you can simply append new messages to it as users respond. Assistants can access files in several formats – either while creating an Assistant or as part of Threads. Assistants can also access multiple tools in parallel, as needed. These tools include:

  • Code Interpreter: This Azure OpenAI Service-hosted tool lets you write and run Python code in a sandboxed environment. Use cases include solving challenging code and math problems iteratively, performing advanced data analysis over user-added files in multiple formats and generating data visualization like charts and graphs.
  • Function calling: You can describe functions of your app or external APIs to your Assistant and have the model intelligently decide when to invoke those functions and incorporate the function response in its messages.
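As a sketch of how these tools might be declared when creating an Assistant, the snippet below pairs the built-in Code Interpreter with one custom function. The function name `get_order_status` is hypothetical, and the dictionary shapes mirror the public preview's JSON-Schema-based function calling format; check the Azure OpenAI reference for exact fields.

```python
# Illustrative tool declarations for an Assistant: the hosted code
# interpreter plus one app-defined function the model may decide to call.
# The function's parameters are described with JSON Schema.

tools = [
    {"type": "code_interpreter"},
    {
        "type": "function",
        "function": {
            "name": "get_order_status",  # hypothetical app function
            "description": "Look up the shipping status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID."},
                },
                "required": ["order_id"],
            },
        },
    },
]

# With the openai Python SDK this would be passed at creation time,
# roughly: client.beta.assistants.create(model=..., tools=tools)
```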

Support for new features, including an improved knowledge retrieval tool, is coming soon.

Try out Assistants in the Assistants playground, or start building with the API.

As with the rest of our offerings, data and files provided by you to the Azure OpenAI Service are not used to improve OpenAI models or any Microsoft or third-party products or services, and developers can delete the data as per their needs. Learn more about data, privacy and security for Azure OpenAI Service here. We recommend using Assistants with trusted data sources. Retrieving untrusted data using Function calling, Code Interpreter with file input, and Assistant Threads functionalities could compromise the security of your Assistant, or the application that uses the Assistant. Learn about mitigation approaches here.

Fine-tuning: New model support, new capabilities, and lower prices

Since we announced Azure OpenAI Service fine-tuning for OpenAI's Babbage-002, Davinci-002 and GPT-35-Turbo on October 16, 2023, we've enabled AI builders to create custom models. Today we're releasing fine-tuning support for OpenAI's GPT-35-Turbo 1106, a next-generation GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Fine-tuning with GPT-35-Turbo 1106 supports a 16k context length in training data, allowing you to fine-tune with longer messages and generate longer and more coherent texts.

In addition, we are introducing two new features to enable you to create more complex custom models and easily update them. First, we are launching support for fine-tuning with function calling that enables you to teach your custom model when to make function calls and improve the accuracy and consistency of the responses. Second, we are launching support for continuous fine-tuning, which allows you to train a previously fine-tuned model with new data, without losing the previous knowledge and performance of the model. This lets you add additional training data to an existing custom model without starting from scratch and lets you experiment more iteratively.
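For fine-tuning with function calling, each line of the training `.jsonl` file pairs a conversation with the function definitions and the call the model should learn to emit. The example below is a sketch: the `get_weather` function and the exact field names are illustrative, modeled on the chat fine-tuning format rather than a full specification.

```python
import json

# One illustrative training example for fine-tuning with function calling:
# the assistant turn contains the function call the model should learn
# to produce, alongside the function's JSON Schema definition.

example = {
    "messages": [
        {"role": "user", "content": "What's the weather in Seattle?"},
        {
            "role": "assistant",
            "function_call": {
                "name": "get_weather",  # hypothetical function
                "arguments": json.dumps({"city": "Seattle"}),
            },
        },
    ],
    "functions": [
        {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

jsonl_line = json.dumps(example)  # one line of the training .jsonl file
```

With continuous fine-tuning, a later training job can start from the previously fine-tuned model rather than the base model, so examples like this can be added incrementally.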

Besides new model support and features, we are making it more affordable for you to train and host your fine-tuned models on Azure OpenAI Service, including decreasing the cost of training and hosting GPT-35-Turbo by 50%.

Coming soon: New models and model updates

The following models and model updates are coming this month to Azure OpenAI Service. You can review the latest model availability here.

Updated GPT-4 Turbo preview and GPT-3.5 Turbo models

We are rolling out an updated GPT-4 Turbo preview model, gpt-4-0125-preview, with improvements in tasks such as code generation and reduced cases of “laziness” where the model doesn't complete a task. The new model fixes a bug impacting non-English UTF-8 generations. Post-launch, we'll begin updating Azure OpenAI deployments that use GPT-4 version 1106-preview to use version 0125-preview. The update will start two weeks after the launch date and complete within a week. Because version 0125-preview offers improved capabilities, customers may notice some changes in model behavior and compatibility after the upgrade. Pricing for gpt-4-0125-preview will be the same as pricing for gpt-4-1106-preview.

In addition to the updated GPT-4 Turbo, we will also be launching GPT-3.5-turbo-0125, a new GPT-3.5 Turbo model with improved pricing and higher accuracy at responding in various formats. We will reduce input prices for the new model by 50% to $0.0005 per 1K tokens and output prices by 25% to $0.0015 per 1K tokens.
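Using those per-1K-token prices, the cost of a GPT-3.5-turbo-0125 call can be estimated with simple arithmetic. The sketch below hard-codes the figures quoted above; always confirm against the current Azure OpenAI pricing page.

```python
# Estimate request cost from the per-1K-token prices quoted above:
# $0.0005 per 1K input tokens, $0.0015 per 1K output tokens.

INPUT_PRICE_PER_1K = 0.0005
OUTPUT_PRICE_PER_1K = 0.0015

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (
        output_tokens / 1000
    ) * OUTPUT_PRICE_PER_1K

# e.g. a call with 2,000 prompt tokens and 500 completion tokens:
cost = estimate_cost(2000, 500)
print(f"${cost:.6f}")  # $0.001750
```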

New Text-to-Speech (TTS) models

Our new text-to-speech model generates human-quality speech from text in six preset voices, each with its own personality and style. The two model variants include tts-1, the standard voices model variant, which is optimized for real-time use cases, and tts-1-hd, the high-definition (HD) equivalent, which is optimized for quality. This new model complements capabilities such as building custom voices and avatars already available in Azure AI, and enables customers to build entirely new experiences across customer support, training videos, live-streaming and more. Developers can now access these voices through both Azure OpenAI Service and Azure AI Speech.
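A TTS request, in sketch form, selects a model variant, a preset voice, and the text to speak. The voice name "alloy" and the field names below mirror the OpenAI audio API shape and should be treated as assumptions; consult the Azure OpenAI reference for the exact values available in your region.

```python
# Illustrative request payload for the new TTS models. Swap "tts-1" for
# "tts-1-hd" when audio quality matters more than latency.

tts_request = {
    "model": "tts-1",   # standard variant, optimized for real-time use
    "voice": "alloy",   # one of the six preset voices (assumed name)
    "input": "Thank you for calling. How can I help you today?",
}

# With the openai Python SDK this would map to something like:
# client.audio.speech.create(**tts_request)
```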

A new generation of embeddings models with lower pricing

Azure OpenAI Service customers have been incorporating embeddings models in their applications to personalize, recommend and search content. We are excited to announce a new generation of embeddings models that are significantly more capable and meet a variety of customer needs. These models will be available later this month.

  • text-embedding-3-small is a new smaller and highly efficient embeddings model that provides stronger performance compared to its predecessor text-embedding-ada-002. Given its efficiency, pricing for this model is $0.00002 per 1k tokens, a 5x price reduction compared to that of text-embedding-ada-002. We are not deprecating text-embedding-ada-002, so you can continue using the previous-generation model if needed.
  • text-embedding-3-large is our new best performing embeddings model that creates embeddings with up to 3072 dimensions. This large embeddings model is priced at $0.00013 / 1k tokens.

Both embeddings models offer native support for shortening embeddings (i.e., removing numbers from the end of the vector) without the embedding losing its concept-representing properties. This allows you to trade off performance against the cost of using embeddings.
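In practice, shortening means keeping only the first k dimensions of the returned vector and re-normalizing so cosine similarity still behaves sensibly. The sketch below shows that pattern with a toy vector; it assumes, as the announcement describes, that the model was trained so prefixes of the embedding remain meaningful.

```python
import math

# Shorten an embedding by truncating to its first k dimensions and
# L2-normalizing the result. Useful for trading retrieval quality
# against vector-store storage and compute cost.

def shorten(embedding: list[float], k: int) -> list[float]:
    head = embedding[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

vec = [0.6, 0.8, 0.05, -0.02]  # toy 4-dim "embedding"
short = shorten(vec, 2)        # e.g. cut 3072 dims down for cheaper storage
# `short` has unit L2 norm, so cosine similarity works as usual
```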

What’s Next

It has been great to see what developers have built already using Azure OpenAI Service. You can further accelerate your enterprise's AI transformation with the products we announced today. Explore our documentation and resources to get started or learn more about Azure OpenAI Service.

We cannot wait to see what you build next!


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.