Demystifying Azure Open AI for App developers

Co-Authors (Prakash, Prabhjot)

The purpose of this blog is to cover the concepts related to Azure Open in an easy-to-understand concise format for anyone with no or limited ML background.

Let's begin by understanding the fundamental components of Azure OpenAI solutions, their tools, and patterns, and explore how they are distinct from Azure OpenAI itself.

Open is an independent research organization focused on artificial intelligence () which in addition to research also develops various GPT (Generative pre-trained) models like GPT-4, GPT-4V, DALL-E 3, Whisper. Primary uses case for GPT models natural language processing tasks, language translation, text summarization, and Q&A. These models can be used with Enterprise data and additional domain specific models using various patterns and techniques.

Azure Open AI refers to the collaboration between OpenAI and Microsoft Azure. Under this partnership, OpenAI's AI models and technologies are hosted by Microsoft in Azure, making them accessible to developers and organizations through the Azure platform.  The Azure OpenAI service automatically encrypts any data that persists in the cloud, including training data and fine-tuned models. This helps protect the data and ensures that it meets organizational security and compliance requirements. Although Azure OpenAI is designed to meet data protection, privacy, and security standards, it is the your responsibility to use the technology in compliance with applicable laws and regulations and in a manner that aligns with their specific business needs.

  • Pricing: Azure OpenAI and OpenAI have different pricing policies.
  • Regional availability: Azure OpenAI is available in multiple regions.
  • Tokens: Azure OpenAI limits the number of tokens
  • Data Safety: Data submitted to the Azure OpenAI Service including prompts (Inputs), completions (Outputs), embeddings and any training data remains within Microsoft Azure Customer Subscription and is not used by Azure OpenAI or passed to OpenAI for model improvements or training / predictions. No data is shared between customers.
  • Capabilities: Azure OpenAI provides a safe and reliable ecosystem to safeguard your data, while OpenAI provides advanced language AI models like OpenAI GPT or DALL-E.
  • Data Retention: Microsoft retains only the Abuse monitoring data for 30 days. Customers can request to opt out of the process.
  • Responsible AI: Azure Open AI goes through an RAI ensemble of AI models to filter Inputs and outputs for Sex, Hate, Violence, Self-Harm. These filters are configurable by customers.

Components of Azure open AI

Azure OpenAI offers a ready-to-use service with finely tuned capabilities, accessible via an API (Model as a Service). The key assets contributing to the Generative AI solution include LLMs, agents, plugins, prompts, chains, and APIs.

Fundamentals of Utilizing Azure OpenAI:

  • Prompting: The models operate on a prompt-based system. Interaction with the Model/API is conducted through prompts, and crafting an effective prompt is crucial, known as prompt engineering, to enhance relevance and precision.
  • Grounding: This technique provides the Model with context to yield more pertinent responses. Grounding can be achieved through various methods, such as embedding, to provide the necessary background information.
  • Chunking: This process divides extensive documents into smaller segments manageable by embedding models, ensuring adherence to maximum token input limits. These segments populate vector stores and facilitate text-to-vector query transformations.
  • Fine-Tuning: In instances where prompt engineering does not yield accurate responses or domain-specific behavior is required, fine-tuning re-trains the LLM with sample data to optimize it for datasets.
  • Tokens: Tokenization is the process of breaking text into smaller segments called tokens which can be a word or part of a wordfor few characters. Its usage and size depend on the model you are using.

We query Azure Open AI using Prompts (fig 1). Prompt has three core components.

  1. System/meta prompt
  2. Question/Query
  3. Sources/Context


LangChain and Semantic Kernel have some similarities, but each one has their unique features and use cases.


LangChain is modular and supports both Python and JavaScript/TypeScript. It streamlines development by breaking down complex tasks into a sequence of components. LangChain offers a versatile framework for developing applications that involve natural language processing (NLP) tasks. Its modular nature and support for both Python and JavaScript/TypeScript indicate flexibility in development environments. Breaking down complex tasks into manageable components like Model I/O, Retrieval, Chains, Agents, Memory, and Response simplifies the development process and allows for easier debugging and maintenance.

The use of Chains to construct sequences of calls suggests a workflow-oriented approach, where developers can organize tasks into a logical sequence. Agents add another layer of abstraction by enabling chains to choose tools based on high-level directives, potentially increasing adaptability and efficiency.

The inclusion of Memory for persisting application state between runs of a chain indicates support for stateful processing, which can be crucial for certain types of applications where context needs to be maintained across interactions.

Overall, LangChain appears to be a good tool for building applications that involve NLP tasks, offering modularity, flexibility, and support for different programming languages. Its components provide developers with a structured approach to developing complex applications while streamlining the development process.

Sematic Kernel

Semantic Kernel is an open-source SDK (software development kits) that simplifies the process of constructing agents that can activate your existing code. It is a highly adaptable SDK that is compatible with models from OpenAI, Azure OpenAI, Hugging Face, and beyond. By merging your existing C#, Python, and Java code with these models, you can create agents that are proficient in responding to questions and automating tasks.

Empowering Developers with Semantic Kernel:

  • To assist developers in crafting their own Copilot experiences using AI plugins, we have unveiled Semantic Kernel, a streamlined open-source SDK that orchestrates your existing code (plugins) with AI.
  • Harness the same AI orchestration techniques that drive Microsoft's Copilots in your applications.

 Beyond Simple Chat Applications: While modern AI models are adept at generating messages and images, constructing fully autonomous AI agents that can automate business operations and enhance user productivity requires more. A framework that can interpret model responses and utilize them to trigger existing code is essential for productive tasks.

Semantic Kernel fulfills this need by providing an SDK that enables you to describe your existing code to AI models, allowing them to request its execution. Semantic Kernel then converts the model's response into an actionable call to your code.

To summarize, LangChain is a powerful framework that has more out of the box tools and integrations whereas Semantic Kernel is more lightweight. Both frameworks have a wide range of use cases, making them versatile tools for developers. Whether you choose Langchain or Semantic Kernel will depend on the language your team supports and what features and integrations are included out of the box.

Samples: semantic-kernel/dotnet/samples at main · microsoft/semantic-kernel (   

Vectors and Embeddings

Vector representation is to capture the essential characteristics of an item in a numerical format.  Embedding is a special type of vector of data representation that LLMs can use.

A vector database is a system engineered to house and handle vector embeddings, which are numerical representations of complex data within a multi-dimensional space. Each dimension in this space is associated with a particular attribute of the data, and sophisticated data can be represented using tens of thousands of dimensions. The position of a vector within this space signifies its distinct characteristics. Various types of data, including words, phrases, documents, images, and audio, can be converted into vector form. These embeddings are crucial for functions such as similarity searches, multi-modal searches, recommendation systems, and large language models (LLMs), among others.

In a vector database, embeddings are indexed and queried through vector search algorithms based on their vector distance or similarity.

The following are some of the Vector Databases:

  1. AI Search: Azure AI Search stores the data that you query over. Use it as a pure vector store anytime you need long-term memory or a knowledge base, or grounding data for Retrieval Augmented Generation (RAG) architecture, or any app that uses vectors.
  2. PostgreSQL with vector extension
  3. Pinecone 
  4. Any open source

Retrieval augmented generation (RAG) is an essential element of utilizing Generative AI, particularly in enterprise contexts. This approach involves acquiring domain-specific knowledge and integrating it with the initial prompt (refer to figure 2) to enhance the precision and relevance of the results produced by Azure Open AI. The ‘Bring Your Own Data' feature is a unique capability that facilitates the implementation of RAG, and Azure AI studio simplifies its application for straightforward scenarios.


Responsible AI (RAI)

Built in features in Azure Open AI studio: Azure Open AI goes through an RAI ensemble of AI models to filter Inputs and outputs for Sex, Hate, Violence, Self-Harm. These filters are configurable by customers.


RAI Toolbox Github repository 

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user …

Tools: The tools which can be used to develop Azure Open AI based solutions

Small Language Model (SLM)


Compact language models, such as Microsoft's Phi and those from various providers, possess capabilities akin to larger Generative AI models but require significantly fewer resources. They can operate on any Nvidia-based hardware, allowing for the deployment of Small Language Models (SLMs) in diverse settings. SLMs demonstrate considerable proficiency in areas like common sense reasoning, language comprehension, and knowledge. However, they may not match the larger models in terms of world knowledge due to their size constraints.

Azure OpenAI Approved as a Service within the FedRAMP High Authorization for Azure Commercial

Microsoft's Azure OpenAI service is now included within the US Federal Risk and Authorization Management Program (FedRAMP) High Authorization for Azure Commercial. This Provisional Authorization to Operate (P-ATO) within the existing FedRAMP High Azure Commercial environment was approved by the FedRAMP Joint Authorization Board (JAB). This milestone follows our previously announced solution enabling Azure Government customers to access Azure OpenAI Service in the commercial environment. With this latest update, agencies requiring FedRAMP High can directly access Azure OpenAI from Azure commercial.


Challenges faced by early adopters are being addressed through ongoing efforts. Utilizing patterns and approaches such as APIM or AI landing zones can mitigate some issues:

Model Updates: Frequent modifications to the underlying Large Language Models (LLMs) can pose operational challenges.

Multilingual Scenarios: In applications supporting multiple languages, the accuracy of responses may decline, with LLMs potentially delivering mixed-language content.

Performance, HA/DR: Ensuring consistent performance in production applications that use Open AI can be challenging, with possible increased latency.

Secure Sensitive Information: To secure sensitive data, enterprises must work closely with their Office of Responsible AI during the project qualification stage, especially for sensitive AI use cases. Strict adherence to their advice on managing sensitive or explicit content is imperative. Organizations are required to follow established security principles and apply data classification labels, known as sensitivity labels, to protect documents, emails, PDFs, Teams meetings, and chats.

Cost Management: A strategy employed by customers involves using an orchestrator to determine which GPT model to invoke based on the query. Not all queries necessitate GPT-4; many can be adequately addressed with GPT-3.5, thus managing costs effectively.

In conclusion, this article will help to quickly understand the opportunities and the landscape of enabling Azure Open AI in your applications.


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.