How to build a GenAI-powered Smart Knowledge Search platform for enterprises

Generative AI and large language models (LLMs) are at the heart of innovation and top of mind for all enterprises. Enterprises are looking to leverage the capabilities of OpenAI models, such as content generation, summarization, code generation, and semantic search, to deliver the next generation of user experiences and increase employee productivity.

“Chat with my data” and “Talk to your docs” are common themes and use cases of semantic search that we discuss with our customers. The most significant challenge for customers across industries is to quickly find the most relevant and accurate information from a vast ocean of knowledge available within their enterprise. In this blog, we share our perspective on how enterprises can leverage Azure OpenAI to develop a platform for smart enterprise knowledge search. We also discuss the other essential components of the platform needed to build a holistic system that caters for enterprise guardrails.

Why a Platform?

Enterprise knowledge search / semantic search use cases leverage the Retrieval Augmented Generation (RAG) pattern, which retrieves relevant content and augments the prompt with it, helping the LLM deliver the outcome defined in the user prompt. In enterprises, we have observed that implementation of the RAG pattern is repeated by various teams, often siloed from each other. Instead, enterprises should aim to build a platform that collates knowledge sources, provides a conversational experience for accessing information and knowledge, and standardizes the implementation so it adheres to organizational governance processes and practices.

Platform Tenets

Platform tenets are the key guiding principles and considerations for defining the technical architecture of the smart enterprise knowledge search platform. Below are a few key principles the platform must deliver on to achieve broad adoption, usage, and intended value.

  • Reusability & Extensibility
  • Scalability
  • Resiliency
  • AI Governance
  • Data Security
  • Access Control: RBAC and Information access control
  • Cost Transparency

The upcoming sections on the platform architecture describe how the above principles are achieved through the implementation of the platform components.

Platform Architecture


Platform Components

The logical architecture view represents the solution components required to build a smart enterprise knowledge search platform powered by services such as Azure OpenAI, Azure Cognitive Search, Azure AI Content Safety, and API Management. The components listed below inherently aim to address the platform principles defined in the “Platform Tenets” section.

Content Ingestion and Landing zone

The content landing zone is where enterprise knowledge sources are collated but logically segregated based on organizational boundaries, i.e., line of business (LoB), product offerings, internal org content, etc. The logical segregation is achieved using resource groups, represented, for example, as “LoB1 RG” in the architecture view. Below are the key functions of this component:

  • Content Ingestion: A reusable pipeline to onboard enterprise knowledge sources, which are then indexed into Cognitive Search using built-in skillsets to translate the documents, or using custom skillsets where Document Intelligence extracts text and tables from PDFs and images. The extracted text is vectorized using the Azure OpenAI embeddings model and stored in the index. The pipeline must cater for a one-time bulk load of content as well as auto-ingestion of incremental content that is either new or revised.
  • Content Segregation: Resource groups enable grouping of related Azure resources based on the logical segregation of content required in the enterprise, while providing the ability to define RBAC and achieve cost transparency. RBAC helps ensure that only the relevant teams and individuals have the requisite access to their respective content and Azure resources in the resource group, e.g., “LoB1 RG”. Enterprises implement internal chargeback, and costs aggregated at the resource group scope help the platform charge back costs to the respective teams based on consumption.
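To make the ingestion step concrete, the chunking stage that prepares documents for vectorization can be sketched as below. This is a minimal sketch with hypothetical names (`chunk_text`, `embed_stub`, `build_index_docs`); a real pipeline would replace the stub with a call to the Azure OpenAI embeddings API and upload the resulting documents to a Cognitive Search index.

```python
# Minimal sketch of the chunking step in the ingestion pipeline.
# Sizes and field names are illustrative; a real pipeline would replace
# embed_stub with a call to the Azure OpenAI embeddings API and upload
# the documents to a Cognitive Search index.

def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context is not lost at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def embed_stub(chunk: str) -> list[float]:
    # Placeholder for the Azure OpenAI embeddings call
    # (e.g., text-embedding-ada-002, which returns 1536-dimensional vectors).
    return [0.0] * 1536

def build_index_docs(doc_id: str, text: str) -> list[dict]:
    """Produce one searchable document per chunk, ready for index upload."""
    return [
        {"id": f"{doc_id}-{i}", "content": chunk, "contentVector": embed_stub(chunk)}
        for i, chunk in enumerate(chunk_text(text))
    ]
```

The overlap between adjacent chunks keeps sentences that straddle a boundary retrievable from either chunk; the same function can serve both the bulk load and the incremental path, since each new or revised document is simply re-chunked and re-indexed.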

User Interface

AI capabilities are better appreciated when they are delivered directly to end users. Teams provides the best interface for surfacing unified enterprise knowledge search for two reasons: first, Teams is the primary business collaboration platform for most enterprises; second, the Teams AI library makes it easy to integrate LLM solutions as a pluggable app in Teams. For cases where Teams integration is not trivial, the solutions should be plugged into existing in-house products or business applications.

APIM Endpoint

The API Management service provides an entry point for user interfaces (Teams, in-house business applications) to integrate with the enterprise knowledge search platform. Each UI app has a unique subscription key that identifies the application; this key is forwarded to the “Search Orchestrator” component as input to determine which knowledge base harvested in the platform must be served to address the user prompt.

Metadata store

The metadata store is used to store app metadata such as the mapping of knowledge sources to the consuming UI app. The metadata is stored separately for each consuming app and must include details of the Cognitive Search index, the system prompt (e.g., defining the relevant context, tone, and persona of the AI assistant), and the Azure OpenAI model deployment endpoints and parameters (e.g., temperature) specific to the consuming app.
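To make the shape of this metadata concrete, one possible record layout, keyed by the APIM subscription key, is sketched below. All field names and values are illustrative assumptions, not a prescribed schema.

```python
# Illustrative per-app metadata record, keyed by the APIM subscription key.
# Field names and values are assumptions for this sketch, not a prescribed schema.

APP_METADATA = {
    "sub-key-hr-app": {
        "search_index": "hr-policies-index",
        "system_prompt": (
            "You are an HR assistant. Answer only from the provided "
            "HR policy documents, in a friendly and concise tone."
        ),
        "deployment_endpoint": "https://contoso.openai.azure.com/",
        "deployment_name": "gpt-35-turbo-hr",
        "temperature": 0.2,
    },
}

def get_app_config(subscription_key: str) -> dict:
    """Resolve the consuming app's configuration from its subscription key."""
    config = APP_METADATA.get(subscription_key)
    if config is None:
        raise KeyError(f"Unknown subscription key: {subscription_key}")
    return config
```

Keeping one record per consuming app means onboarding a new app is a metadata change rather than a code change: the orchestrator resolves the index, prompt, and model parameters at runtime from the key alone.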

Search Orchestrator

The search orchestrator is a key component of the platform. It employs AI orchestrators such as Semantic Kernel and LangChain to break down the response flow into tasks, enabling a seamless response from the LLM to a user prompt. This component can leverage internal workflows and external skills/plugins for a given prompt. Here are the key tasks performed by the search orchestrator:

  • The Search Orchestrator queries the metadata store using the subscription key from the APIM endpoint to retrieve details of the Cognitive Search Index for searching content based on user prompt.
  • After retrieving the search results, it uses the system prompt definition and model deployment endpoint details from the metadata store associated with the consuming UI App and invokes the “ChatCompletion” API.
  • The completion from the deployed GPT model, along with the original user prompt, is persisted to audit the user interaction and can serve as context for subsequent interactions with the GPT model.
  • Where information- or content-level access control is required, the metadata store can hold the definition of content-level entitlements, which the Search Orchestrator uses at runtime to validate the requesting user's entitlements and filter out unauthorized content before making a call to the GPT model.
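The steps above can be sketched end to end with stubbed dependencies. The function and store names here are assumptions; a real orchestrator would use Semantic Kernel or LangChain and call Azure Cognitive Search and the Azure OpenAI ChatCompletion API where the stubs stand.

```python
# End-to-end sketch of the orchestrator flow with stubbed dependencies.
# search_stub / chat_completion_stub stand in for Azure Cognitive Search
# and the Azure OpenAI ChatCompletion API; all names are illustrative.

AUDIT_LOG: list[dict] = []

def search_stub(index: str, query: str, allowed_sources: set[str]) -> list[dict]:
    # Pretend search results; entitlement filtering drops unauthorized content
    # before anything reaches the model.
    results = [
        {"source": "public-handbook", "content": "Vacation policy is 20 days."},
        {"source": "exec-only-memo", "content": "Confidential restructuring plan."},
    ]
    return [r for r in results if r["source"] in allowed_sources]

def chat_completion_stub(system_prompt: str, context: str, user_prompt: str) -> str:
    # Placeholder for the ChatCompletion call with the grounded context.
    return f"Answer based on: {context}"

def orchestrate(app_config: dict, user_prompt: str, user_entitlements: set[str]) -> str:
    # 1. Retrieve content from the app's index, filtered by user entitlements.
    hits = search_stub(app_config["search_index"], user_prompt, user_entitlements)
    context = " ".join(h["content"] for h in hits)
    # 2. Invoke the chat model with the app-specific system prompt and context.
    answer = chat_completion_stub(app_config["system_prompt"], context, user_prompt)
    # 3. Persist the prompt and completion for audit and future context.
    AUDIT_LOG.append({"prompt": user_prompt, "completion": answer})
    return answer
```

Note that entitlement filtering happens at retrieval time, before the model call: content the user is not entitled to never enters the prompt, which is a simpler guarantee than trying to filter the generated answer afterwards.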

Content Safety

Each Azure OpenAI model deployed in the platform is associated with a content filtering configuration. The platform can have a default content filtering configuration that applies to all knowledge search use cases. However, in specific end-customer-facing scenarios, the content filtering thresholds and severities can be tailored as appropriate and associated with a specific GPT model deployment.
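As an illustration of per-deployment tailoring, the sketch below models a filtering decision against configurable severity thresholds. The category names follow Azure AI Content Safety (hate, sexual, violence, self-harm), but the threshold values and helper names are assumptions for the sketch, not the service's actual configuration format.

```python
# Sketch of a per-deployment content filtering decision. Categories follow
# Azure AI Content Safety (hate, sexual, violence, self-harm); threshold
# values and the helper function are illustrative assumptions.

# Platform-wide default: block when a category's severity reaches 4.
DEFAULT_THRESHOLDS = {"hate": 4, "sexual": 4, "violence": 4, "self_harm": 4}

# A customer-facing deployment can tighten thresholds for selected categories.
CUSTOMER_FACING_THRESHOLDS = {**DEFAULT_THRESHOLDS, "hate": 2, "self_harm": 2}

def is_blocked(severities: dict, thresholds: dict) -> bool:
    """Block content when any category's severity meets or exceeds its threshold."""
    return any(
        severities.get(category, 0) >= threshold
        for category, threshold in thresholds.items()
    )
```

The same scored content can thus pass the platform default yet be blocked by a stricter customer-facing deployment, which is exactly the per-deployment tailoring described above.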


Enterprise knowledge search is a common use case across enterprises and is best addressed by developing a central platform that delivers on the enterprise guardrails and principles in a consistent manner. In this blog, we highlighted the essential components required to build an enterprise knowledge search platform leveraging the relevant Azure services. Before moving into production, an end-to-end LLMOps implementation is necessary, along with measuring metrics such as groundedness, informativeness, latency, and the cost of API calls for solutions built on such platforms.

In our upcoming blog, we will discuss platform scalability, specifically the quotas and limits of Azure OpenAI models. This is crucial when building an enterprise-wide platform to ensure the right level of throughput from deployed models for different consuming apps.


This article was originally published by Microsoft's Azure AI Services Blog.