Elevating RAG and Search: The Synergy of Azure AI Document Intelligence and Azure OpenAI

In our previous blog post, Document Generative AI: the Power of Azure AI Document Intelligence & Azure OpenAI Service Combined, we introduced what Document Generative AI is and how you can use Azure AI Document Intelligence (formerly known as Azure Form Recognizer) and Azure OpenAI Service to enable chat over a variety of long enterprise documents.

Retrieval Augmented Generation (RAG) is a design pattern commonly used in Document Generative AI (for an example, see the repo here). It augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides the data, which gives you control over the data the LLM uses when it formulates a response.

Enterprise documents are usually long and complex. Although recent LLMs can take in more context, a good chunking strategy is still required to divide documents into smaller pieces that can be retrieved more efficiently and that improve the relevance and interpretability of the results. However, most chunking strategies in RAG today are still based on text length, with little consideration of document structure. There is high demand for semantic chunking: how do you divide a large body of text or documents into smaller, meaningful chunks based on semantic content rather than arbitrary splits?
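To make the problem concrete, here is a minimal sketch of length-based chunking. The function name `chunk_by_length` and the sample text are ours, invented for illustration; they do not come from any particular RAG library.

```python
from __future__ import annotations


def chunk_by_length(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows, ignoring document structure."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Hypothetical two-section document, already in Markdown.
sample = (
    "# Warranty\n"
    "Coverage lasts 24 months.\n\n"
    "# Returns\n"
    "Items can be returned within 30 days.\n"
)

chunks = chunk_by_length(sample, size=40)
for chunk in chunks:
    print(repr(chunk))
```

With this sample, the 40-character boundary falls inside the second heading, so the "Returns" section loses its heading marker and its text is spread across two chunks. That is the kind of arbitrary split that semantic chunking is meant to avoid.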

Semantic chunking in RAG.png

The Azure AI Document Intelligence Layout model offers a comprehensive solution for semantic chunking by providing advanced content extraction and document structure analysis capabilities. With this model, you can extract paragraphs, tables, titles, section headings, selection marks, font/style, key-value pairs, math formulas, QR codes/barcodes, and more from various document types. The extracted information can be output in Markdown format, so you can define your semantic chunking strategy based on the building blocks it provides.

Benefits of using the Layout model:

  • Simplified process: You can parse different document types, such as digital and scanned PDFs, images, Office files (docx, xlsx, pptx), and HTML, with a single API call.
  • Scalability and AI quality: The model scales well across Optical Character Recognition (OCR), table extraction, document structure analysis (e.g., paragraphs, titles, section headings), and reading order detection, delivering high-quality, AI-driven results. It supports 309 printed and 12 handwritten languages.
  • LLM compatibility: The output format, Markdown, is LLM friendly and integrates easily into your workflows. You can turn any table in a document into Markdown, which saves much of the parsing effort otherwise needed to help an LLM understand it.
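Once a document is available as Markdown, headings become natural chunk boundaries. The following is a minimal sketch of one possible strategy, not the Layout model's own logic: `chunk_by_headings` is a hypothetical helper, and the sample string stands in for real model output.

```python
from __future__ import annotations

import re

# Matches Markdown ATX headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per section, keeping the heading path."""
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # (level, title) of the enclosing headings
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


# Stand-in for Markdown returned by a layout analysis (assumed shape).
sample_markdown = """# Contract
Intro text.

## Payment terms
Net 30 days.

| Item | Price |
| --- | --- |
| Widget | 10 |

## Termination
Either party may terminate.
"""

sections = chunk_by_headings(sample_markdown)
for section in sections:
    print(" > ".join(section["headings"]), "|", len(section["text"]), "chars")
```

Each chunk keeps its full heading path, so a retriever can index "Contract > Payment terms" alongside the text, and the table stays in one piece with the section it belongs to. This sketch does not handle `#` lines inside fenced code, and very long sections would still need a secondary split.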


Figure 1. The Layout model can detect document structures and output them as Markdown.


Figure 2. The Layout model can extract tables from your document.
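If you need the table values as data rather than as text for an LLM, a Markdown table is simple to parse. The sketch below assumes the table arrives as a pipe-delimited Markdown table with a header row and a separator row; `parse_markdown_table` and the sample table are hypothetical, and escaped pipes inside cells are not handled.

```python
from __future__ import annotations


def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Markdown table into a list of row dictionaries."""
    rows: list[list[str]] = []
    for line in table.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, data = rows[0], rows[2:]  # rows[1] is assumed to be the "---" separator
    return [dict(zip(header, row)) for row in data]


sample_table = """
| Item | Price |
| --- | --- |
| Widget | 10 |
| Gadget | 25 |
"""

records = parse_markdown_table(sample_table)
print(records)
```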

 

Getting started

Azure AI Document Intelligence Studio

Analyze options.png

  • Click Run analysis, then view the output and sample code in the right pane:

Layout analyze.png

 

SDK and REST API
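As a rough sketch of what a REST call for Markdown output can look like, the snippet below only builds the request and does not send it. Treat the details as assumptions to check against the current API reference: the URL path, the `outputContentFormat=markdown` query parameter, the `urlSource` body field, and the `Ocp-Apim-Subscription-Key` header reflect our understanding of the service, while `build_layout_request`, the endpoint, the key, the document URL, and the API version string are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_layout_request(endpoint: str, key: str, document_url: str, api_version: str) -> Request:
    """Build (but do not send) an analyze request for the prebuilt Layout model."""
    query = urlencode({
        "api-version": api_version,          # assumption: use the version your resource supports
        "outputContentFormat": "markdown",   # assumption: asks for Markdown instead of plain text
    })
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-layout:analyze?{query}"
    )
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = build_layout_request(
    endpoint="https://example-resource.cognitiveservices.azure.com/",
    key="<your-key>",
    document_url="https://example.com/sample.pdf",
    api_version="2024-11-30",
)
print(request.get_method(), request.full_url)
```

Analysis is asynchronous, so after sending the request you would expect to poll the operation URL returned by the service until the result, including the Markdown content, is ready. The SDK wraps this polling for you.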

Build “chat with your document” with semantic chunking

  • This cookbook shows a simple demo of the RAG pattern with Azure AI Document Intelligence as the document loader and Azure AI Search as the retriever in LangChain.
  • This solution accelerator demonstrates an end-to-end baseline RAG pattern sample that uses Azure AI Search as a retriever and Azure AI Document Intelligence for document loading and semantic chunking.
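To show how semantic chunks feed the retrieval step, here is a toy, in-memory sketch. It deliberately swaps Azure AI Search for a naive keyword-overlap score so it runs anywhere; `retrieve`, `build_prompt`, and the sample chunks are hypothetical, and a real solution would use a proper search index and send the prompt to an LLM.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Rank chunks by how many question tokens appear in their headings or text."""
    question_tokens = tokenize(question)
    scored = []
    for chunk in chunks:
        searchable = " ".join(chunk["headings"]) + " " + chunk["text"]
        scored.append((len(question_tokens & tokenize(searchable)), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score > 0]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble a grounded prompt from the retrieved chunks."""
    blocks = []
    for hit in hits:
        blocks.append("[" + " > ".join(hit["headings"]) + "]\n" + hit["text"])
    context = "\n\n".join(blocks)
    return (
        "Answer using only the context below.\n\n"
        "Context:\n" + context + "\n\nQuestion: " + question
    )


# Hypothetical chunks, shaped like the output of a heading-based splitter.
doc_chunks = [
    {"headings": ["Contract", "Payment terms"], "text": "Invoices are due within 30 days."},
    {"headings": ["Contract", "Termination"], "text": "Either party may terminate with 60 days notice."},
    {"headings": ["Contract", "Warranty"], "text": "Hardware is covered for 24 months."},
]

question = "When are invoices due?"
hits = retrieve(question, doc_chunks)
prompt = build_prompt(question, hits)
print(prompt)
```

Because each chunk carries its heading path, the prompt tells the model which section the passage came from, which also makes answers easier to trace back to the source document.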

Learn more

 

This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.