Unlocking Advanced Document Insights with Azure AI Document Intelligence

In the digital age, extracting insights from diverse documents is challenging due to their complex structure. Consider a financial analyst reviewing a company's quarterly report, which includes detailed tables of operating expenses, revenue, and charts depicting sales growth. Traditional document processing solutions often fall short in understanding the nuanced hierarchy of document structures and the contextual relevance of embedded figures, leading to inefficient data extraction, analysis, and utilization. This gap in capability not only hinders efficient data use, but also affects decision making and productivity. As organizations strive to leverage data as a strategic asset, the need for advanced document intelligence solutions that can accurately interpret and analyze the full spectrum of document elements, including the ability to interact with and question the information within figures and charts, has never been more critical.

To revolutionize how we interact with and derive insights from documents, Azure Document Intelligence is introducing groundbreaking features: hierarchical document structure analysis and figure detection.

Hierarchical Document Structure Analysis is crucial for semantically segmenting documents into manageable sections, enhancing overall comprehension, facilitating easier navigation, and significantly improving information retrieval efficiency. The implementation of Retrieval Augmented Generation (RAG) in document generative highlights the importance of such a structured approach. By supporting multi-layers of sections and subsections, the Layout model identifies the relationships between different sections and the objects within each, maintaining a coherent hierarchical structure throughout. This structured output can be conveniently consumed in markdown format, allowing for straightforward access to and manipulation of sections and subsections. The figure demonstrates how sections are organized in the JSON output:

layout-sections.png

Figure 1. Illustration of hierarchical document structure analysis in JSON output

Figures enrich the textual content, offering visual representations that simplify the understanding of complex information. The Layout model‘s figure detection feature comes with key properties such as boundingRegions, which detail the spatial locations of figures across document pages. This includes page numbers and polygon coordinates outlining each figure's boundary. You can use this info to extract the figure or chart and make it an addressable component that can be further processed. Additionally, spans and elements properties link figures to their relevant

textual contexts, making it easier to understand the connection between text and visual data. The presence of a caption property further enhances this by providing descriptive text for each figure, ensuring that users can grasp the full context and significance of visual elements within a document.

layout-figure.png

Figure 2. Illustration of figure detection in JSON output

This sample notebook demonstrates how combining hierarchical document structure analysis and figure detection with the Azure OpenAI GPT-4 Turbo with Vision (GPT-4V) model enables the extraction of advanced insights from documents.

image-understanding.png

Figure 3. Workflow to extract advanced document insights.

The process begins with the identification of different sections of a document, such as text blocks, page objects like tables, and figures. Azure 's sophisticated algorithms analyze the hierarchical structure of the document, ensuring that each section and subsection is accurately identified and that their interrelationships are preserved. This analysis results in the generation of markdown output that reflects the document's structure, facilitating easy navigation and editing.

Next, the workflow showcases crop the detected figures based on their bounding regions, then send both figure body and caption to GPT-4V model for figure understanding. In this example, the GPT-4V model will return the description of the bar chart. This detailed description provides users with a textual representation of the figure's content, which is crucial for understanding the data visually presented in the document.

In the enhanced markdown output, the figure content section has been elevated from merely representing the text detected within the figure to encompassing a comprehensive description provided by GPT-4V. This enriched output now encapsulates a semantic interpretation of the figure, allowing for a more nuanced understanding of the visual data. With this refined information, the markdown output becomes an even more potent asset when applied to the RAG sample notebook, facilitating more precise and contextually aware document-based Q&A interactions.

Azure AI Document Intelligence Studio

Analyze options2.png

  • Click on Run analysis and view the output content, sample code on the right pane:

layout-sample-studio.png

SDK and REST API

 

This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.