Build 2024: Azure AI Video Indexer integration with language models for textual video summary

We are thrilled to introduce textual video summarization for recorded video and audio files, powered by large and small language models (LLMs and SLMs).

Application developers can use APIs to create textual summaries of audio and video files, in the cloud or at the edge.

Instead of watching entire videos, data analysts can read concise summaries of video and audio content and adjust them to their needs.


Azure AI Video Indexer, a cloud and edge video analytics solution, enables textual video summarization through the following Build announcements:

Preview in the cloud: Textual video summarization in Azure AI Video Indexer powered by Azure OpenAI

In the cloud edition of Azure AI Video Indexer, textual video summarization is powered by Azure OpenAI. Customers who have created an Azure OpenAI (AOAI) resource in Azure can connect it to Video Indexer. Using deployments such as GPT-4, users get a concise textual summary of each video, shown alongside the player on the video page. Beyond improving the viewing experience, the summary can be tailored by video analysts to align with specific business requirements.

The summary captures the essence of the video content. It draws not only on the transcript but also on elements derived from the video's audio and visual tracks, such as a siren or crowd reactions in the background, text that appears on screen (for example, signs), visual objects, and more.
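How the service assembles these signals internally is not described here. As an illustration only, the following minimal Python sketch shows how transcript, audio, and visual insights could be flattened into one text block for a language model to summarize. It assumes a simplified, hypothetical insights structure: the keys `transcript`, `audioEffects`, `ocr`, and `labels` and their fields are assumptions, not the documented schema.

```python
from typing import Any

# HYPOTHETICAL, simplified insight structure for illustration only; the real
# Video Indexer insights JSON is richer and its exact shape may differ.
Insights = dict[str, list[dict[str, Any]]]


def build_summary_prompt(insights: Insights, max_items: int = 20) -> str:
    """Flatten transcript, audio events, on-screen text and visual labels
    into one text block that a language model could summarize."""
    sections: list[str] = []

    # Spoken words, joined in order.
    transcript = " ".join(line["text"] for line in insights.get("transcript", []))
    if transcript:
        sections.append(f"Transcript: {transcript}")

    # Non-speech audio, e.g. a siren or crowd reactions in the background.
    audio = [event["type"] for event in insights.get("audioEffects", [])][:max_items]
    if audio:
        sections.append("Audio events: " + ", ".join(audio))

    # Text that appears on screen, e.g. signs.
    ocr = [item["text"] for item in insights.get("ocr", [])][:max_items]
    if ocr:
        sections.append("On-screen text: " + ", ".join(ocr))

    # Detected visual objects.
    labels = [label["name"] for label in insights.get("labels", [])][:max_items]
    if labels:
        sections.append("Visual objects: " + ", ".join(labels))

    return "Summarize the following video.\n" + "\n".join(sections)
```

Keeping each signal on its own labeled line lets the model tell spoken words apart from background sounds and on-screen text; `max_items` is a hypothetical cap that keeps the prompt short.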


Preview at the edge (on-premises): Azure AI Video Indexer enabled by Arc integrates with SLMs through Phi3

The preview of Azure AI Video Indexer enabled by Arc now includes integration with SLMs through Phi3. Both the Azure AI Video Indexer models and the Phi3 model are containerized, so video analysts can run video summarization at the edge. Bringing the Phi3 model to the edge is a significant step in our generative AI capabilities: it opens new avenues for AI applications, especially in settings where computing resources are limited, by offering a more streamlined and efficient approach to video analysis.

The Phi3 model, developed in line with Microsoft's Responsible AI principles and trained on high-quality data, reflects our commitment to safety and excellence in AI. It is a lightweight, state-of-the-art model with long-context support, making it well suited to generating responsive and relevant text in chat formats.

Use cases for video summarization across industries

  • In education, summarized videos can serve as study aids, allowing students to review lecture content quickly. In corporate training, the capability can distill lengthy training videos into key takeaways, saving employees' time and improving knowledge retention.
  • In media, summaries make it possible to quickly understand the content of large video libraries, such as movies or series, without watching all the footage. This is particularly useful for editors and content creators who need to create promos or trailers.
  • In manufacturing, summarized videos can serve as training material or evidence of compliance with regulatory standards and can quickly highlight parts of footage where potential quality issues are detected on the production line.
  • Retailers can use video summaries to understand customer traffic patterns and preferences without watching hours of footage.
  • In modern safety, textual summaries can pinpoint instances of theft or suspicious behavior, streamlining the review process for security teams. They can also enhance the review of training exercises by identifying key moments for analysis and improvement.

Watch the demo recording to learn more: 

Video summarization flavors and customization

Video analysts using the summarization feature can customize it to meet specific needs, with selectable options such as “Shorter” for concise overviews, “Longer” for detailed accounts, “Formal” for professional contexts, and “Casual” for a more relaxed tone. This lets you align each summary with its intended audience and purpose.
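One way to model these choices in client code is sketched below. It assumes the options map to two request parameters, `length` and `style`, with the values shown; those names and values are assumptions for illustration, not confirmed API parameters. Treating “Shorter” and “Longer” as steps relative to the current summary is likewise a design choice of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Length(Enum):
    # HYPOTHETICAL values; "Shorter"/"Longer" step between them.
    SHORT = "Short"
    MEDIUM = "Medium"
    LONG = "Long"


class Style(Enum):
    # HYPOTHETICAL values; "Formal"/"Casual" select the tone.
    NEUTRAL = "Neutral"
    FORMAL = "Formal"
    CASUAL = "Casual"


@dataclass(frozen=True)
class SummaryOptions:
    length: Length = Length.MEDIUM
    style: Style = Style.NEUTRAL

    def to_query(self) -> dict[str, str]:
        # ASSUMPTION: parameter names "length" and "style" are illustrative.
        return {"length": self.length.value, "style": self.style.value}


def apply_choice(options: SummaryOptions, choice: str) -> SummaryOptions:
    """Apply one portal-style choice ("Shorter", "Longer", "Formal", "Casual")."""
    order = [Length.SHORT, Length.MEDIUM, Length.LONG]
    index = order.index(options.length)
    if choice == "Shorter":
        # Clamp at the shortest setting.
        return SummaryOptions(order[max(index - 1, 0)], options.style)
    if choice == "Longer":
        # Clamp at the longest setting.
        return SummaryOptions(order[min(index + 1, len(order) - 1)], options.style)
    if choice == "Formal":
        return SummaryOptions(options.length, Style.FORMAL)
    if choice == "Casual":
        return SummaryOptions(options.length, Style.CASUAL)
    raise ValueError(f"unknown choice: {choice!r}")
```

Because `SummaryOptions` is immutable, each choice yields a new options value, so a client can keep the previous summary settings for comparison.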


How do I make it available in my Azure AI Video Indexer account?

Use Textual Video Summarization in Your Public Cloud Environment:

If you already have an Azure Video Indexer account, follow these steps to use video summarization:

  1. Create an Azure OpenAI resource in your subscription.
  2. Connect your Azure OpenAI resource to your Video Indexer resource in the Azure portal.
  3. Go to the Azure Video Indexer portal, select a video, and choose “Generate summary”.
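Application developers can also request summaries through the API rather than the portal. The following minimal sketch builds, but does not send, such a request with Python's standard library. Only the host `https://api.videoindexer.ai` is the public Video Indexer API endpoint; the `Summaries/Textual` path and the `deploymentName` and `accessToken` query parameters are assumptions to verify against the API reference.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.videoindexer.ai"  # public Video Indexer API host


def build_summary_request(
    location: str,
    account_id: str,
    video_id: str,
    access_token: str,
    deployment_name: str,
) -> Request:
    """Build (but do not send) a POST request asking for a textual summary.

    ASSUMPTION: the "Summaries/Textual" path and the query parameter names
    are illustrative; check the API reference before relying on them.
    """
    # Percent-encode each path segment so IDs cannot break the URL.
    segments = (location, "Accounts", account_id, "Videos", video_id, "Summaries", "Textual")
    path = "/".join(quote(part, safe="") for part in segments)
    query = urlencode({"deploymentName": deployment_name, "accessToken": access_token})
    return Request(f"{API_BASE}/{path}?{query}", method="POST")
```

To send it, pass the request to `urllib.request.urlopen` with a valid access token and the name of the Azure OpenAI deployment connected in step 2.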

For detailed instructions on setting up this integration, refer to this guidance. Note that this feature is not available in Video Indexer trial accounts or in legacy accounts that use Azure Media Services. This is also an opportunity to remove your dependency on Azure Media Services by following these instructions.


Use Textual Video Summarization in Your Edge Environment, enabled by Arc:

If your edge appliances are connected to the Azure platform through Azure Arc, you're in for a treat! Here's how to activate the feature:

  1. Register for Video Indexer (VI) enabled by Arc using this form. Rest assured, we are committed to activating the Azure AI Video Indexer Arc-enabled extension in your Video Indexer account within 30 days of your request.
  2. Once it is activated, create an Azure AI Video Indexer service extension by following these guidelines.
  3. Navigate to the Azure Video Indexer portal, select a video, and click on “Generate Summary” to see the magic happen.


Our Video-to-text API (aka Prompt Content API) now also supports Llama, Phi2, and GPTv4

The Prompt Content API, which converts video to text based on Video Indexer's extracted insights, now supports additional models: Llama, Phi2, and GPTv4. This gives more flexibility when converting video content to text. To learn more about this API, refer to this API documentation.
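The sketch below shows, under stated assumptions, how a client might request prompt content for one of the newly supported models: a POST to generate it and a GET to retrieve it. The `PromptContent` resource path, this create-then-fetch pattern, the `modelName` and `accessToken` parameters, and the exact model strings are all assumptions for illustration; the API documentation is the authority.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.videoindexer.ai"  # public Video Indexer API host

# Model names as listed in this announcement; the exact strings the API
# accepts are an ASSUMPTION.
SUPPORTED_MODELS = {"Llama", "Phi2", "GPTv4"}


def _video_url(location: str, account_id: str, video_id: str, resource: str) -> str:
    # Percent-encode each path segment so IDs cannot break the URL.
    parts = (location, "Accounts", account_id, "Videos", video_id, resource)
    return API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def prompt_content_requests(
    location: str, account_id: str, video_id: str, access_token: str, model_name: str
) -> tuple[Request, Request]:
    """Return (create, fetch) requests for a video's prompt content.

    ASSUMPTIONS: the "PromptContent" path and the "modelName" and
    "accessToken" parameter names are illustrative only.
    """
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model_name!r}")
    url = _video_url(location, account_id, video_id, "PromptContent")
    create_query = urlencode({"modelName": model_name, "accessToken": access_token})
    fetch_query = urlencode({"accessToken": access_token})
    create = Request(f"{url}?{create_query}", method="POST")
    fetch = Request(f"{url}?{fetch_query}", method="GET")
    return create, fetch
```

Validating the model name on the client side fails fast on typos before any network call is made; the same URL helper keeps both requests pointed at the same video.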

Read More

About the feature

About Azure AI Video Indexer


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.