GPT-4 Turbo with Vision on Azure OpenAI Service

We are thrilled to announce that GPT-4 Turbo with Vision on Azure OpenAI service is coming soon to public preview. GPT-4 Turbo with Vision is a large multimodal model (LMM) developed by OpenAI that can analyze images and provide textual responses to questions about them. It incorporates both natural language processing and visual understanding. This integration allows Azure users to benefit from Azure's reliable cloud infrastructure and OpenAI's advanced research.

Beyond Words: Unveil the Power of Visual Understanding

Previously, language models have operated with a singular focus on text input, which has placed restrictions on their application across different contexts. GPT-4 Turbo with Vision breaks through these barriers by incorporating visual data, enabling an advanced level of image understanding. This model is not just about recognizing objects in a picture; it's about understanding the context and details—such as creating elaborate image captions, providing rich contextual descriptions, responding to inquiries about visual content, or assigning intelligent tags. GPT-4 Turbo with Vision elevates data interpretation to new heights, interpreting the visual world in ways that extend well beyond mere pixels.

Instacart, a grocery technology and service company, has developed a search feature called Ask Instacart which lets customers ask open ended natural language questions about food. Now, with GPT-4 Turbo with Vision on Azure OpenAI Service, Instacart is upgrading the Ask Instacart to support additional Vision capabilities.

GPT-4 Turbo with Vision on Azure OpenAI Service makes it possible to convert handwritten recipes and shopping lists directly into digital, shoppable item lists in the Instacart app. Our end users no longer need to try to decipher ingredients and quantities or manually having to search for each item they need and add it to their Instacart orders. This is only the beginning of how we anticipate leveraging this technology, and we see the potential for dramatically improving the speed and quality of some of our customer and shopper workflows.

JJ Zhuang, Chief Architect, Instacart

GPT-4 Turbo with Vision + Azure Services

GPT-4 Turbo with Vision on Azure OpenAI Service offers cutting-edge capabilities along with enterprise-grade security and responsible AI governance. In addition, it provides exclusive access to Azure AI Services tailored enhancements. When combined with Azure AI Services, it enhances your experience by introducing an array of advanced functionalities, including:

Video prompt: We are enabling developers to use video as an input for GPT-4 Turbo with Vision through the native integration of Azure AI Vision Video Retrieval. This simplifies the process of incorporating video input into applications, eliminating the need for complex video processing code. This integration enables the retrieval of context for video prompting through advanced multi-modal vector indexing of vision and speech and, allows the generation of summaries and answers about video content.

For more information on Video prompts, see Video Retrieval: GPT-4 Turbo with Vision Integrates with Azure to Redefine Video Understanding.

Sataliais the AI hub for WPP, one of the world's largest communications services groups, known primarily for its work in advertising and public relations.Satalia's collaboration with Microsoft leverages GPT-4 Turbo with Vision on Azure OpenAI Service and Azure AI Vision to creatively transformcontent analysis and optimization. These technologies enable the deep evaluation and optimization of video content, such as advertisements and social media posts, offering profound insights into content effectiveness and audience engagement.

The detailed summaries of video created by GPT-4 Turbo with Vision on Azure OpenAI Service with Video Retrieval enable Satalia's AI tool to predict the impact of video content and suggest improvements, aligning with audience expectations and platform specifics. This fusion of AI and human creativity ensures that content is not only visually appealing but also resonates emotionally.

We have been experimenting with a wide range of image-to-text and video-to-text tools over the past two years to equip our AI solutions with the capability to analyze and produce more effective creative assets through decoding video in ways never thought possible.

I can safely say that GPT-4 Turbo with Vision on Azure OpenAI Service is by far the best tool that we have worked with, as it offers perfect perception of both visual content and context.

Daniel Hulme, CEO of Satalia, a WPP Company

Azure OpenAI on your data with images: By combining GPT-4 Turbo with Vision, Azure AI Search, and Azure AI Vision, we are transforming information retrieval. Now, you can add your images to text data and utilize vector search to develop a solution that connects with your data, enabling an improved chat experience. This multimodal support builds upon the existing Bring Your Data functionality for text-based models.

Object grounding: Azure AI Vision complements GPT-4 Turbo with Vision's text response with object grounding and outlines salient objects in the input images. This integration brings a new layer to data analysis and user interaction, as the feature can visually distinguish and highlight important elements in the images it processes.


Optical Character Recognition (OCR): Azure AI Vision complements GPT-4 Turbo with Vision by providing high-quality OCR results as supplementary information to the model. It allows the model to produce higher quality responses for dense text, transformed images, and numbers-heavy financial documents, and increases OCR language coverage.



Responsible AI + Privacy

Microsoft is committed to the advancement of AI driven by responsible principles. GPT-4 Turbo with Vision on Azure OpenAI Service respects users' privacy. When processing images, or inputs containing images of people, the system will first blur faces before processing to return the requested results, thereby preventing identification of individuals through their face. Any identification, when it occurs, is based on the model's training, which associates specific images with names tagged during its learning phase. The model can also take contextual cues other than the face. This is how the model can still associate an image with an individual even if the face is blurred. For example, if the image contains a photo of a popular athlete wearing their team's jersey with their specific number, the model can still infer who the individual is based on these contextual cues.

The upcoming introduction of GPT-4 Turbo with Vision on Azure OpenAI Service represents our ongoing commitment to expanding the capabilities of AI and providing our users with the most innovative tools in the market. We are excited to see how our customers will leverage this new functionality to advance their businesses and drive innovation.

We look forward to enabling your enterprise to take advantage of these capabilities as we continue to push the boundaries of what is possible with AI.

Get started with Azure AI Service today


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.