Conversational Bots 2.0 – Setting a new paradigm

The evolution of chatbots is transforming user interactions. Powered by advanced Azure , these multi-modal bots can process and respond to various inputs like text, images, and voice. They offer enhanced support and seamless navigation, making them invaluable for improving user experiences. For instance, they can efficiently by analysing error images, navigate users with visual and voice guidance, and provide tailored recommendations through combined text and image recognition.

This article delves into the workings of multi-modal chatbots, their benefits for businesses, and the steps to create one using Azure AI. It also highlights successful implementations, such as MakeMyTrip's Myra, which has significantly improved customer satisfaction, and PhysicsWallah's GyanGuru bot, which is revolutionizing educational support for students.


As exciting as the possibilities seem, it's essential to understand the underlying architecture that makes multi-modal AI chatbots tick. This scenario leverages the collaborative strength of Azure OpenAI and other Azure Cognitive Services, each playing a crucial role. We will also investigate a few additional Azure offerings that make this solution more scalable, reliable, safe, and maintainable.

  1. Azure OpenAI:
    1. Language Understanding: Utilizes advanced AI like GPT-4 Turbo for text analysis and intent detection.
    2. Dialogue Generation: Creates context-aware, conversational responses.
    3. Image Understanding: Employs GPT-4 Vision to interpret user-uploaded images for key details.
    4. Assistants: Create intelligent AI assistants tailored to your needs through custom instructions and augmented by advanced tools like code interpreter, and custom functions.
  2. Azure Cognitive Services:
    1. Azure Speech: Enables voice-based interactions, converting spoken commands and questions to text for processing and responding with audio outputs.
    2. Azure AI Translator: Enables a multi-lingual interface by allowing users to interact with the AI bot in over 100 languages.
    3. Azure AI Search: Powers intelligent search experiences, combining text queries with image recognition to personalize product recommendations or answer visual search queries.
  3. Integration and Orchestration: The magic happens in the orchestration layer. This layer seamlessly binds various AI services together, ensuring smooth information flow and context awareness. Tools like Azure Logic Apps and Prompt Flow facilitate data exchange and trigger relevant actions based on user inputs and outputs.
  4. Other services:
    1. Azure AI Content Safety: Keeps your interactions with the LLM safe and adherent to policies. It allows administrators to control both input and output tokens in terms of appropriateness.
    2. Evaluation Flow: These are a special class of Prompt Flows that can be used to test the quality of LLM responses at scale.

Multimodal bot high-level architectureMultimodal bot high-level architecture

Business Impact

Multi-modal AI chatbots hold significant business value:

  1. Enhanced Customer Service: Offer 24/7 support, answer complex questions, and resolve issues efficiently, increasing customer satisfaction and loyalty.
  2. Improved User Experience: Guide users through websites and apps intuitively, reducing friction and making interactions smoother.
  3. Personalized Recommendations: Leverage image recognition and text analysis to suggest relevant products or services, boosting sales and conversions.
  4. Reduced Operational Costs: Automate repetitive tasks like answering FAQs or directing users to relevant information, freeing up human agents for more complex issues.


Customer Stories


PhysicsWallah – Gyan Guru bot

The Gyan Guru bot by Physics Wallah is an AI-powered educational tool designed to enhance the learning experience for students. It's a part of the Alakh AI suite, which serves as a 24/7 personal AI tutor and assistant. The bot is tailored to address a variety of student queries, whether they're academic, non-academic, product-related, or support-related.

Key Features

  1. AI Tutoring: Customized academic help.
  2. Query Handling: Manages all types of student questions.
  3. Product Guidance: Assists with Physics Wallah's offerings.
  4. Always Available: Support at any time.
  5. Tailored Learning: Caters to individual student needs.
  6. All-in-One Support: Comprehensive educational assistance.

Overall, the Gyan Guru bot aims to revolutionize educational support, making it more accessible, personalized, and efficient for students preparing for various competitive exams.

Our experience with Microsoft has been phenomenal. The onboarding process was seamless and the team at Microsoft has been incredibly supportive throughout the process. We believe this is just the beginning and our collaboration with Microsoft will help further our mission to enhance the learning experience of millions of our students by leveraging GenAI. We're excited about the possibilities ahead.” – Sandeep Penmetsa, Head of Data Science, PhysicsWallah


MakeMyTrip – Myra bot

MakeMyTrip's Myra bot is an AI-powered chatbot that enhances the travel booking experience for customers. It's a part of MakeMyTrip's push towards becoming a super app for travel services.

Key Components:

  1. AI Chat Interface: Offers real-time alerts and suggestions for flight/rail bookings, baggage details, and car bookings.
  2. Voice Recognition: Enables ticket bookings through voice commands.
  3. Travel Technology: Aims to provide a comprehensive range of transport offerings and accommodations.

Value Proposition:

  1. Efficiency: Streamlines the booking process with AI-driven suggestions and alerts.
  2. Convenience: Allows voice-based bookings, adding ease to the booking experience.
  3. Comprehensive Service: Aspires to be a one-stop-shop for all travel needs, from transport to accommodations.

Myra represents MakeMyTrip's commitment to leveraging AI to simplify and improve the travel planning and booking process for its customers.

“We looked at several options that are out there today, and we chose Azure because of the entire bouquet of services on Azure Cognitive Services – be it voice-to-text, text-to-voice or rich vernacular language capability. In order to provide a smooth, seamless experience to our customers, we elected to partner with the one that had the most comprehensive set of services to build an end-to-end capability. Azure Cognitive Services fit the bill completely. Additionally, there are benefits to collaborating with a trusted partner like Microsoft, who add yet another layer of assurance with their “responsible AI” mindset.”  – Sanjay Mohan, Group CTO, MakeMyTrip.


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.