maljazaery

Unleashing PTU Token Throughput with KV-Cache-Friendly Prompt on Azure

1- Introduction: PTUs (Provisioned Throughput Units) are reserved processing capacity that ensures stable performance for uniform LLM workloads. Because the capacity is reserved, KV caching is more effective on PTUs than on Pay-As-You-Go (PayGo). This blog post covers the role of Key-Value (KV) caching in enhancing PTU throughput and practical strategies for creating cache-friendly prompts that maximize efficiency. 2- What […]


Evaluate Small Language Models for RAG using Azure Prompt Flow (LLama3 vs Phi3)

Introduction: Recently, small language models have made significant progress in terms of quality and context size. These advancements have enabled new possibilities, making it increasingly viable to leverage these models for retrieval-augmented generation (RAG) use cases. Particularly in scenarios where cost sensitivity is a key consideration, small language models offer an attractive alternative. This post

