Using Cohere Binary Embeddings in Azure AI Search and Command R/R+ Model via Azure AI Studio

In April 2024, we proudly announced our partnership with Cohere, allowing customers to use Cohere models via the Azure AI Studio Model Catalog as part of the Models as a Service (MaaS) offering. At Build 2024, Azure AI Search launched support for binary vectors. In this blog, we are excited to continue from our previous discussion of int8 embeddings and highlight two powerful new capabilities: using Cohere Binary Embeddings in Azure AI Search for optimized storage and search, and employing the Cohere Command R+ model as a Large Language Model (LLM) for Retrieval-Augmented Generation (RAG).

Cohere Binary Embeddings via Azure AI Studio

Binary vector embeddings use a single bit per dimension, making them much more compact than vectors using floats or int8, while still yielding surprisingly good quality given the size reduction. Cohere's binary embeddings offer substantial efficiency, enabling you to store and search vast datasets more cost-effectively. This capability can achieve significant memory reduction, allowing more vectors to fit within Azure Search units or enabling the use of lower SKUs, thus improving cost efficiency and supporting larger indexes.
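To make "a single bit per dimension" concrete, here is a minimal sketch of how a float vector could be binarized and packed into bytes. The threshold at zero and the most-significant-bit-first packing order are assumptions for illustration, and `pack_bits` is a hypothetical helper, not part of the Cohere or Azure SDKs. In practice the Embed API returns values that are already packed.

```python
def pack_bits(vector):
    """Binarize a float vector (1 if value > 0 else 0) and pack 8 bits per byte."""
    if len(vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = []
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift left and append the next bit, most significant bit first
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return packed


example = [0.12, -0.4, 0.9, 0.03, -0.7, -0.1, 0.5, -0.2]
print(pack_bits(example))  # prints [178], i.e. 0b10110010
```

Under these assumptions, a 1024-dimensional embedding packs into 128 unsigned bytes, which is the shape an unsigned binary field expects.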

“Cohere's binary embeddings available in Azure AI Search provide a powerful combination of memory efficiency and search quality, ideal for advanced AI applications.” – Nils Reimers, Cohere's Director of Machine Learning.

Relative to float32 vectors, int8 embeddings are 4x smaller and binary embeddings are up to 32x smaller, translating to improved cost efficiency and the ability to handle larger datasets. Read the full announcement from Cohere here: Cohere int8 & binary Embeddings – Scale Your Vector Database to Large Datasets
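The reduction factors follow directly from the number of bits stored per dimension. The short calculation below works this out for a 1024-dimensional vector (the output size of embed-english-v3.0). It counts raw vector bytes only; index overhead such as the HNSW graph is not included, and `vector_bytes` is a hypothetical helper name.

```python
def vector_bytes(dimensions, bits_per_dimension):
    """Raw storage for one vector, in bytes."""
    return dimensions * bits_per_dimension // 8


DIMENSIONS = 1024  # output size of embed-english-v3.0

sizes = {
    name: vector_bytes(DIMENSIONS, bits)
    for name, bits in [("float32", 32), ("int8", 8), ("binary", 1)]
}

for name, size in sizes.items():
    ratio = sizes["float32"] // size
    print(f"{name:>7}: {size:>4} bytes per vector ({ratio}x smaller than float32)")
```

This gives 4096 bytes for float32, 1024 bytes for int8 (4x smaller) and 128 bytes for binary (32x smaller).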

Cohere Command R+ Model for RAG

The Cohere Command R+ model is a state-of-the-art language model that can be used for Retrieval-Augmented Generation (RAG). This approach combines retrieval of relevant documents with the generation capabilities of the model, resulting in more accurate and contextually relevant responses.

Step-by-Step Guide

Here's how you can use Cohere Binary Embeddings and the Command R model via Azure AI Studio:

Install Required Libraries

First, install the necessary libraries, including the Azure Search Python SDK and Cohere Python SDK.

pip install --pre azure-search-documents
pip install azure-identity cohere python-dotenv


Set Up Cohere and Azure AI Search Credentials

Set up your credentials for both Cohere and Azure AI Search. For this walkthrough, we'll use Cohere Deployed Models in Azure AI Studio. However, you can also use the Cohere API directly. 

import os
import cohere
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorEncodingFormat,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    AzureMachineLearningVectorizer,
    AzureMachineLearningParameters,
)
from dotenv import load_dotenv


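load_dotenv()

# Azure AI Studio Cohere Configuration
# (environment variable names are assumptions; match them to your own .env file)
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
# Index Names
INT8_INDEX_NAME = "cohere-embed-v3-int8"
BINARY_INDEX_NAME = "cohere-embed-v3-binary"
# Azure Search Service Configuration
search_service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
credential = AzureKeyCredential(os.getenv("AZURE_SEARCH_ADMIN_KEY"))
# Create a Cohere client using the AZURE_AI_STUDIO_COHERE_EMBED_KEY and AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT from Azure AI Studio
cohere_azure_client = cohere.Client(
    # Assumes the endpoint is stored without a trailing /v1
    base_url=f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
)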

Generate Embeddings using Azure AI Studio

Use the Cohere Embed API via Azure AI Studio to generate binary and int8 embeddings for your documents.

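def generate_embeddings(texts, input_type="search_document", embedding_type="ubinary"):
    model = "embed-english-v3.0"
    # Accept either a single string or a list of strings
    texts = [texts] if isinstance(texts, str) else texts
    # Request only the embedding type needed (for example "ubinary" or "int8")
    response = cohere_azure_client.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        embedding_types=[embedding_type],
    )
    return [embedding for embedding in getattr(response.embeddings, embedding_type)]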

# Example usage
documents = ["Alan Turing was a pioneering computer scientist.", "Marie Curie was a groundbreaking physicist and chemist."]
binary_embeddings = generate_embeddings(documents, embedding_type="ubinary")
int8_embeddings = generate_embeddings(documents, embedding_type="int8")

Create an Azure AI Search Index

Create an Azure AI Search index to store the embeddings. Note that Azure AI Search only supports unsigned binary at this time.

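def create_or_update_index(client, index_name, vector_field_type, scoring_uri, authentication_key, model_name):
    # scoring_uri, authentication_key and model_name are reserved for an optional
    # vectorizer configuration, which is not configured here
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="text", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="embedding",
            type=vector_field_type,
            vector_search_dimensions=1024,  # embed-english-v3.0 produces 1024-dimensional vectors
            vector_search_profile_name="my-vector-config",
            # Binary vectors are stored as packed bits in a Collection(Edm.Byte) field
            vector_encoding_format=(
                VectorEncodingFormat.PACKED_BIT if vector_field_type == "Collection(Edm.Byte)" else None
            ),
        ),
    ]

    # int8 vectors are compared with cosine similarity, binary vectors with Hamming distance
    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="my-vector-config", algorithm_configuration_name="my-hnsw")],
        algorithms=[HnswAlgorithmConfiguration(name="my-hnsw", kind=VectorSearchAlgorithmKind.HNSW, parameters=HnswParameters(metric=VectorSearchAlgorithmMetric.COSINE if vector_field_type == "Collection(Edm.SByte)" else VectorSearchAlgorithmMetric.HAMMING))],
    )

    index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
    client.create_or_update_index(index)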

# Example usage
search_index_client = SearchIndexClient(endpoint=search_service_endpoint, credential=credential)
create_or_update_index(search_index_client, "binary-embedding-index", "Collection(Edm.Byte)", AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT, AZURE_AI_STUDIO_COHERE_EMBED_KEY, "embed-english-v3.0")
create_or_update_index(search_index_client, "int8-embedding-index", "Collection(Edm.SByte)", AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT, AZURE_AI_STUDIO_COHERE_EMBED_KEY, "embed-english-v3.0")


Index Documents and Embeddings

Index the documents along with their embeddings into Azure AI Search.

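def index_documents(search_client, documents, embeddings):
    # Pair each document with its embedding and assign a sequential string id
    documents_to_index = [{"id": str(idx), "text": doc, "embedding": emb} for idx, (doc, emb) in enumerate(zip(documents, embeddings))]
    # Upload the documents and their embeddings in a single batch
    search_client.upload_documents(documents=documents_to_index)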

# Example usage
search_client_binary = SearchClient(endpoint=search_service_endpoint, index_name="binary-embedding-index", credential=credential)
search_client_int8 = SearchClient(endpoint=search_service_endpoint, index_name="int8-embedding-index", credential=credential)
index_documents(search_client_binary, documents, binary_embeddings)
index_documents(search_client_int8, documents, int8_embeddings)


Perform a Vector Search

Use the Azure AI Search client to perform a vector search using the generated embeddings. 

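from azure.search.documents.models import VectorizedQuery

def perform_vector_search(search_client, query, embedding_type="ubinary"):
    # Embed the query with the same embedding type as the target index
    query_embeddings = generate_embeddings(query, input_type="search_query", embedding_type=embedding_type)
    # Pass the precomputed query vector directly; VectorizableTextQuery would
    # instead require a vectorizer configured on the index
    vector_query = VectorizedQuery(vector=query_embeddings[0], k_nearest_neighbors=3, fields="embedding")
    results = list(search_client.search(search_text=None, vector_queries=[vector_query]))
    for result in results:
        print(f"Text: {result['text']}")
        print(f"Score: {result['@search.score']}\n")
    return results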

# Example usage
perform_vector_search(search_client_binary, "pioneers in computer science", embedding_type="ubinary")
perform_vector_search(search_client_int8, "pioneers in computer science", embedding_type="int8")

Int8 Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.6225287

Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.5917698

Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.5746157

Binary Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.002610966

Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.0024509805

Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.0023980816
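The binary scores sit on a very different scale from the int8 scores because the binary index is configured with the Hamming metric rather than cosine. As a rough illustration, the sketch below computes the Hamming distance between two packed-bit vectors; `hamming_distance` is a hypothetical helper written for this post, not an SDK function. As an observation only, the three binary scores above equal 1/383, 1/408 and 1/417, which would be consistent with a score of 1 / (1 + distance), but this is inferred from the numbers rather than stated anywhere in this post.

```python
def hamming_distance(a, b):
    """Count differing bits between two packed-bit vectors (lists of 0-255 ints)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those bits
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


query_vec = [0b10110010, 0b00001111]
doc_vec = [0b10010010, 0b11111111]
print(hamming_distance(query_vec, doc_vec))  # prints 5
```

A smaller distance means the two vectors agree on more bits, so documents with lower Hamming distance to the query rank higher.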


Ground the Results to Cohere Command R+ for RAG

Use the Cohere Command R+ model to generate a response based on the retrieved documents.

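# Create a Cohere client for Command R+
# (environment variable names are assumptions; use the key and endpoint of your
# Command R+ deployment, with the endpoint stored without a trailing /v1)
co_chat = cohere.Client(
    base_url=f"{os.getenv('AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT')}/v1",
    api_key=os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY"),
)

# Extract the documents from the search results
# (results_binary is assumed to hold the results of the binary-index search)
documents_binary = [{"text": result["text"]} for result in results_binary]

# Ground the documents from the "binary" index
# (query is the user's question as a string)
chat_response_binary = co_chat.chat(
    message=query, documents=documents_binary, max_tokens=100
)
print(chat_response_binary.text)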

Binary Results:
There are many foundational figures who have made significant contributions to the field of computer science. Here are some of the most notable individuals:

1. Alan Turing: Often considered the "father of computer science," Alan Turing was a British mathematician and computer scientist who made groundbreaking contributions to computing, cryptography, and artificial intelligence. He is widely known for his work on the Turing machine, a theoretical device that served as a model for modern computers, and for his crucial role in breaking German Enigma codes during World War II.

2. Albert Einstein: Known for his theory of relativity and contributions to quantum mechanics, Albert Einstein was a German-born physicist whose work had a profound impact on the development of modern physics. His famous equation, E=mc^2, has become one of the most well-known scientific formulas in history.

3. Isaac Newton: An English mathematician, physicist, and astronomer, Isaac Newton is widely recognized for his laws of motion and universal gravitation. His work laid the foundation for classical mechanics and significantly advanced the study of optics and calculus.


Full Notebook

Find the full notebook with all the code and examples here.

Getting Started

  • Azure AI Search Documentation

  • Cohere Documentation

  • Additional Resources

By integrating Cohere Binary Embeddings and the Command R/R+ model into your Azure AI workflow, you can significantly enhance the performance and scalability of your AI applications, providing faster, more efficient, and contextually relevant results.


This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.