Announcing the Python Custom Skills Toolkit

Are you working with Azure Cognitive Search and need to create custom skills? Do you need to perform some basic operations within your enrichment pipeline, and would you prefer to write them in Python?

This GitHub repository may help. It is a collection of Azure Functions written in Python for special character removal, date extraction, filtering, and more.

If you want to know more about Azure Functions for Python in Azure Cognitive Search projects, and why you would use them, click here to read our previous blog post on the topic.

What is this Toolkit?

It is a collection of useful Python functions ready to be deployed as Azure Cognitive Search custom skills. The skills can be used as templates or starting points for your own custom skills, or they can be deployed and used as they are if they happen to meet your requirements.

All code was written for Azure Functions in Python 3.7 to address specific project requirements. The functions are turbo-charged with detailed comments for easy understanding and customization. Their limitations and restrictions are documented as well.

The content will help you prepare your development environment, run the tests, and deploy the functions. Lessons learned are also included.

What are Custom Skills?

The Custom Web API skill allows you to extend enrichment by calling out to a Web API endpoint providing custom operations. Like built-in skills, a Custom Web API skill has inputs and outputs. Depending on the inputs, your Web API receives a JSON payload when the indexer runs, and outputs a JSON payload as a response, along with a success status code. To learn more about Custom Skills, click here.
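
To make the request and response shape concrete, here is a minimal sketch of the JSON contract a custom skill follows. It is not code from the toolkit: `run_skill` and `to_upper` are hypothetical names, and the `text` field is an assumed input. The `values`, `recordId`, `data`, `errors`, and `warnings` keys follow the documented Custom Web API skill format.

```python
import json


def run_skill(request_body: str, transform) -> str:
    """Apply `transform` to each record of a custom skill request.

    `transform` is a hypothetical callable that receives the record's
    `data` dict and returns the output `data` dict.
    """
    payload = json.loads(request_body)
    results = []
    for record in payload.get("values", []):
        # Each output record must echo the recordId of its input record.
        result = {
            "recordId": record["recordId"],
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            result["data"] = transform(record.get("data", {}))
        except Exception as exc:
            # Report the failure on this record only; the other records
            # in the batch are still processed.
            result["errors"].append({"message": str(exc)})
        results.append(result)
    return json.dumps({"values": results})


def to_upper(data: dict) -> dict:
    """Illustrative transformation: upper-case an assumed `text` input."""
    return {"text": data["text"].upper()}


# Example batch: the second record lacks `text` and ends up with an error.
request = json.dumps({
    "values": [
        {"recordId": "1", "data": {"text": "hello"}},
        {"recordId": "2", "data": {}},
    ]
})
print(run_skill(request, to_upper))
```

In an Azure Function with an HTTP trigger, the request body would typically be read from the incoming request and the returned string sent back as a JSON response with a success status code. Handling errors per record, as sketched here, keeps one bad document from failing the whole batch.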


Figure 1: How custom skills are integrated into the reference architecture

What is offered?

Skill: When to Use

  • Dates Extractor: Extracts dates from a string. Differentiates itself from the Entity Extraction built-in skill by generating dates in yyyy-mm-dd format.
  • Strings Merger: Merges 2 strings. Differentiates itself from the Text Merger built-in skill by allowing you to merge any 2 strings, not only the content with the OCR text extracted from images.
  • Strings Cleaner: Removes special characters from strings, returning a string clean of those values.
  • CSV Filter: Removes the CSV file values from the input, returning a string clean of those values.
  • CSV Lookup: Extracts the CSV file values that were found in the input string, returning an array of strings.
  • Strings Distinct: Removes duplicated elements from the input array. Useful when you are extracting entities or key phrases per page, and some values are present in multiple pages.
  • Bing Entity Search: Gets Wikipedia information using the Bing Entity Search API. As an example, URL extraction was implemented.
  • CosmosDb Writer: Writes your data into a Cosmos DB collection.
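
To give a feel for how small the core of such a skill can be, here is a rough sketch of the kind of logic behind the Strings Cleaner and Strings Distinct skills. It is an illustration under assumptions, not the toolkit's implementation: `clean_string` and `distinct` are hypothetical names, and "special characters" is assumed to mean anything that is not a letter, digit, underscore, or whitespace.

```python
import re


def clean_string(text: str) -> str:
    """Remove special characters from a string.

    Assumption: a special character is anything that is not a word
    character (letter, digit, underscore) or whitespace.
    """
    return re.sub(r"[^\w\s]", "", text)


def distinct(values: list) -> list:
    """Remove duplicated elements while keeping first-seen order."""
    # dict preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(values))


print(clean_string("Hello, World! #2020"))     # Hello World 2020
print(distinct(["Azure", "Python", "Azure"]))  # ['Azure', 'Python']
```

Keeping the transformation in a plain function like these, separate from the HTTP handling, makes it easy to unit test before deploying it as an Azure Function.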

Additional Links

  • ACE Team – Python Custom Skills Toolkit – GitHub
  • ACE Team – Knowledge Mining Accelerator
  • ACE Team – Knowledge Mining Bootcamp
  • ACE Team – Knowledge Mining blog posts


This article was originally published by Microsoft's AI - Customer Engineering Team Blog. You can find the original article here.