Identify log write limits on Azure SQL Managed Instance using QPI library

Managed Instance is fully managed SQL Server instance hosted in Azure cloud. Managed Instance introduces some limits such as max log write throughput that can slow down your workload. In this post you will see identify write log throughput issue on Managed Instance.

Managed Instance has several built-in resource limits such as max log write rate. The reason for introducing this log write limit is the necessity to ensure that log backups can catch-up incoming data.

In this post, I'm using QPI library to easily analyze wait statistics on Managed Instance. To install QPI library go to the installation section and download the SQL script for your version of SQL Server (it supports /SQL + because it depends on Query Store views).

Disclaimer: QPI library is open source library provided as-is and not maintained by Microsoft. There are not guarantees that the results are correct and that there are not bugs in calculations. This is a helper library that can help you to more easily analyze performance of your Managed Instance, but you can do the same job by looking directly at DMVs.

With this library, I can easily take a snapshot of wait statistics, wait some time and read the wait statistics values:

exec qpi.snapshot_wait_stats;

waitfor delay '00:00:03';

select *
from qpi.wait_stats
order by wait_time_s desc;

Example of the result returned by this query is shown below:

In this example, you can see that the tasks on Managed Instance are waiting on INSTANCE_LOG_RATE_GOVERNOR wait type (with a link to the description of the wait type). You can go further and identify the queries that are causing these as it is described here: issues https://blogs.msdn.microsoft.com/sqlserverstorageengine/2019/03/05/analyzing-wait-statistics-on-managed-instance/

As an alternative, you can analyze IO performance on the Managed Instance to identify bottlenecks using the following procedure/view:

  • qpi.snapshot_file_stats procedure will take a snapshot of io statistics from sys.dm_io_virtual_file_stats DM function. You MUST take the snapshot because sys.dm_io_virtual_file_stats contains cumulative information and you need to calculate sample in the recent time interval.
  • qpi.file_stats view will get the file statistics since the last snapshot. This view contains several calculations such as IOPS, throughput based on a data from sys.dm_io_virtual_file_stats.

The following query will summarize write throughput (MB/s) and IOPS on the instance categorized per file types (LOG/DATA):

You can see in the result that my instance is using 47.5 MB/s writing in log file and 1226 IOPS in total. If you look at the description of  resource limits in Azure documentation, you will see that ~48MB/s is log rate limit that I'm hitting and this is the reason why I see dominant INSTANCE_LOG_RATE_GOVERNOR wait statistic.

This analysis tells you that you are running some log-write heavy operation that uses most of the write log – in this case I'm running several REBUILD INDEX operations.

 

This article was originally published by Microsoft's Azure Blog. You can find the original article here.