Do you need more memory on Azure SQL Managed Instance?

Managed Instance is fully-managed database engine where you can easily choose the size of the instance with the amount of resources you want to use. Sometime, it is hard to determine what instance size you need to use for your workload and one of the criterion might be the amount of memory that you need. This article enables you to check do you have enough memory on your Managed Instance.

Managed Instance has memory that is proportional to the number of cores. As an example, in Gen5 architecture you have 5.1GB of memory per vCore, meaning that 8-core instance will have 41GB memory. Periodically you should check is this amount of memory good for your workload.

Do not monitor does the Managed Instance use ~100% of available memory. Some people believe that this is an issue (like hitting 100% CPU), but this is expected and desired behavior. Managed Instance should use as much as possible memory to cache the pages from disk into the buffer pool. The only case where you will not see near 100% usage of memory is the case where you have the databases much smaller that the available memory size so all of them can fit into the memory.

The better metric for memory usage monitoring is ‘Page Life Expectancy' performance counter monitored on NUMA nodes. Page Life Expectancy indicates how many seconds the pages live in the memory before they are flushed back to disk. the higher value is better, but you need to determine what is the limit.

Some people believe that PLE value should be greater than 300, but this is the old rule created for computers with 4GB memory and criticized by people like Jonathan Kehayias. In the modern architectures with more memory, you should up-scale this limit and find the value proportional to the buffer pool memory. As an example, if you have 8-core Managed Instance ~40GB memory, and if buffer pool uses all 40GB memory (which is highly unlikely because memory should be allocated to other processes and components) your PLE might go up to 3000 instead of 300.

There is an article written by David Williams and Matthew Robertshaw explaining determine limits and calculate minimal page life expectancy using performance counters. The formula they provided is coded in the following T-SQL query: 

bp_mem_gb = (l.cntr_value*8/1024)/1024,
ple = v.cntr_value,
min_ple = (((l.cntr_value*8/1024)/1024)/4)*300
from sys.dm_os_performance_counters v
join sys.dm_os_performance_counters l on v.object_name = l.object_name
where v.counter_name = 'Page Life Expectancy'
and l.counter_name = 'Database pages'
and l.object_name like '%Buffer Node%'

This query will return the current value of PLE, amount of buffer pool memory, and the minimal PLE value calculated based on the buffer pool memory value.

If you run this query on your Managed Instance and you are frequently getting PLE below min PLE value, you might consider upscaling your instance to some higher tier.

NOTE: Try to identify the queries that are using a lot of memory before you add more cores. It is possible that there are some queries that are cause memory pressure or use more memory than usual. If you identify and fix these queries you would not need to up-scale your instance.

You don't need to immediately upgrade service tier and add more memory as soon as you identify this issue.

This parameter is not the exact proof that you need more memory, but it is good indicator and you should investigate it. You should check wait statistics using DMVs like it is shown in the following code:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
-- wait some time
select top 10 wait_type, wait_time_ms from sys.dm_os_wait_stats order by wait_time_ms desc

As an alternative, you can use SQL Server Management Studio Performance Dashboard, SentryOne Performance Analysis Dashboard, ApexSQL Monitor or other tools and see are there any PAGIOLATCH wait statistics that can indicate that Managed Instance is waiting for some pages to be loaded.

In addition, you can check memory distribution and see is it possible that there is some other component such as plan cache or columnstore pool that is using more memory and removing the pages from the buffer pool. There are other parameters that you can monitor and you can find more information here.


This article was originally published by Microsoft's Azure Blog. You can find the original article here.