To start with explaining this topic a small background on what is Activation. This is purely a SQL Service broker term. The messages when they are placed in the queue need to be processed, so as a message is placed SQL SSB has an Activation Stored procedure defined which is launched so that the message can be consumed.
Now many queues do work like the process mentioned above as we see in the table. This approach can be good for queues receiving less messages. We can also set the number of instances of an Activation SP that can be launched given there are as many messages. For Eg: If there are 60 messages coming to the queue at once then instead of processing each one by one we can launch that many SPs at a time. That is defined by MAX_READERS here.
select activation_procedure,name from sys.service_queues
Now to device what would be the MAX_READERS for a queue which receives variable number of messages at times and which also depends on the site usage for the Global data and Site Data queues is very tricky. If we set the number as less then we might have backlog with SQL still be ready to do stuff but we have capped it. And if we set it too high then for infra with limited SQL Perf (CPU and RAM) would be busy only doing this stuff with other stuffs in the back door.
So ConfigMgr PG team decided to not set it as a static number and hence you see the ConfigMgrDRSQueue and ConfigMgrDRSSiteQueue don’t have any Activation stored procedures listed here. So does that mean we don’t have Activation for them?
The answer to that question is that the Activation of the queue is looked upon by the ConfigMgr RCMCtrl component itself. So as mentioned we don’t set a static number for the number of procedures that can be launched but we made it as a function of SQL perf in an infra (number of SQL cores) as to we do not overwhelm or under utilize things.
Now lets try to get to understand the logic with a Case Study.
We found that the Site data groups are failing from CAS. We found that it is not processing any Site data messages.
Launching 64 sprocs on queue ConfigMgrDRSQueue and 91 sprocs on queue ConfigMgrDRSSiteQueue.
Error: Received unhandled exception, printing info and throwing it again. SMS_REPLICATION_CONFIGURATION_MONITOR 6/7/2017 9:04:27 AM 18840 (0x4998)
Error: System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.~~ at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource
1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)~~ at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource1 retry, DbConnectionOptions userOptions)~~ at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource
1 retry)~~ at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource1 retry)~~ at System.Data.SqlClient.SqlConnection.Open()~~ at Microsoft.ConfigurationManager.ReplicationConfigurationAndMonitoring.DrsActivationSprocMethods.LaunchActivationSprocs(Int32 verboseLevel, Int32 queueFetchSize, String queueName, List`1& drsActivationData, Int32 numberOfMessageHandlers, String connectionString, DatabaseOperation databaseOperation)~~ at Microsoft.ConfigurationManager.ReplicationConfigurationAndMonitoring.DrsActivationSprocMethods.ExecuteDrsActivationSprocs(String sqlServerName, String databaseName, Int32 siteNumber, Int32 verboseLevel, Int32 queueFetchSize, Int32 numberOfGlobalMessageHandlers, Int32 numberOfSiteMessageHandlers)~~ at Microsoft.ConfigurationManager.ReplicationConfigurationAndMonitoring.ReplicationConfigureAndMonitor.ExecuteDrsActivationSprocs(String sqlServerName, String databaseName, Int32 siteNumber, Int32 verboseLevel, Int32 queueFetchSize, Int32 numberOfGlobalMessageHandlers, Int32 numberOfSiteMessageHandlers) SMS_REPLICATION_CONFIGURATION_MONITOR 6/7/2017 9:04:27 AM 18840 (0x4998)
Failed to call ExecuteDrsActivationSproc. error = Unknown error 0x80131509 SMS_REPLICATION_CONFIGURATION_MONITOR 6/7/2017 9:04:27 AM 18840 (0x4998)
Now that’s a high number of procedures that are being launched and run.
Previously mentioned that the logic on the number of stored procedures that can be launched is a function of number of CPU cores. The logic itself is coded in spDRSGetNumberOfHandlersToLaunch
In my case the customer’s infra had 120 cores in SQL machine.
Below is the calculation when we run with default settings (spDRSGetNumberOfHandlersToLaunch).
@NumGlobalChannels is basically number of global replication group (ReplicationPattern = Global) links. So this will increase if the number of sites increase.
@NumSiteChannels is basically number of site data replication group (Replication Pattern = Site) links. So this will increase if the number of sites increase.
Here @NumCPUs = 120 , @NumGlobalChannels (for 5 primary Sites on CAS = 85), @NumSiteChannels (for 5 primary sites on Cas = 220). So the below highlighted execution occurs.
IF(@NumGlobalChannels > @NumCPUs)
SET @NumGlobal =(SELECTCONVERT(INT,@NumCPUs*.75)+ 1);
SET @NumGlobal = (SELECT CONVERT(INT,@NumGlobalChannels*.75) + 1); -- <b>85 * 0.75 + 1 = 64</b>
IF(@NumSiteChannels > @NumCPUs)
SET @NumSite = (SELECT CONVERT(INT,@NumCPUs*.75) + 1); -- <b>120 * 0.75 + 1 = 91</b>
SET @NumSite = (SELECT CONVERT(INT,@NumSiteChannels*.75) + 1);
I verified that at the time of Launching the Execution PROCS we do not specify any specific limit in the connection pool in BuildConnectionString(), So by default we can have 100 pooled connections.
But given the fact we are trying to launch more than 100 here we hit the pool limit exception.
To override this default logic we can manually set the Handlers in the registry. When the registry is specifically set then it will not use the default setting of calculating the number of handlers.
Number of global message handler sprocs
Number of site message handler sprocs
We set both of them to decimal 20 for now so that the activation procedures can be launched.
To fix the above issue, From 1706 onwards we are making the Handlers default as 20 by default now and configurable via Site control file.
SELECT ISNULL(SCP.Value3, 20)
FROM SC_Component SC
LEFT JOIN SC_Component_Property SCP ON SCP.ComponentID = SC.ID
AND SCP.Name = N'MaxHandlers'
WHERE SC.ComponentName = N'SMS_REPLICATION_CONFIGURATION_MONITOR'
AND SC.SiteNumber = dbo.fnGetSiteNumber()
Hope this helps!