I think it is very common resident knowledge with experts who work on the cases involving Native SQL replication with ConfigMgr DRS (Be it MP Replica DB, or just plain out of box SQL Replication usage in ConfigMgr infra)
“The Configuration works well until it breaks in way so bad that DRS is non-functional with poison messages all over”
While we have tried fixing these issue on a case to case basis, there needs to be more analysis on why these things happen and what can do to identify, prevent or Remediate things.
And an inherent question – What can we do from ConfigMgr end so we don’t allow customer to shoot on the foot? And well, that’s the whole purpose of this post. To try putting an end to this debate and this uncertain discussion with customer to install DB Replica –
“We know we can go with MP DB Replica, but things can break the replication sometimes”
So lets start, How does the issue surface or comes to us –
Issue Description
We are(were) using MP DB Replica Or out of box SQL replication on ConfigMgr DB successfully until after some changes were made and this broke out ConfigMgr replication.
We see exceptions in RCMCtrl.log for every message it tries to process –
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | The asynchronous command finished with return message: [A .NET Framework error occurred during execution of user-defined routine or aggregate "spDRSActivation": ~~Microsoft.ConfigurationManager.DataReplicationService.ServiceException: Exception has been thrown by the target of an invocation. ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Data.SqlClient.SqlException: Explicit value must be specified for identity column in table 'Logs' either when IDENTITY_INSERT is set to ON or when a replication user is inserting into a NOT FOR REPLICATION identity column ~~System.Data.SqlClient.SqlException: ~~ at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) ~~ at System.Data.SqlClient.SqlCommand.RunExecuteNonQuerySmi(Boolean sendToPipe) ~~ at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite) ~~ at System.Data.SqlClient.SqlCommand.ExecuteNonQuery() ~~ at Microsoft.ConfigurationManager.DataReplicationService.DrsLogging.ExecuteLogEntryProcedure(LogType logType, String logText, String messageText, String procedureName) ~~ at Microsoft.ConfigurationManager.DataReplicationService.MessageHandlerService.ProcessSyncEnd(Message msgReceived, SqlConnection connection, SqlTransaction transaction, Int32 logLevel, SqlCommand command) ~~System.Reflection.TargetInvocationException: ~~ at Microsoft.ConfigurationManager.DataReplicationService.Service.Run(SqlConnection connection, SqlTransaction transaction, Int32 logLevel, SqlCommand command, Int64 poisonMessageSequenceNumber) |
And this happens Pretty much for most of the tables throwing Poison messages.
Lets try to analyze the exception
1 2 | Explicit value must be specified for identity column in table 'Logs' either when IDENTITY_INSERT is set to ON or when a replication user is inserting into a NOT FOR REPLICATION identity column |
Basically a lot of things to understand before we even know what is that trying to entail. So lets start tearing piece by piece.
Piece 1 : Identity column
These are the columns in the table which generate the value automatically for a row and we don’t explicitly provide any value.
For example, Take a look at Logs tables LogLine column.
So Logline is an Identity column starting with value 1 and will automatically increment by 1.
When we Insert into LOGS table we don’t provide any explicit value to this as SQL Server automatically handles this for us. See below we did not pass any value to the Logline column.
1 2 | insert into Logs(SPID,ComponentName,MachineName,ProcedureName,LogText,NestLevel,MessageText) values(@@SPID,APP_NAME(),HOST_NAME(),@ProcedureName,@LogText,@@NESTLEVEL,@MessageText) |
These would definitely create unique rows and we generally keep Identity columns on tables which seemingly don’t have a Good Primary Key to keep.
Piece 2: Thanks! I know Identity Column, But what if I manually want to specify a value for Identity column
You can try your luck but SQL server doesn’t like it and will raise an error if you try to specify a VALUE manually –
1 | Cannot insert explicit value for identity column in table 'MyTable' when IDENTITY_INSERT is set to OFF |
Ok! So is there a workaround?
Yeah Something like below where you explicitly ALLOW INSERT for IDENTITY column and then turn it OFF.
1 2 3 4 5 6 | SET IDENTITY_INSERT MyTable ON INSERT MyTable(TheIdentity, TheValue) VALUES (3, 'First Row') SET IDENTITY_INSERT MyTable OFF |
Piece 3: Thinking of Practical scenario now of Identity Table (Table with an Identity column) in SQL replication.
Now if we have SQL Replicas, If a Table to be replicated happens to be a IdentityTable, we don’t want the Identity column (Say suppose DistributionID) to be different in Replica.
So we want to explicitly specify the same value what we have in the Main Publishing DB and want to make sure that we are able to explicitly ENTER a VALUE to the Identity column.
If we think of the above workaround, that’s not acceptable to me as I cannot modify each Stored Procedures etc. as that will break my production infra where I don’t specify any value to Identity column and let it takes it course.
So the SQL provided us with a way to achieve the same for Replication scenarios. And the answer is ‘NOT FOR REPLICATION’ bit in the identity column.
So once we have ‘NOT FOR REPLICATION bit set for the column in the Subscriber DB (Replica DB), It is the same as saying IDENTITY_INSERT is SET to ON for the table permanently and you need to specify the value for the IDENTITY Column explicitly For Insert coming from Replication Endpoints.
This would mean Any insert on the Subscriber DB coming from Replication process will be able to insert the same manual value that got automatically inserted in the publisher DB in the Identity Column without any Ifs or Buts.
Back to Square one on the track of troubleshooting the issue.
Now lets try to analyze the exception again that we are getting on the Production Publisher DB.
1 2 | Explicit value must be specified for identity column in table 'Logs' either when IDENTITY_INSERT is set to ON or when a replication user is inserting into a NOT FOR REPLICATION identity column |
So looks like we are trying to update the Logs table without specifying the Identity column (which is perfectly fine) as we looked above as by default the Logs table is not marked for ‘NOT FOR REPLICATION’ = YES
So what happened which changed this to setting to 1 on the Publisher DB.
It was mentioned that there were only some changes to the default publication on the Publisher.
When I looked at the Publication I could see Logs table was made to be a part of the subscription. Pretty much all tables were selected for replication.
Whereas when you look the at default configuration we have only 76 tables selected with Identity columns. Logs and all most other tables are not included by default.
For ConfigMgr , We store the objects we replicate to REPLICA here:
1 | SELECT * from ReplicatedObjects |
From my Labs snapshot, I could see and confirm Logs is not included by default.
So now I think we are sure someone did modify the subscription to include other tables that are not included by default. Well a fair thing you expect from admins needing more tables and data as per their convenience.
But the question is-
Why did we change the NFR bit to ON in the Publication DB (Production Site DB)given our SPs are not coded that way?
All in all, NFR bit for Identity column makes sense for Subscriber DB which needs to get the value explicitly from the publisher and we still want the Automatic seeding to happen in Production DB.
So I went ahead and selected the LOGS table in my subscription manually. And to my horror the Not for Replication bit for the Identity Column in Logs table changed to TRUE in the Publisher DB.
And in no time I started seeing the exception in my environment as well.
So this explains why we used to say don’t touch the setup once configured 🙂
But at CSS, customers come for fixing the issue and what we have been doing is setting NFR bit to OFF for such tables.
Also, Can we do anything to help identify this issue from an HMAN monitoring rule, and correct it if possible?
Yes! In my opinion 🙂
Here is the detection script to find any additional articles added to the publication.
1 2 3 4 5 6 | SELECT T.Name FROM sys.tables T INNER JOIN sys.columns C on T.object_id = C.object_id LEFT JOIN ReplicatedObjects R ON R.objectName = T.name WHERE C.is_identity =1 AND C.is_replicated =1 AND R.ObjectName is NULL AND COLUMNPROPERTY( OBJECT_ID(T.Name),C.Name,'IsIdNotForRepl') = 1 |
And here is the Remediation Script to correct these for the tables setting the NFR bit to 0.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | IF OBJECT_ID('tempdb..#NFRTEMP') IS NOT NULL DROP TABLE #NFRTEMP SELECT T.Name INTO #NFRTEMP FROM sys.tables T INNER JOIN sys.columns C on T.object_id = C.object_id LEFT JOIN ReplicatedObjects R ON R.objectName = T.name WHERE C.is_identity =1 AND C.is_replicated =1 AND R.ObjectName is NULL AND COLUMNPROPERTY( OBJECT_ID(T.Name),C.Name,'IsIdNotForRepl') = 1 IF EXISTS (SELECT 1 from #NFRTEMP) BEGIN PRINT 'Affected by Custom modification of Publication [ConfigMgr_MPReplica] to include Tables with Identity Columns not included by Default in ConfigMgr...' DECLARE @TableName NVARCHAR(255); DECLARE @ConfigMgrDB NVARCHAR(255);DECLARE @DistDB NVARCHAR(255);DECLARE @SQLCMD NVARCHAR(MAX); Select @ConfigMgrDB = ConfigMgrDatabase from ServerData where Sitecode = (select ThisSiteCode from SMSData) select @DistDB=Name from sys.databases where is_distributor = 1 and Name like 'CM%' DECLARE @ObjectID int DECLARE RemoveNonDefaultArticlesFromPublicationAndResetNFR CURSOR FOR SELECT A.Name FROM #NFRTEMP AS A OPEN RemoveNonDefaultArticlesFromPublicationAndResetNFR; FETCH NEXT FROM RemoveNonDefaultArticlesFromPublicationAndResetNFR INTO @TableName; WHILE @@FETCH_STATUS = 0 BEGIN -- Setting the NFR BIT to OFF on the additional tables should be enuf to fix the issue if we still want to keep those tables in Publication. -- Uncomment the below section only if you also want to remove the added Articles from ConfigMgr Publication /* PRINT 'Dropping the Table '+ @TableName + ' From Publication [ConfigMgr_MPReplica] as it is not included by default' SET @SQLCMD = 'USE ' + @DistDB +'; delete from mssubscriptions where article_id =(select article_id from msarticles with (nolock) where article ='+ ''''+ @TableName+ ''''+')'; EXECUTE(@SQLCMD); EXEC sp_droparticle @publication = 'ConfigMgr_MPReplica', @article = @TableName, @force_invalidate_snapshot = 1 */ PRINT 'Setting the NFR bit for Identity Column on Table '+ @TableName + ' to OFF' SET @ObjectID =object_id(@TableName) EXEC sys.sp_identitycolumnforreplication @ObjectID, 0 FETCH NEXT FROM RemoveNonDefaultArticlesFromPublicationAndResetNFR INTO @TableName; END; CLOSE RemoveNonDefaultArticlesFromPublicationAndResetNFR; DEALLOCATE RemoveNonDefaultArticlesFromPublicationAndResetNFR; END ELSE PRINT 'Publication [ConfigMgr_MPReplica] is Fine with no Custom Modifications. Exiting...' |
Things are not as simple as you wish they were
This all looks great if you had kept the publications without touching it further. But one customer did realize his mistake that he selected all tables accidentally. And then went ahead and unchecked the tables from publication.
Now a bad thing to know is un-checking does not RESET the NFR bit to 0 in the Publication DB. So we are stuck with the same issue now but a challenge more difficult to identify the tables who have this Problem.
To Solve this problem, there is a logical approach.
1. Disable NFR bit on identity columns for all tables in the Site Database (Publication DB)
1 | EXEC sp_msforeachtable @command1 = 'declare @int int set @int =object_id("?") EXEC sys.sp_identitycolumnforreplication @int, 0' |
2. Run the below query on a healthy ConfigMgr infra (same version) to create a set of statements which will tell us the tables that have NFR bit set to 1
1 2 3 4 | SELECT 'set @int =object_id('''+ o.name +''') EXEC sys.sp_identitycolumnforreplication @int, 1' as sqlcmd from sys.objects o inner join sys.columns c on o.object_id = c.object_id WHERE COLUMNPROPERTY(c.object_id, c.name, 'IsIdNotForRepl') = 1 |
3. Copy the output from the healthy Database and run these statements on the Site Database which is broken.
So hopefully this will help you with fix the issue where SQL Replication is involved.
Umair Khan
Support Escalation Engineer |Microsoft System Center Configuration Manager
Disclaimer: This posting is provided “AS IS” with no warranties and confers no rights.