Hello everyone, my name is Zoheb Shaikh and I’m a Solution Engineer working with the Microsoft Mission Critical team (SMC). I’ll share something interesting that I came across recently: one of our Enterprise customers had multiple clients where new GPOs were not being applied, and none of the clients were able to access the domain root share.
Before I talk more about the issue, I would like to share a bit of background about Group Policy and how it is structured.
As a Windows administrator, you almost certainly have used Group Policies to control the settings deployed to the clients of your Active Directory infrastructure.
Group Policy Objects are composed of two parts: the Group Policy Container (GPC), which exists in Active Directory, and the Group Policy Template (GPT), where the actual content of your GPOs resides.
Group Policy Template (GPT)
The Group Policy Template is where the meat of the GPO resides. GPT resides in a share known as SYSVOL. This share, like the portion of the GPO stored in Active Directory, is replicated to every DC in the domain. This way, when a client queries for the GPOs it needs to process, it can locate the contents of those GPOs.
To see the content of your GPOs, you’ll want to look at the SYSVOL share on one of your DCs. You can find the SYSVOL share by navigating to \\domainname.com\sysvol\sysvol (yes, there is a shared SYSVOL folder within a parent SYSVOL folder). The actual SYSVOL share is set to \\<servername>\sysvol on the DCs. Within this folder, you will see the same list of GPOs that appears within Active Directory’s System/Policies container. These folders are where the actual settings of your GPOs are contained. What is present in each folder depends on the number of settings you’ve put in place.
To know more about Group Policies please see the following blogs which cover it in detail:
Coming back to the customer scenario, the environment had multiple DCs running Windows Server 2012 R2 and Windows Server 2016. A few machines were reported as unable to access the domain root share, i.e. \\domainname.com. When they tried to access this share, they received a generic error, as shown below.
Because of this, none of the client machines were getting updated Group Policies from Active Directory, and new machines were not getting Group Policies at all.
Initially it was reported that only a few users were experiencing this problem, but it was later confirmed that it was affecting all of them.
Interestingly, this issue was not reported on the Domain Controllers but was affecting all the client machines and member servers.
We started troubleshooting by verifying the health of the DCs. We verified the details below, and all came out clean:
- DFS (SYSVOL) replication
- Verified the secure channel between a few Domain Controllers
- Verified the secure channel between one of the client machines and a couple of Domain Controllers
- Verified that SYSVOL and NETLOGON are shared on a few DCs
- Verified that port 445 is listening on a few Domain Controllers
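The port 445 check in the list above can be repeated across many DCs with a small script. The following is a minimal Python sketch, not the tooling we used on the case; the function names `port_open` and `check_dcs` are made up for illustration. It only confirms that a TCP connection can be opened, which says nothing about whether authentication or the share itself works.

```python
import socket


def port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dcs(hosts, port: int = 445, timeout: float = 3.0) -> dict:
    """Map each host name to whether its SMB port accepted a connection."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

For example, `check_dcs(["dc01.domainname.com", "dc02.domainname.com"])` (hypothetical DC names) returns a dictionary of host names to True or False.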
Since the environment was big and we were not sure whether this issue was with a few DCs or all of them, we wanted to take a network trace from a DC and a client to find out what was happening under the hood.
For this purpose, we took a client machine and added a hosts file entry pointing to a specific DC (the PDC in our case).
We also verified that the DC share was accessible and listening on port 445 when accessed by name or IP address.
Below are some of the snips from the network trace and analysis which helped us isolate the problem.
There is a successful TCP three-way handshake between the client and the DC, as you can see below.
Share access fails in frames 63 and 48 with the error STATUS_MORE_PROCESSING_REQUIRED.
The frames below show successful Kerberos authentication (TGS request and response):
From the above trace, we see that Kerberos authentication succeeded, and we observed that the encryption type was RC4 while all other authentication was happening with AES.
However, something fishy was happening with Kerberos that was not clearly visible in the trace.
Upon checking the “DCDIAG /q” output on a Domain Controller, we observed various error events related to CIFS/Domainname.com.
Next, we scanned the event logs of one of the client machines and saw the event below, which moved us closer to the cause.
Log Name: System
Date: 11/13/2019 3:38:30 AM
Event ID: 4
Task Category: None
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server win-jm1i0viqpkg$. The target name used was cifs/zoheb.local. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (ZOHEB.LOCAL) is different from the client domain (ZOHEB.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
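When many clients log this event, it helps to pull out just the two fields that matter: the account that failed to decrypt the ticket and the target SPN. Below is a minimal Python sketch that does this with a regular expression. It assumes the English wording of the message shown above, and `parse_krb_ap_err_modified` is a hypothetical helper name, not part of any Microsoft tool.

```python
import re

# Matches the two sentences that name the server account and the target SPN.
# Assumes the English message text of Kerberos client Event ID 4.
_EVENT_RE = re.compile(
    r"KRB_AP_ERR_MODIFIED error from the server (?P<server>\S+?)\.\s+"
    r"The target name used was (?P<target>\S+?)\.\s"
)


def parse_krb_ap_err_modified(message: str):
    """Return the server account, target SPN, service class and host, or None."""
    match = _EVENT_RE.search(message)
    if match is None:
        return None
    target = match.group("target")
    # An SPN has the form serviceclass/host, so split on the first slash.
    service, _, host = target.partition("/")
    return {
        "server": match.group("server"),
        "target": target,
        "service": service,
        "host": host,
    }
```

Run against the event above, this yields the server account win-jm1i0viqpkg$ and the target cifs/zoheb.local. The point to notice is that the target host is the domain name, yet the ticket went to a single machine account.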
Clearly there were authentication issues, but why only from client machines and member servers and not from Domain Controllers?
To find out, we checked whether any duplicate SPN was present in the environment (setspn -x) and found none related.
We wanted to check where the CIFS/DomainName.com SPN was registered, and for that we ran the command “setspn -q cifs/domainname” but found none.
Since the HOST service class is an alias that also covers CIFS, we then ran the command below to check the Host/DomainName SPN:
setspn -q host/domainname
We saw that a client machine had been added to the domain recently and had this SPN registered against it. Upon further checking, we found that, surprisingly, this had been done by an IT admin who wanted to register something else and ended up registering a domain name SPN.
We deleted this SPN, and that resolved the issue.
Below are a few ways this could have been avoided:
- Monitoring and maintaining ACL control and Delegation in the domain
- Configuring alerts for when any domain-name-based SPNs are registered
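The alerting idea above could be prototyped along these lines. This is a minimal Python sketch under stated assumptions: the SPNs have already been exported from Active Directory into a dictionary of account name to SPN list (the export step is not shown), and `find_domain_name_spns` and its `allowed_accounts` parameter are hypothetical names introduced for illustration. It flags any SPN whose host part is the domain name itself, which is the pattern that caused this incident.

```python
def find_domain_name_spns(spns_by_account, domain_fqdn,
                          netbios_name=None, allowed_accounts=()):
    """Return (account, spn) pairs whose host part is the domain name itself."""
    domain_names = {domain_fqdn.lower()}
    if netbios_name:
        domain_names.add(netbios_name.lower())
    allowed = {account.lower() for account in allowed_accounts}

    hits = []
    for account, spns in spns_by_account.items():
        if account.lower() in allowed:
            continue  # accounts you have explicitly decided to trust
        for spn in spns:
            service, sep, rest = spn.partition("/")
            if not sep:
                continue  # not in serviceclass/host form
            # The host is the second component; drop any port or third component.
            host = rest.split("/", 1)[0].split(":", 1)[0].lower()
            if host in domain_names:
                hits.append((account, spn))
    return hits
```

With the SPNs from this case, an entry such as HOST/zoheb.local on a client machine account would be reported, while a DC SPN such as ldap/dc01.zoheb.local/zoheb.local would not, because its host part is the DC name rather than the domain name.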
Hope this helps,