Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

First published on TECHNET on Mar 26, 2019

NOTE: This blog is going through a specific issue in order to help show the steps involved in this type of issue. The process name(s) referenced in this content, except for System, can be any process, and not just the process used as an example in this post, and in no way are there expectations, no is it implied, that this particular process will cause you any problems!

Hey everyone, Konstantin Chernyi here. I'm a Premier Field Engineer at Microsoft Russia and today I'm gonna tell you a real-world story that happened recently. Long story short, a customer asked me: “How do I understand why the System process is consuming 100% of a single CPU core on my machine?”

Whenever I see description or request like this, my first step is to collect ETW trace. In the past, I would send a long instruction install Windows Performance Toolkit
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/ and use xperf with appropriate kernel flag to collect data, but these days, thanks to PG, I don't need to do that anymore. Since very first release of Windows 10/Window – WPR (Windows Performance Recorder) with a lot of predefined profiles shipped with the OS. So, all we need is – collect short trace at the very exact moment when problem exist. In this case we used CPU profile:

Wpr -start CPU

Wpr -stop C:\temp\trace.etl

As soon as customer provided the trace, I opened it in WPA (Windows Performance Analyzer).

The CPU is busy indeed:

thumbnail image 1 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Top CPU consumer – System:

thumbnail image 2 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

The System process has multiple threads, but only one TID (#76) is very active and consuming CPU time:

thumbnail image 3 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

With public symbols https://docs.microsoft.com/en-us/windows/desktop/dxtecharts/debugging-with-symbols we can go deeper and review function called in this thread:

thumbnail image 4 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

ntoskrnl.exe!KeBalanceSetManager huh, time to remember what I've read in Windows Internals back in a days. On page 188 of the second part of Windows Internals 6 th version, you can find explanation of this function:

The balance set manager KeBalanceSetManager , priority 16). It calls an inner routine, the working set manager (MmWorkingSetManager) , once per second as well as when free memory falls below a certain threshold. The working set manager drives the overall memory management policies, such as working set trimming, aging, and modified page writing.

Hmm, it looks like they are facing memory related problems, but they didn't mention it, and the initial request was about high CPU consumption by System process. Let's look at the memory info that we have in the trace, which isn't much since we used the CPU profile, but at least let's give it a try.

thumbnail image 5 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

According to info in the trace, this box has 4.9GB in Zero and Free lists, and 2.7GB in Standby Lists, which gives us 7.6GB available memory…confusing, isn't it? It looks like this box has plenty of available memory, but system calls KeBalanceSetManager routine every second:

thumbnail image 6 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Also, if you take a closer look at memory utilization, you can see that the Page Pool Commit is around 11.9GB, which is a lot:

thumbnail image 7 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Here is are some good articles to read on this area:

https://blogs.technet.microsoft.com/markrussinovich/2009/03/10/pushing-the-limits-of-windows-paged-…

https://docs.microsoft.com/en-us/windows/desktop/memory/memory-pools

Total installed RAM on this box – 32GB:

thumbnail image 8 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

It looks like we need a better look at what's happening in memory…the best way to do that – memory dump. In this case we decide to try generating mirror memory dump via livekd.exe
https://docs.microsoft.com/en-us/sysinternals/downloads/livekd

So with that in mind, I asked the customer to grab a dump by executing livekd -ml -o C:\temp\m.dmp

Next I'll use WinDbgX aka WindDbg preview https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugging-using-windbg-preview and the Mex extension https://www.microsoft.com/en-us/download/details.aspx?id=53304

Let's start by reviewing virtual memory states:

thumbnail image 9 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

From here we can see that the PagedPool Commit = 11.95GB, but the PagedPool usage is zero, I assume it's because we used mirror dump. Anyway, from this data we clearly see that we definitely have a memory issue since Available pages = 2.87MB and there are a lot pool allocation failures. First let's see first memory usage by process:

thumbnail image 10 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Heh, SynTPEnh.exe consumed 9.48GB of RAM, very well. Now let's see what's in Paged Pool:

thumbnail image 11 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Hmm, bunch of Token objects, let's shed some light and dump them all:

thumbnail image 12 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Again SynTPEnh.exe, let's calculate all token handles:

thumbnail image 13 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Keeping in mind that every handle gives us about 8 rows, so we need to divide it and get in total about 160k handles, which is a lot, and almost all of them belong to SynTPEnh.exe.

On the customer machine, the application event log was full of events like this one:

thumbnail image 14 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

So the next step is to check if there are any “zombie processes” https://randomascii.wordpress.com/2018/02/11/zombie-processes-are-eating-your-memory/ , and we can see a lot of them:

thumbnail image 15 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

The majority of the zombie processes are in session 1, which is not in used in this case since the customer is using to connect to this machine:

thumbnail image 16 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

The customer said that on this tower, there weren't any pointing devices except a mouse, so it was safe to uninstall it and check. After uninstallation memory consumption immediately went down, and there was nothing to do for the system process, so it goes almost Idle.

So what was our root cause?

For some reason the process SynTPEnh.exe was being created every 4 seconds, do some work for about 1 sec, and then crash. The token handle from parent services that started this process is not released, which leads us to memory leak and high CPU consumption. Here is an example from the trace that process SynTPEnh.exe come and go all the time:

thumbnail image 17 of blog post titled 

							Let me Count the Ways: Determining Why the System Process Consumes 100% of a Single CPU Core

Wbr, Konstantin.

 

This article was originally published by Microsoft's ITOps Talk Blog. You can find the original article here.