Tasks: 747 total, 176 running, 560 sleeping,   0 stopped,  11 zombie
Cpu(s): 10.5%us, 89.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  74236420k total, 73285344k used,   951076k free, 12261184k buffers
Swap:  8388600k total,    10404k used,  8378196k free, 27872176k cached

89% of CPU is being used by %sy. What is that %sy?

This is what the iostat output looks like:

root@host [~]# iostat -xk 5
Linux 2.6.32-431.20.3.el6.x86_64 (host.superhostsite.com)       09/03/2014      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          43.02    0.28   50.00    0.05    0.00    6.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.25    64.95   14.21   79.82    91.86   579.51    14.28     0.15    1.60   0.09   0.84
sda               0.87   182.70   28.06  206.05   247.08  1629.10    16.03     0.49    2.07   0.09   2.22

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.45    0.00   91.55    0.00    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    14.00    0.20   15.00     3.20   116.00    15.68     0.03    1.92   0.28   0.42
sda               0.00    23.20    2.00   47.80    25.60   284.00    12.43     0.02    0.42   0.14   0.70

So disk usage is small. Everything is small. And yet a huge 89.2% of the CPU is used by the system.

Why is %sy high? Why not %us?

user4951
  • 10,519
  • In the edit you ask why %sy is high, not %us. It's because the processes that use the CPU spend more time in system calls than in user code, which is pretty normal. As for why that is the case: it makes no sense to guess as long as I have not even seen the names of the processes. See my edit for a command that lists processes and threads in the running state. – Volker Siegel Sep 03 '14 at 17:28

2 Answers


I assume your question is basically "What's going on here?".

I will answer by explaining your output. If that helps, let me know and I will add more detail.
(Try to edit the question so that it's clearer what you are asking, otherwise it may get closed.)

So, yes, you see "huge CPU load due to high CPU usage"!

Let's look at the top output:

Cpu(s): 10.5%us, 89.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st

The percentage values show where the CPU time is spent: in user code or in system (kernel) code. %sy is the time spent in system code, and on top of that there is another 10.5% of user time (%us). So the CPU is 100% busy! (You can also see that from the 0.0%id, the idle time.)
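The addition above can be checked mechanically. This is a minimal sketch, assuming the classic procps `Cpu(s):` line format shown in the question; the function name `cpu_busy` is made up for illustration. It sums the fields that represent actual CPU work (us, sy, ni, hi, si) and leaves out idle, I/O wait and steal.

```shell
# Sum the "busy" fields of a top "Cpu(s):" line.
# cpu_busy is a hypothetical helper name, not part of top or procps.
cpu_busy() {
    # $1: a line such as "Cpu(s): 10.5%us, 89.2%sy, ... 0.0%id, ..."
    printf '%s\n' "$1" | awk '
        {
            busy = 0
            for (i = 2; i <= NF; i++) {
                field = $i
                sub(/,$/, "", field)         # drop the trailing comma
                n = split(field, kv, "%")    # "89.2%sy" -> kv[1]="89.2", kv[2]="sy"
                # count only fields where the CPU is doing work
                if (n == 2 && kv[2] ~ /^(us|sy|ni|hi|si)$/) busy += kv[1]
            }
            printf "%.1f\n", busy
        }'
}

# The line from the question: prints 100.0
cpu_busy 'Cpu(s): 10.5%us, 89.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st'
```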

But there is even more:

Tasks: 747 total, 176 running, 560 sleeping, 0 stopped, 11 zombie

There are 176 running processes. But if you have fewer than 176 cores, some of them are only runnable: they could run if they got CPU time.
That means there is more load waiting, enough to drive additional CPUs to 100% usage.
Your CPU is not 89.2% busy - it is 100% busy.
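To put a number on this: the iostat header in the question reports 8 CPUs, so the 176 running tasks can be set against that. The sketch below is only that arithmetic; the function name `oversubscription` is made up for illustration.

```shell
# Ratio of runnable tasks to CPUs.
# oversubscription is a hypothetical helper name.
oversubscription() {
    # $1: number of runnable tasks, $2: number of CPUs
    awk -v r="$1" -v c="$2" 'BEGIN { printf "%.1f runnable tasks per CPU\n", r / c }'
}

# Figures from the question: 176 running tasks (top), 8 CPUs (iostat header).
# Prints: 22.0 runnable tasks per CPU
oversubscription 176 8
```

On a live Linux system, similar inputs could be taken from the fourth field of /proc/loadavg (runnable/total scheduling entities) and from `nproc`, although that count is a momentary snapshot and need not match what top shows.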

From this, there is no reason to look at iostat - the system does not need much IO in this state.

But the information we need is: what are these 176 or more processes or threads? There may be many more similar tasks that are not in the running state.

And the next will be: what are they doing, and why?

So take a look at the process list in top - it may show some obvious problem.

It could help to know more about the processes in the "runnable" state.
The command below lists all processes and threads that are in the "runnable" state, that is, the tasks that could run if they got CPU time:

ps -o comm,pid,ppid,user,time,etime,start,pcpu,state --sort=comm aH | grep '^COMMAND\|R$'

For me, that lists only one or two lines, including ps itself.
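With well over a hundred runnable tasks, the ps listing above gets long, so it may be easier to count runnable tasks per command name. This is a minimal sketch; the function name `count_runnable` and the sample input are made up for illustration and are not taken from the affected system.

```shell
# Count runnable (state R) tasks per command name.
# Reads lines of the form "STATE COMMAND" on stdin.
# count_runnable is a hypothetical helper name.
count_runnable() {
    awk '$1 ~ /^R/ {
             sub(/^ *[^ ]+ +/, "")    # strip the state column, keep the command name
             n[$0]++
         }
         END { for (c in n) print n[c], c }' | sort -k1,1nr -k2
}

# Made-up sample input, only to show the output format:
count_runnable <<'EOF'
R httpd
S httpd
R httpd
R php-cgi
D mysqld
EOF
```

On a live system the input could come from `ps -eL -o state= -o comm=` (assuming the procps version of ps), piped into `count_runnable`.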

Volker Siegel
  • 17,283

I can see from the top output that memory utilization is high (including a huge file system cache). As said, more time is spent in kernel code than in user code. This could be because more kworker threads are busy releasing memory from the cache and allocating it to requesting processes (the 176 running processes may request memory at run time).
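To see how much of the "used" memory is really cache, the figures from the Mem: and Swap: lines of the top output can be combined. This is a minimal sketch, assuming the older top layout from the question, where "used" includes buffers and cached; the function name `mem_breakdown` is made up for illustration.

```shell
# Split "used" memory into cache+buffers and everything else.
# mem_breakdown is a hypothetical helper name.
mem_breakdown() {
    # $1: used, $2: buffers, $3: cached (all in kB, as printed by top)
    awk -v u="$1" -v b="$2" -v c="$3" 'BEGIN {
        cache = b + c
        printf "cache+buffers: %d kB, other used: %d kB\n", cache, u - cache
    }'
}

# Figures from the question.
# Prints: cache+buffers: 40133360 kB, other used: 33151984 kB
mem_breakdown 73285344 12261184 27872176
```

So roughly 40 GB of the 73 GB reported as used is buffers and page cache.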

One solution could be to reduce the cache size or drop it using:

echo 3 > /proc/sys/vm/drop_caches

This should reduce the number of busy kworkers and, eventually, the amount of kernel code being executed.

  • This greatly reduced the lag I got whenever my kernel CPU usage was high, but the kernel CPU usage itself is still high sometimes. That's weird, but at least the problem is solved now, I guess. – Fabian Röling Jan 10 '20 at 09:22
  • Or maybe not… It definitely still happens, but I have a vague feeling that the kernel CPU usage goes down faster now. – Fabian Röling Jan 10 '20 at 11:47