
Following this answer https://unix.stackexchange.com/a/279354/108702, I ran:

lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
CPU(s):              8
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1

However, with top:

top - 01:06:47 up 51 days,  6:24,  2 users,  load average: 23.67, 22.50, 22.40
Tasks: 5989 total,   1 running, 5919 sleeping,   0 stopped,   0 zombie
%Cpu(s): 84.6 us,  2.7 sy,  0.0 ni, 12.3 id,  0.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32799488 total,   940020 free, 18284088 used, 13575380 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 14034316 avail Mem

What am I missing? I was expecting a maximum load of 2 * 4 * 1 = 8 (at 100% use). (Also, is this load too high?)
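The expected figure of 8 is just the product of the three lscpu fields, i.e. the number of logical CPUs. A minimal Python sketch of that arithmetic (the function name `logical_cpus` is hypothetical), with a cross-check against what the OS itself reports:

```python
import os


def logical_cpus(threads_per_core: int, cores_per_socket: int, sockets: int) -> int:
    """Logical CPUs = threads per core x cores per socket x sockets, as lscpu reports."""
    return threads_per_core * cores_per_socket * sockets


print(logical_cpus(2, 4, 1))  # 8, matching the "CPU(s): 8" line from lscpu
print(os.cpu_count())         # logical CPUs visible to the OS on the machine running this
```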

nha

2 Answers


TL;DR: 23 is probably too high.

The easiest way to think of load is "the number of processes running on, or waiting in the queue for, a CPU". If the load exactly matches the CPU count, the number of processes needing a CPU exactly matches the CPUs available, and you have ideal usage. If the load is higher than the number of CPUs, some processes had to wait for a CPU to become available, and you're not achieving ideal throughput because you don't have enough resources. If the load is lower than the CPU count, some CPUs are sitting idle, and you can probably get more throughput from this box.
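The rule of thumb above boils down to comparing the load with the logical CPU count. A minimal Python sketch, assuming a hypothetical helper `interpret_load` and an arbitrary 5% tolerance band for "about equal" (neither comes from the answer):

```python
def interpret_load(load: float, cpus: int, tolerance: float = 0.05) -> str:
    """Rule-of-thumb reading of a load average against the logical CPU count.

    `tolerance` is an arbitrary band (5% by default) treated as "about equal".
    """
    ratio = load / cpus
    if ratio > 1 + tolerance:
        return "oversubscribed: some processes had to wait for a CPU"
    if ratio < 1 - tolerance:
        return "spare capacity: some CPUs sat idle"
    return "about ideal: demand roughly matches the CPUs available"


print(interpret_load(23.67, 8))  # oversubscribed (23.67 / 8 is about 2.96)
print(interpret_load(8.0, 8))    # about ideal
print(interpret_load(2.0, 8))    # spare capacity
```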

It's a useful complement to CPU usage, because it tells you how oversubscribed you are. CPU usage tells you instantaneous consumption, and if all your cores are running at 100%, that might actually be ideal; what matters is the load average, which tells you how big the queue is. As an analogy, it's OK for a worker at McDonald's to be serving customers 100% of the time; what matters is how many people are waiting to be served. That is what the load average tells you.
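The McDonald's analogy can be made concrete with an idealized snapshot: CPU usage is capped at 100% once every CPU is busy, while the number of waiting tasks keeps growing. A minimal Python sketch, assuming a hypothetical `snapshot` helper and an instant where each runnable task wants a full CPU:

```python
def snapshot(runnable: int, cpus: int) -> tuple[float, int]:
    """Return (CPU utilization %, tasks waiting) for an idealized instant."""
    running = min(runnable, cpus)          # at most one task per CPU
    utilization = 100.0 * running / cpus   # saturates at 100%
    waiting = runnable - running           # the queue the load average reflects
    return utilization, waiting


for n in (4, 8, 16, 24):
    util, waiting = snapshot(n, 8)
    print(f"{n:>2} runnable: {util:5.1f}% busy, {waiting:>2} waiting")
```

Going from 8 to 24 runnable tasks leaves utilization at 100.0% but raises the queue from 0 to 16. This sketch ignores that Linux also counts tasks in uninterruptible sleep (typically waiting on disk I/O) in the load.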

This is a simplification, of course; there are technicalities and subtleties galore, but for 95% of us it's a good enough rule to gauge the demand on your system and to interpret what the load average is telling you.


A load average of 1 means that a single CPU or core was 100% busy over the past X minutes, where X depends on which of the three figures the 1 appears in: 1, 5 or 15 minutes. If it is the first figure, that means the past minute.
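Strictly, "the past X minutes" is not a sharp window: the Linux kernel keeps an exponentially damped moving average, sampled roughly every 5 seconds. A floating-point sketch of that update (the kernel uses fixed-point arithmetic; `update_load` and `simulate` are hypothetical names) shows that one task kept busy for exactly one minute, starting from an idle system, lifts the 1-minute figure to only about 0.63:

```python
import math

SAMPLE_SECONDS = 5.0  # assumed sampling interval (Linux samples about every 5 s)


def update_load(load: float, active_tasks: int, window_seconds: float) -> float:
    """One exponentially damped update of a load average over the given window."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(active_tasks: int, seconds: float, window_seconds: float = 60.0) -> float:
    """Start idle, keep `active_tasks` runnable for `seconds`, return the final average."""
    load = 0.0
    for _ in range(int(seconds / SAMPLE_SECONDS)):
        load = update_load(load, active_tasks, window_seconds)
    return load


print(round(simulate(1, 60), 3))   # 0.632: one minute of one busy task
print(round(simulate(1, 600), 3))  # 1.0: converges to 1 after several minutes
```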

So with 4 cores, a load average of 4, x, x indicates that all of your cores have been 100% busy. Even assuming all of the work being done could make full use of hyperthreading, 100% busy would be a load average of 8, x, x.
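On Linux the three figures can be read from /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute load averages. A minimal Python sketch (the helper `per_cpu_load` is hypothetical) that divides them by the logical CPU count; `os.cpu_count()` counts hyperthreads, so here it gives the "8" ceiling. The sample line reuses the question's top figures, with the last two fields made up for illustration:

```python
import os


def per_cpu_load(loadavg_line: str, cpus: int) -> list[float]:
    """Divide the 1-, 5- and 15-minute figures of a /proc/loadavg line by the CPU count."""
    one, five, fifteen = (float(x) for x in loadavg_line.split()[:3])
    return [round(v / cpus, 2) for v in (one, five, fifteen)]


# Load figures from the question's top output; "9/5989 123456" is illustrative only.
sample = "23.67 22.50 22.40 9/5989 123456"
print(per_cpu_load(sample, 8))  # [2.96, 2.81, 2.8]

# On a live Linux machine, read the real values instead.
if os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(per_cpu_load(f.read(), os.cpu_count() or 1))
```

A value of 1.0 per logical CPU corresponds to the "100% busy" case; the question's figures are close to three times that.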

This may be of interest - https://www.tecmint.com/understand-linux-load-averages-and-monitor-performance/

ivanivan