load average: 20.62, 18.83, 11.31
This is the load average I am getting on a quad-core processor. The program I am running isn't using only the CPU, but other resources as well.
How can I get such numbers for the CPU only?
Your question: how is the load average calculated in this case?
Answer: these three numbers are exponentially-damped moving averages over different time windows (1, 5 and 15 minutes). The calculation is based on the processes placed in the processors' run queues, not on the real CPU utilization that everybody is accustomed to seeing in percentage terms (as on Windows).
But I think this is not the answer you want to see in order to understand what is happening in your system right now.
Load average reflects "system" load, which includes both CPU load and I/O waiting. It is the most common metric from which to start troubleshooting performance issues. Use additional metrics (disk load) and tools (e.g. iostat from the sysstat package) to analyze system performance.
And the second answer to your question:
To calculate the CPU utilization since boot, use
grep 'cpu ' /proc/stat | awk '{usage=100-($5*100)/($2+$3+$4+$5+$6+$7+$8)} END {print usage}'
(field $5 is the idle time, so this prints 100 minus the idle percentage). And I highly recommend using a monitoring system (e.g. Zabbix) to track these metrics over time.
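Note that the one-liner above averages over the whole uptime. A sketch of a sampled measurement instead, taking two readings of /proc/stat one second apart (Linux-only; field layout assumed from proc(5)):

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart,
# and compute the busy percentage over that interval.
# proc(5) layout assumed: cpu user nice system idle iowait irq softirq ...
read_cpu() {
    awk '/^cpu / { idle = $5 + $6; total = 0
                   for (i = 2; i <= NF; i++) total += $i
                   print total, idle }' /proc/stat
}

set -- $(read_cpu); t1=$1; i1=$2
sleep 1
set -- $(read_cpu); t2=$1; i2=$2

# busy% = 100 * (delta_total - delta_idle) / delta_total
awk -v dt=$((t2 - t1)) -v di=$((i2 - i1)) \
    'BEGIN { if (dt > 0) printf "%.1f%%\n", 100 * (dt - di) / dt }'
```

Unlike the since-boot figure, this reflects what the CPU is doing right now.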
These 3 numbers are not the numbers for the different CPUs. They are mean values (see the note below) over the last 1, 5 and 15 minutes.
The load means the following: if there are multiple processes on a single-CPU system, they seem to run in parallel, but that is not true. What practically happens: the kernel gives 1/100th of a second to a process, then breaks its run with an interrupt and gives the next 1/100th of a second to another process. (Sometimes this 1/100 can be 1/1000, or even an ever-changing time limit; it doesn't matter here.)
Practically, the question "which process should get our next 1/100th-second interval?" is decided by a complex heuristic. This is called task scheduling.
Of course, processes which are blocked, for example waiting for the data they are reading from the disk, are exempt from this task scheduling.
What the load says is: how many processes are currently waiting for their next 1/100th-second time frame. Of course, it is a mean value; that is why you can see multiple numbers in cat /proc/loadavg.
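For reference, cat /proc/loadavg prints more than the three averages; a sketch of reading it, with field meanings assumed from proc(5):

```shell
# /proc/loadavg looks like: "20.62 18.83 11.31 3/512 12345"
#   fields 1-3: 1-, 5- and 15-minute load averages
#   field 4:    currently runnable tasks / total tasks
#   field 5:    PID of the most recently created process
read one five fifteen tasks lastpid < /proc/loadavg
echo "1 min: $one  5 min: $five  15 min: $fifteen  (runnable/total: $tasks)"
```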
The situation on a multi-CPU system is more complex: there are multiple CPUs, whose time frames can be given to multiple processes. That makes task scheduling a little, but not too much, more complex. Otherwise, the situation is the same.
The kernel is intelligent; it tries to share the system resources for optimal efficiency, and it comes close to that (there are minor optimization considerations, for example it is better if a process runs as long as possible on the same CPU for caching reasons, but they don't matter here). This is because if we have load 8, that means: there are actually 8 processes waiting for their next time slice. If we have 8 CPUs, we can give these time slices to the CPUs one-to-one, and thus our system will be used optimally.
If you look at top, you can see that the number of actually running processes is surprisingly low: they are the processes marked R there. Even on a not really hardcore system it is often below 5. This is partly because processes waiting for their data from the disks or from the network are also suspended (marked with S in top). The load shows only the CPU usage.
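You can see this for yourself by tallying processes by their state code; a sketch using the standard ps state letters (R = running/runnable, S = interruptible sleep, D = uninterruptible wait):

```shell
# Count processes by the first letter of their ps state code.
# Expect S to dominate and only a handful of R entries.
ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn
```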
Multicore CPUs are practically multiple CPUs on the same silicon chip. From this point of view, there is no difference.
In the case of hyperthreaded CPUs there is an interesting side effect: loading a core makes its hyperthreaded pair slower. But this happens on a deeper layer than what normal task scheduling handles, although it can (and should) influence the process-moving decisions of the scheduler.
On Windows, a different method is used for the load calculation: there, load 1.0 means that all of the CPU cores are used up to 100% (which would be load 4.0 on your system).
Note: as @Alex mentions, it is not a simple time average. It is an exponentially weighted moving average with a time constant of 1, 5 or 15 minutes. It is cheaper to calculate and reacts better to recent changes. For more details, see the source in kernel/sched/loadavg.c.
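The exponential weighting can be sketched with made-up run-queue samples; the decay factor exp(-5/60) below corresponds to a 5-second sampling interval and the 1-minute time constant (the sample values themselves are invented for illustration):

```shell
# Exponentially-damped moving average, updated every 5 seconds,
# in the style of the kernel's 1-minute load figure:
#   load = load * e + n * (1 - e),  with e = exp(-5/60)
awk 'BEGIN {
    e = exp(-5/60)                 # decay factor for the 1-minute average
    load = 0
    split("4 4 4 4 0 0 0 0", n)    # made-up run-queue length samples
    for (i = 1; i <= 8; i++) {
        load = load * e + n[i] * (1 - e)
        printf "sample %d: load %.2f\n", i, load
    }
}'
```

Notice how the figure climbs toward 4 while the queue is busy and decays smoothly, rather than dropping to zero, once it empties.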
S (=sleep)? This I do mention. This answer doesn't mention the terminology. – peterh Sep 25 '17 at 09:21
D, not S. You wrote: "This is because if we have load 8, that means: there are actually 8 processes waiting for their next time slice." This is incorrect. – jlliagre Sep 25 '17 at 10:45
S. The uninterruptible wait state is reported by ps and other utilities under the STAT or S column with the letter D, not the letter S, which is used for sleep. Moreover, the load is not limited to counting the processes waiting for the next available slot; running processes also take part in the metric computation. You can have a non-zero load with an empty run queue. – jlliagre Sep 25 '17 at 12:11
I don't know why people aren't giving a straightforward answer to this question; everybody is talking in computer-engineering language.
Actually, the digits are the load average relative to "1". If your load average is "5", that means the CPU demand is like 500%, i.e. you have 400% overload (500% minus your capacity of 100%). If it says "0.05", that means you're using only 5% of one CPU; 95% remains unused.
The above calculation is for 1 core. When you have multiple cores, divide the average by the core count. For example, if you have 4 cores and you get an average of "10", your CPU usage is 10 / 4 * 100 = 250%.
So the ideal value of the averages is equal to or less than your core count. And the 3 values (20.62, 18.83, 11.31) are the averages over the last 1 minute, last 5 minutes, and last 15 minutes.
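That normalization can be done in one line; a sketch using the numbers from the question (1-minute load 20.62 on 4 cores):

```shell
# Express the 1-minute load average as a percentage of total CPU capacity:
# load / cores * 100, here with the question's figures hard-coded.
awk -v load=20.62 -v cores=4 \
    'BEGIN { printf "%.1f%% of capacity\n", load / cores * 100 }'
```

Anything above 100% here means the run queues hold more work than the cores can serve.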
Without getting into the specifics of what "load" is, a simple way to understand the numbers is that 1 is the amount of load one CPU core can actually handle at a given time.
If load averages are above 1, there is more work than a single CPU core can get done. On a machine with only one core, a load average above 1 for extended periods means the system has to "skip" allocating resources to some tasks during some time slices, slowing their execution.
Thankfully, modern computers usually have more than one CPU core, so a load average of 1, which would mean a pegged CPU on a single-core system, is perfectly fine on a four-core system when spread across all four cores.
Load averages are comparatively useless on desktop machines, which tend to have wildly varying workloads. On servers, which have stable workloads, load averages can indicate DoS attacks, hacking attempts, broken hardware, urgent hardware upgrades, misbehaving software, and so on.
For desktops, something like top or vmstat is more appropriate. There are also interactive tools in the package repositories of most Linux distributions, such as htop and glances, which can help isolate runaway resource usage in real time.