load average: 20.62, 18.83, 11.31
This is the load average I am getting on a quad-core processor. The program I am running isn't using only the CPU, but other resources as well.
How can I get such numbers for the CPU only?
Your question: how is the load average calculated in this case?
Answer: these three numbers are exponentially-damped moving averages over different time windows (1, 5 and 15 minutes). The calculation is based on the processes placed in the processors' run queues, not on the real CPU utilization that everybody is accustomed to seeing in percentage terms (as on Windows).
But I think this is not the answer you want to see in order to understand what is happening in your system right now.
Load average reflects "system" load, which includes both CPU load and I/O waiting. It is the most common metric from which to start troubleshooting performance issues. Use additional metrics (disk load) and tools (e.g. iostat from the sysstat package) to analyze system performance.
And the second answer to your question:
To calculate the CPU utilization since boot, use
grep 'cpu ' /proc/stat | awk '{usage=100-($5*100)/($2+$3+$4+$5+$6+$7+$8)} END {print usage}'
(field $5 is the idle time, so this prints 100 minus the idle percentage). And I highly recommend using a monitoring system (e.g. Zabbix) to track these metrics over time.
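Note that the one-liner above averages over the whole uptime. A sketch of a sampled measurement instead, taking two readings of /proc/stat one second apart (Linux-only; field layout assumed from proc(5)):

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart,
# and compute the busy percentage over that interval.
# proc(5) layout assumed: cpu user nice system idle iowait irq softirq ...
read_cpu() {
    awk '/^cpu / { idle = $5 + $6; total = 0
                   for (i = 2; i <= NF; i++) total += $i
                   print total, idle }' /proc/stat
}

set -- $(read_cpu); t1=$1; i1=$2
sleep 1
set -- $(read_cpu); t2=$1; i2=$2

# busy% = 100 * (delta_total - delta_idle) / delta_total
awk -v dt=$((t2 - t1)) -v di=$((i2 - i1)) \
    'BEGIN { if (dt > 0) printf "%.1f%%\n", 100 * (dt - di) / dt }'
```

Unlike the since-boot figure, this reflects what the CPU is doing right now.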
These 3 numbers are not the numbers for the different CPUs. They are mean values (see the note below) over the last 1, 5 and 15 minutes.
The load means the following: if there are multiple processes on a single-CPU system, they seem to run in parallel, but that is not true. What practically happens: the kernel gives 1/100th of a second to a process, then breaks its run with an interrupt and gives the next 1/100th of a second to another process. (Sometimes this 1/100 can be 1/1000, or even an ever-changing time limit; it doesn't matter here.)
Practically, the question "which process should get our next 1/100th-second interval?" is decided by a complex heuristic. This is called task scheduling.
Of course, processes which are blocked, for example waiting for the data they are reading from the disk, are exempt from this task scheduling.
What the load says is: how many processes are currently waiting for their next 1/100th-second time frame. Of course, it is a mean value; that is why you can see multiple numbers in cat /proc/loadavg.
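For reference, cat /proc/loadavg prints more than the three averages; a sketch of reading it, with field meanings assumed from proc(5):

```shell
# /proc/loadavg looks like: "20.62 18.83 11.31 3/512 12345"
#   fields 1-3: 1-, 5- and 15-minute load averages
#   field 4:    currently runnable tasks / total tasks
#   field 5:    PID of the most recently created process
read one five fifteen tasks lastpid < /proc/loadavg
echo "1 min: $one  5 min: $five  15 min: $fifteen  (runnable/total: $tasks)"
```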
The situation on a multi-CPU system is more complex: there are multiple CPUs, whose time frames can be given to multiple processes. That makes task scheduling a little, but not too much, more complex. Otherwise, the situation is the same.
The kernel is intelligent; it tries to share the system resources for optimal efficiency, and it comes close to that (there are minor optimization considerations, for example it is better if a process runs as long as possible on the same CPU for caching reasons, but they don't matter here). This is because if we have load 8, that means: there are actually 8 processes waiting for their next time slice. If we have 8 CPUs, we can give these time slices to the CPUs one-to-one, and thus our system will be used optimally.
If you look at top, you can see that the number of actually running processes is surprisingly low: they are the processes marked R there. Even on a not really hardcore system it is often below 5. This is partly because processes waiting for their data from the disks or from the network are also suspended (marked with S in top). The load shows only the CPU usage.
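You can see this for yourself by tallying processes by their state code; a sketch using the standard ps state letters (R = running/runnable, S = interruptible sleep, D = uninterruptible wait):

```shell
# Count processes by the first letter of their ps state code.
# Expect S to dominate and only a handful of R entries.
ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn
```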
Multicore CPUs are practically multiple CPUs on the same silicon chip. From this point of view, there is no difference.
In the case of hyperthreaded CPUs there is an interesting side effect: loading a core makes its hyperthreaded pair slower. But this happens on a deeper layer than what normal task scheduling handles, although it can (and should) influence the process-moving decisions of the scheduler.
On Windows, a different method is used for the load calculation: there, load 1.0 means that all of the CPU cores are used up to 100% (which would be load 4.0 on your system).
Note: as @Alex mentions, it is not a simple time average. It is an exponentially weighted moving average with a time constant of 1, 5 or 15 minutes. It is cheaper to calculate and reacts better to recent changes. For more details, see the source in kernel/sched/loadavg.c.
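The exponential weighting can be sketched with made-up run-queue samples; the decay factor exp(-5/60) below corresponds to a 5-second sampling interval and the 1-minute time constant (the sample values themselves are invented for illustration):

```shell
# Exponentially-damped moving average, updated every 5 seconds,
# in the style of the kernel's 1-minute load figure:
#   load = load * e + n * (1 - e),  with e = exp(-5/60)
awk 'BEGIN {
    e = exp(-5/60)                 # decay factor for the 1-minute average
    load = 0
    split("4 4 4 4 0 0 0 0", n)    # made-up run-queue length samples
    for (i = 1; i <= 8; i++) {
        load = load * e + n[i] * (1 - e)
        printf "sample %d: load %.2f\n", i, load
    }
}'
```

Notice how the figure climbs toward 4 while the queue is busy and decays smoothly, rather than dropping to zero, once it empties.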
S (=sleep)? This I do mention. This answer doesn't mention the terminology. – peterh Sep 25 '17 at 09:21
D, not S. You wrote: "This is because if we have load 8, that means: there are actually 8 processes waiting for their next time slice." This is incorrect. – jlliagre Sep 25 '17 at 10:45
S. The uninterruptible wait state is reported by ps and other utilities under the STAT or S column with the letter D, not the letter S, which is used for sleep. Moreover, the load is not limited to counting the processes waiting for the next available slot; running processes also take part in the metric computation. You can have a non-zero load with an empty run queue. – jlliagre Sep 25 '17 at 12:11
I don't know why people aren't giving a straightforward answer to this question; everybody is talking in computer-engineering language.
Actually, the digits are the load average relative to "1". If your load average is "5", that means the CPU demand is like 500%, i.e. you have 400% overload (500% minus your capacity of 100%). If it says "0.05", that means you're using only 5% of one CPU; 95% remains unused.
The above calculation is for 1 core. When you have multiple cores, divide the average by the core count. For example, if you have 4 cores and you get an average of "10", your CPU usage is 10 / 4 * 100 = 250%.
So the ideal value of the averages is equal to or less than your core count. And the 3 values (20.62, 18.83, 11.31) are the averages over the last 1 minute, last 5 minutes, and last 15 minutes.
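That normalization can be done in one line; a sketch using the numbers from the question (1-minute load 20.62 on 4 cores):

```shell
# Express the 1-minute load average as a percentage of total CPU capacity:
# load / cores * 100, here with the question's figures hard-coded.
awk -v load=20.62 -v cores=4 \
    'BEGIN { printf "%.1f%% of capacity\n", load / cores * 100 }'
```

Anything above 100% here means the run queues hold more work than the cores can serve.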
Without getting into the specifics of what "load" is, a simple way to understand the numbers is that 1 is the amount of load one CPU core can actually handle at a given time.
If load averages are above 1, there is more work than a single CPU core can get done. On a machine with only one core, a load average above 1 for extended periods means the system has to "skip" allocating resources to some tasks during some time slices, slowing their execution.
Thankfully, modern computers usually have more than one CPU core, so a load average of 1, which would mean a pegged CPU on a single-core system, is perfectly fine on a four-core system when spread across all four cores.
Load averages are comparatively useless on desktop machines, which tend to have wildly varying workloads. On servers, which have stable workloads, load averages can indicate DoS attacks, hacking attempts, broken hardware, urgent hardware upgrades, misbehaving software, and so on.
For desktops, something like top or vmstat is more appropriate. There are also interactive tools in the package repositories of most Linux distributions, such as htop and glances, which can help isolate runaway resource usage in real time.