The output of the top command shows a load average of about 82. The system has 32 cores, and normally each fully busy core adds one to the load.

How can I find out exactly what is happening?

    top - 11:20:43 up 88 days, 17:03,  1 user,  load average: 81.82, 82.88, 83.36
    Tasks: 755 total,   6 running, 748 sleeping,   0 stopped,   1 zombie
    Cpu(s):100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  66102764k total, 15693220k used, 50409544k free,   101940k buffers
    Swap:  1023992k total,     4460k used,  1019532k free, 10740348k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     2519 John      20   0 14.8g 925m 6064 R 871.6  1.4   6624:27 l502.exe
    22476 Phil      20   0 14.8g 558m 2680 R 868.6  0.9 940:02.21 l703.exe
     3007 John2     20   0 14.8g 556m 2680 R 827.4  0.9   6617:36 l703.exe
     8628 Rob       20   0 18.4g 1.5g 6536 R 634.0  2.4 649029:06 l502.exe
     2977 root      20   0 15556 1776  944 R  1.0  0.0   0:01.01 top
     2243 root      20   0     0    0    0 S  0.3  0.0 139:50.57 kondemand/0
     2245 root      20   0     0    0    0 S  0.3  0.0 140:26.64 kondemand/2

I should add that lXXX is a multithreaded program. Please also see the output of vmstat below:

    [root@compute-0-3 ~]# vmstat -SM 1
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    82  0      4  48587     99  10411    0    0     0     0    0    0 92  1  7  0  0
    82  0      4  48588     99  10411    0    0     0     0 32053 10692 100  0  0  0  0
    82  0      4  48588     99  10411    0    0     0     0 32162 10921 100  0  0  0  0
    82  0      4  48586     99  10411    0    0     0     0 38494 9839 99  1  0  0  0
    82  0      4  48587     99  10411    0    0     0    12 40880 9796 99  1  0  0  0
    82  0      4  48587     99  10411    0    0     0     0 38544 9752 99  1  0  0  0
    82  0      4  48587     99  10411    0    0     0     0 36764 9782 99  1  0  0  0
    ^C

Also the output of iostat 1:

    [root@compute-0-3 ~]# iostat 1
    Linux 2.6.32-279.14.1.el6.x86_64 (compute-0-3.local)    11/20/2016      _x86_64_        (32 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              92.38    0.08    0.62    0.18    0.00    6.74

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.36         1.00         4.80    7696082   36787520

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              98.91    0.00    0.03    0.00    0.00    1.06

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.00         0.00         0.00          0          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              99.06    0.00    0.03    0.00    0.00    0.91

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.00         0.00         0.00          0          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              99.06    0.00    0.00    0.00    0.00    0.94

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.00         0.00         0.00          0          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              99.34    0.00    0.03    0.00    0.00    0.62

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.00         0.00         0.00          0          0

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              99.47    0.00    0.06    0.00    0.00    0.47

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda               0.00         0.00         0.00          0          0

    ^C

Also, the output of mpstat -P ALL:

    [root@compute-0-3 ~]# mpstat -P ALL
    Linux 2.6.32-279.14.1.el6.x86_64 (compute-0-3.local)    11/20/2016      _x86_64_        (32 CPU)

    02:29:17 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    02:29:17 PM  all   92.39    0.08    0.58    0.18    0.00    0.04    0.00    0.00    6.74
    02:29:17 PM    0   91.91    0.10    0.58    0.10    0.00    0.00    0.00    0.00    7.31
    02:29:17 PM    1   94.32    0.05    0.51    0.03    0.00    0.00    0.00    0.00    5.09
    02:29:17 PM    2   91.00    0.10    0.59    0.12    0.00    0.00    0.00    0.00    8.19
    02:29:17 PM    3   93.70    0.08    0.56    0.05    0.00    0.02    0.00    0.00    5.59
    02:29:17 PM    4   90.13    0.08    0.58    0.11    0.00    0.00    0.00    0.00    9.09
    02:29:17 PM    5   93.45    0.07    0.55    0.05    0.00    0.01    0.00    0.00    5.86
    02:29:17 PM    6   89.81    0.09    0.58    0.09    0.00    0.00    0.00    0.00    9.42
    02:29:17 PM    7   93.59    0.07    0.54    0.06    0.00    0.05    0.00    0.00    5.68
    02:29:17 PM    8   91.44    0.09    0.62    0.13    0.00    0.00    0.00    0.00    7.72
    02:29:17 PM    9   93.68    0.08    0.60    0.11    0.00    0.13    0.00    0.00    5.39
    02:29:17 PM   10   91.07    0.11    0.60    0.18    0.00    0.00    0.00    0.00    8.04
    02:29:17 PM   11   94.20    0.10    0.54    0.05    0.00    0.00    0.00    0.00    5.11
    02:29:17 PM   12   91.04    0.09    0.59    0.17    0.00    0.00    0.00    0.00    8.12
    02:29:17 PM   13   94.27    0.08    0.53    0.05    0.00    0.00    0.00    0.00    5.06
    02:29:17 PM   14   90.17    0.08    0.60    0.15    0.00    0.00    0.00    0.00    9.01
    02:29:17 PM   15   94.11    0.10    0.56    0.05    0.00    0.00    0.00    0.00    5.17
    02:29:17 PM   16   91.49    0.09    0.59    0.19    0.00    0.00    0.00    0.00    7.65
    02:29:17 PM   17   93.94    0.10    0.56    0.05    0.00    0.04    0.00    0.00    5.31
    02:29:17 PM   18   91.40    0.08    0.56    0.16    0.00    0.00    0.00    0.00    7.81
    02:29:17 PM   19   94.11    0.09    0.55    0.09    0.00    0.10    0.00    0.00    5.06
    02:29:17 PM   20   90.73    0.05    0.55    0.18    0.00    0.00    0.00    0.00    8.48
    02:29:17 PM   21   94.28    0.12    0.57    0.06    0.00    0.05    0.00    0.00    4.91
    02:29:17 PM   22   90.53    0.08    0.56    0.17    0.00    0.00    0.00    0.00    8.66
    02:29:17 PM   23   94.09    0.10    0.54    0.04    0.00    0.08    0.00    0.00    5.15
    02:29:17 PM   24   90.88    0.10    0.69    0.60    0.00    0.00    0.00    0.00    7.73
    02:29:17 PM   25   94.12    0.06    0.53    0.15    0.00    0.02    0.00    0.00    5.12
    02:29:17 PM   26   90.64    0.08    0.63    0.59    0.00    0.00    0.00    0.00    8.05
    02:29:17 PM   27   94.52    0.05    0.51    0.13    0.00    0.01    0.00    0.00    4.78
    02:29:17 PM   28   90.48    0.05    0.65    0.52    0.00    0.00    0.00    0.00    8.29
    02:29:17 PM   29   93.43    0.07    0.61    0.42    0.00    0.47    0.00    0.00    5.02
    02:29:17 PM   30   89.84    0.08    0.67    0.63    0.00    0.00    0.00    0.00    8.77
    02:29:17 PM   31   93.94    0.04    0.62    0.21    0.00    0.31    0.00    0.00    4.88
Rui F Ribeiro
mahmood

1 Answer

As for the system load: as vmstat shows, it is mostly CPU time from the application processes.

The us column under cpu (time spent running non-kernel code, i.e. user time including nice time) reads 92%, 100%, 100% and 99% out of 100%. iostat also confirms that the system is almost entirely CPU-bound, as the amount of I/O is negligible.
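As a rough cross-check, the figures from the question can be put side by side. The Python sketch below only does arithmetic on numbers copied from the top and vmstat output; the names are my own, and it assumes top is reporting %CPU per core (100% equals one fully used core), which seems to be the case since single processes exceed 100%.

```python
# Figures copied from the top and vmstat output in the question.
N_CORES = 32
LOAD_1MIN = 81.82
RUN_QUEUE = 82  # vmstat "r" column: tasks running or waiting for a core

# %CPU per PID as shown by top (assumed per-core scale: 100% = one core).
TOP_CPU_PERCENT = {
    2519: 871.6,
    22476: 868.6,
    3007: 827.4,
    8628: 634.0,
}


def cores_busy(cpu_percent_by_pid):
    """Convert summed per-process %CPU into a number of fully used cores."""
    return sum(cpu_percent_by_pid.values()) / 100.0


def load_per_core(load, n_cores):
    """Average number of tasks counted in the load, per core."""
    return load / n_cores


busy = cores_busy(TOP_CPU_PERCENT)
per_core = load_per_core(LOAD_1MIN, N_CORES)

print(f"cores kept busy by the four jobs: {busy:.1f} of {N_CORES}")
print(f"load per core: {per_core:.2f}")
print(f"tasks waiting for a core: about {RUN_QUEUE - N_CORES}")
```

The four jobs together account for about 32 cores' worth of CPU time, so the machine is saturated, while the run queue of 82 suggests that roughly 50 further threads are ready to run but waiting for a core. On Linux the load average counts runnable tasks (plus tasks in uninterruptible sleep), not CPU utilization, which is why it can exceed the core count.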

What can be said for sure is that the user processes are running very CPU-intensive operations. Since this is a compute node, this may well be the baseline: the intended and acceptable state of the system.

What we can say for certain is that your services will be required when the CPU is not being used, not the other way around.

Rui F Ribeiro
  • But if you run a program at 100% CPU utilization, the load should be 1. For 32 cores, the load should then be 32. I also double that for tolerance, so the load could be 64... but 80 is very high – mahmood Nov 20 '16 at 09:42
  • Look at vmstat AND mpstat; their output is saner for pure CPU usage. Load is not only CPU usage; it is built from several inputs. – Rui F Ribeiro Nov 20 '16 at 09:43
  • What do the system and cpu columns mean? I mean cs, us, sy, id, wa and st – mahmood Nov 20 '16 at 09:46
  • Please have a look at http://unix.stackexchange.com/questions/18918/in-linux-top-command-what-are-us-sy-ni-id-wa-hi-si-and-st-for-cpu-usage – Rui F Ribeiro Nov 20 '16 at 09:54
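
To see which processes make up a load of this size, one option is to count, per process, the threads in state R (runnable) or D (uninterruptible sleep), since those are the two states the Linux load average is built from. The following is a minimal Python sketch, assuming a Linux /proc layout; the function names and the proc_root parameter are my own and not part of any tool mentioned above.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split after the last ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def runnable_threads_by_pid(proc_root="/proc"):
    """Count threads in state R or D for every process under proc_root."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        return counts
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    state = parse_state(f.read())
            except (OSError, IndexError):
                continue  # thread exited or the line was unreadable
            if state in ("R", "D"):
                counts[int(pid)] += 1
    return counts


if __name__ == "__main__":
    # Show the ten processes contributing the most threads to the load.
    for pid, n in runnable_threads_by_pid().most_common(10):
        print(pid, n)
```

This is a single snapshot, so the numbers will fluctuate between runs. If the counts for the four lXXX jobs add up to roughly the vmstat r value of 82, the load is explained by those jobs running more threads than there are cores.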