
I would appreciate some advice from your experience. My main concern is I really REALLY do not want to cause the computer server to crash.

I am running a program on a Linux server (a supercomputer? Maybe.). The program lets me specify the number of threads it can use, and I specified 15 threads.

The server I am using has 20+ processors (Intel Xeon CPUs with 6 cores). In top c, I saw that my program is using:

%CPU
190.7%

So I proceeded to check the per-CPU view in top c (by pressing 1), and the output is below:

Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 95.7%us,  0.3%sy,  0.0%ni,  0.0%id,  3.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 96.0%us,  0.7%sy,  0.0%ni,  0.0%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu18 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

The CPU usage shifts between CPUs; for example, sometimes Cpu20 goes to 90% and Cpu1 drops back to 0%.

Is there a chance the computer server might crash because I am using 190% of CPU?

Braiam
  • 35,991
helping
  • 41
  • Rephrasing it....I am using 10% of total CPU will this crash my system? -- don't think so. – mdpc Feb 06 '13 at 17:42
  • The existing answers don't address this, so: NO, a high load isn't going to crash your computer. A high load merely means that you're using the CPU to the maximum of its capacity. That's what it's designed for. Anything less saves power, and might save wear and tear on the cooling system, but underuses the CPU itself. And similarly in software: if you aren't using the full load, the system has to go to sleep between work. – Gilles 'SO- stop being evil' Feb 06 '13 at 21:28

2 Answers


CPU percentage is reported differently by different tools and on different systems. A better way to judge how busy the machine is, is the load average. Consider this overloaded worker machine:

# w 
 02:22:31 up 221 days, 11:06,  1 user,  load average: 9.87, 9.50, 7.25
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
stephan  pts/0    173.13.169.18    02:22    0.00s  0.44s  0.00s w


~$ cat /proc/cpuinfo |grep processor
processor   : 0
processor   : 1

This says I have a 1-minute load of 9.87, a 5-minute load of 9.50, and a 15-minute load of 7.25. The 'load' number represents how many processors' worth of work this machine has been assigned, while the /proc/cpuinfo output shows how many processors I actually have to do the work (here, 2). If I had 12 CPUs, this load level would be perfectly fine.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11579 app       20   0  263m  97m 4104 R   22  1.3   0:00.85 ruby
11586 app       20   0     0    0    0 Z   20  0.0   0:00.62 ruby <defunct>
11589 app       20   0  262m  96m 3884 S   18  1.3   0:00.53 ruby
11592 app       20   0  260m  95m 3000 R   17  1.3   0:00.50 ruby
11600 app       20   0  260m  95m 2744 R   15  1.3   0:00.45 ruby
11595 app       20   0  260m  95m 2744 R   13  1.3   0:00.39 ruby
11598 app       20   0  262m  95m 3096 R   12  1.3   0:00.35 ruby
11604 app       20   0  258m  93m 2744 R   10  1.3   0:00.30 ruby
11607 app       20   0  257m  92m 2496 R    8  1.2   0:00.25 ruby
11610 app       20   0  256m  91m 2560 S    4  1.2   0:00.11 ruby

So you can see the CPUs being divvied up between processes, but what I care about is that there is more work than the CPUs can feasibly keep up with. That forces queued jobs to wait until a CPU is free to run them.
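
A minimal Python sketch of that comparison, assuming load is read as CPU demand the way this answer does (on Linux it can also include tasks waiting on I/O). The function names `load_per_cpu` and `describe_load` are made up for illustration, and the "more than 1.0 per CPU" cutoff is a rough rule rather than a hard limit. The numbers are the ones shown above.

```python
def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Load average divided by the number of logical CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be a positive integer")
    return load_average / cpu_count


def describe_load(load_average: float, cpu_count: int) -> str:
    """Classify the load as within or beyond what the CPUs can absorb."""
    # Assumption: a ratio above 1.0 means more runnable work than CPUs.
    if load_per_cpu(load_average, cpu_count) > 1.0:
        return "overloaded: work is queueing for a free CPU"
    return "fine: the CPUs can keep up"


# Values from the output above: 1-minute load 9.87 on 2 processors.
print(describe_load(9.87, 2))
# The same load on a hypothetical 12-CPU machine.
print(describe_load(9.87, 12))
```

On a live system the two inputs could come from `os.getloadavg()` and `os.cpu_count()` in the Python standard library.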

phemmer
  • 71,831
Stephan
  • 2,911
  • 1st time hearing about cpu load.. thanks stephan! – helping Feb 06 '13 at 02:38
  • Sure thing, glad I could help! – Stephan Feb 06 '13 at 02:44
  • Actually this is a very common misconception of load. While load is commonly related to CPU usage, it is not directly tied to it. Things like disk or network IO can spike load while the CPU is completely idle. A more accurate description of load is "the number of processes waiting on system resources". – phemmer Feb 04 '14 at 05:14

%CPU is measured with respect to one CPU, so 200% means 2 CPUs working full time. It all depends on how many CPUs (cores, threads) you have. If you go much above 70% or so of what is available, you are in trouble. But CPU isn't the only measure; I/O is very important too. If you are worried, install sysstat (which provides sar), configure it, and make sense of its output.
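
To make the arithmetic concrete, here is a small Python sketch that converts top's per-process %CPU into a share of the whole machine. It assumes 21 logical CPUs (Cpu0 through Cpu20 in the question's listing) and treats the 70% figure purely as the rule of thumb mentioned above; the function names are hypothetical.

```python
def share_of_machine(top_cpu_percent: float, logical_cpus: int) -> float:
    """Convert top's %CPU (100% = one full CPU) to a fraction of all CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be a positive integer")
    return top_cpu_percent / (100.0 * logical_cpus)


def exceeds_rule_of_thumb(share: float, threshold: float = 0.70) -> bool:
    """True if the share is above the (assumed) 70% rule-of-thumb level."""
    return share > threshold


# Numbers from the question: 190.7 %CPU on a machine showing Cpu0..Cpu20.
share = share_of_machine(190.7, 21)
print(f"{share:.1%} of the machine")   # prints "9.1% of the machine"
print(exceeds_rule_of_thumb(share))    # prints "False"
```

So 190.7% here means roughly two CPUs busy out of 21, well below the level this answer would worry about.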

vonbrand
  • 18,253
  • Thanks for the helpful info!! What does 'I/O' stand for? And sar gives great detail!! Thanks! – helping Feb 06 '13 at 02:34
  • @helping I/O is short for input/output, mostly disk. Another point (but much less probable as a performance bottleneck) is network I/O. – vonbrand Feb 06 '13 at 02:40
  • agreed; the programs 'top' and 'iotop' can indicate what sort of IO wait might be taking place. – Stephan Feb 06 '13 at 02:45