
First of all, I forgot to save the process list. I had to reboot the server so I can't give much information.

I have a small VPS. CPU usage was at 100% in both top and the control panel graphs, and it stayed high all day (I noticed this morning that the webserver wasn't replying to requests). After rebooting the VPS, CPU usage is back below 4% (I'm the only one using the webserver, so there isn't any real load).

What could be the cause? How can I investigate this if it happens again?

Jeff Schaller
m fran
  • Were there any zombie processes in your top output? Without any details we can't do much now. Next time, save the output and check a few basic things such as zombie processes, the load average, and I/O wait. – AReddy Feb 04 '16 at 12:06
  • I do remember that there were 0 zombie processes. – m fran Feb 04 '16 at 13:20

1 Answer


It could be CPU consumed within a kernel process, driver, or interrupts.

This is not an answer, but one way to approach the problem. Some details of this approach are Linux-specific. Install the sysstat package (which provides sar) and modify or add a sar-collection crontab entry (on RedHat systems, in /etc/cron.d/sysstat):

* * * * * root /usr/lib64/sa/sa1 -L -F -S XALL 10 6

Be prepared to accumulate 3 GB or more over the month in /var/log/sa. If your version supports neither -L nor -F, then add the following cron entry:

 57 23 * * * root rm -f /var/log/sa/sa`date --date=tomorrow +\%d`

After a day, use sar -f /var/log/sa/saXX -C, where XX is yesterday's day of the month as a zero-padded number (i.e., 01, 02, ... 10, 11 ... 31). Once you find a time window where the CPU was high, check the sar report for that window for:

  • interrupts (-I ALL)
  • network usage (-n DEV)
  • disk I/O (-b)
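Spotting the busy window by eye in a full day of samples can be tedious. As a rough sketch (not part of the approach described here), a small filter can print only the samples whose idle percentage falls below a threshold. The function name busy_samples and the default threshold of 20 are made up for illustration, and the filter assumes that the last column of the report is %idle, as in sar's plain CPU report (-u).

```shell
# Hypothetical helper: print only the sar CPU samples whose %idle
# (assumed to be the last column) is below the given threshold.
# Example use: sar -u -f /var/log/sa/sa05 | busy_samples 20
busy_samples() {
    max_idle="${1:-20}"
    LC_ALL=C awk -v max="$max_idle" '
        # Data rows start with a timestamp and end with a number;
        # this skips the banner, column headers and "Average:" lines.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $NF ~ /^[0-9]+([.,][0-9]+)?$/ {
            idle = $NF
            sub(",", ".", idle)          # tolerate a decimal comma
            if (idle + 0 < max + 0) print
        }'
}
```

The lines that survive the filter give you the timestamps to feed into the more detailed reports.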

Let's say you see the CPU jump between 10:15 and 10:18. Run sar on that day (05) as follows:

sar -f /var/log/sa/sa05 -s 10:14:00 -e 10:19:00 -I ALL -n DEV -b | less

We add a minute on either side so that you can observe before/during/after, not just during.
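If you do this often, the one-minute padding can be computed rather than typed. The following is only a sketch: to_minutes and padded_sar_cmd are hypothetical helper names, the window is assumed to lie within a single day and to be given as HH:MM, and the command is printed rather than executed so you can check it first.

```shell
# Convert HH:MM to minutes since midnight. One leading zero is stripped
# so that values such as 08 are not read as invalid octal numbers.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    echo $(( ${h#0} * 60 + ${m#0} ))
}

# Print (do not run) a sar command covering start-1min .. end+1min.
# Arguments: data file, start HH:MM, end HH:MM
padded_sar_cmd() {
    s=$(( $(to_minutes "$2") - 1 ))
    e=$(( $(to_minutes "$3") + 1 ))
    # Clamp to the same day: 00:00 .. 23:59
    [ "$s" -lt 0 ] && s=0
    [ "$e" -gt 1439 ] && e=1439
    printf 'sar -f %s -s %02d:%02d:00 -e %02d:%02d:00 -I ALL -n DEV -b\n' \
        "$1" $(( s / 60 )) $(( s % 60 )) $(( e / 60 )) $(( e % 60 ))
}

# Prints: sar -f /var/log/sa/sa05 -s 10:14:00 -e 10:19:00 -I ALL -n DEV -b
padded_sar_cmd /var/log/sa/sa05 10:15 10:18
```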

If you have looked at the other sar parameters as well and still don't see anything, try adding this to cron:

* * * * * { date; /bin/ps -A --sort tty,comm,pid -ww -o pgrp:8,tty:7,pid,c,pmem:5,rss:8,sz:8,size:8=TSIZE,vsz:8,nlwp,lstart,args ;} >>/var/log/procscan

This file will get VERY big, so be sure to rotate it or disable the cronjob the next day. But from this output, you might find your culprit.
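One way to keep the file smaller, offered as a sketch under stated assumptions rather than as part of the cronjob above, is to take the snapshot only when the 1-minute load average is above some threshold. The function name load_exceeds and the threshold of 2 are hypothetical, and reading /proc/loadavg is Linux-specific.

```shell
# Succeed (exit status 0) when the 1-minute load average is above the threshold.
# Arguments: threshold, optional file to read (defaults to /proc/loadavg, Linux only)
load_exceeds() {
    awk -v limit="$1" '
        { over = ($1 + 0 > limit + 0) }   # first field is the 1-minute load
        END { exit !over }
    ' "${2:-/proc/loadavg}"
}

# Possible use from a script called by cron (threshold 2 is an arbitrary example):
# if load_exceeds 2; then
#     { date; ps -eo pid,pcpu,pmem,nlwp,lstart,args --sort=-pcpu | head -n 20; } >>/var/log/procscan
# fi
```

The trade-off is that you lose the "before" picture: nothing is recorded until the load has already climbed, so pick a threshold low enough to catch the start of the spike.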

To solve some of the problems with this cronjob, I've created a wrapper script and a set of supporting files and put them on GitHub. You can find them here (link to project on GitHub).

Otheus