It could be CPU consumed by a kernel process, a driver, or interrupt handling.
This is not an answer, but one way to approach solving it. Some details of this approach are Linux-specific. Install the `sysstat` (aka `sar`) package and modify or add a sar-collection crontab (on RedHat systems, `/etc/cron.d/sysstat`):

    * * * * * root /usr/lib64/sa/sa1 -L -F -S XALL 10 6
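If sysstat isn't installed yet, something along these lines should get you going; the package manager and the sa1 path shown here are assumptions for a RedHat-family box, so adjust for your distribution:

    yum install sysstat        # Debian/Ubuntu: apt-get install sysstat
    ls /usr/lib64/sa/sa1       # confirm the collector path used in the crontab above
    man sadc                   # check whether your version documents -L and -F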
Be prepared to accumulate 3 GB or more during the month in `/var/log/sa`. If your version supports neither `-L` nor `-F`, then add the following cron entry:

    57 23 * * * root rm -f /var/log/sa/sa`date --date=tomorrow +\%d`
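Once collection is running, you can sanity-check that the per-day files are appearing and watch how much space they take (directory again assumed to be the RedHat default):

    ls -lh /var/log/sa/    # one saXX binary file per day of the month
    du -sh /var/log/sa     # total space used so far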
After a day, use `sar -f /var/log/sa/saXX -C`, where XX is yesterday's day of the month as a zero-padded integer (i.e., 01, 02, ... 10, 11 ... 31); a shortcut for computing the file name follows the list below. When you find a time window where the CPU was high, you can check the sar report for that window for:

- interrupts (`-I ALL`)
- network usage (`-n DEV`)
- disk I/O (`-b`)
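If you'd rather not work out yesterday's day of the month by hand, GNU date (already assumed by the cleanup cron entry above) can build the file name for you:

    sar -f /var/log/sa/sa$(date --date=yesterday +%d) -C | less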
Let's say you see the CPU jump between 10:15 and 10:18. Run sar against that day's file (sa05) as follows:

    sar -f /var/log/sa/sa05 -s 10:14:00 -e 10:19:00 -I ALL -n DEV -b | less

We add a minute on either side so that you can observe before/during/after, not just during.
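Because the suspicion here is kernel, driver, or interrupt time, it's also worth pulling the extended per-CPU counters for the same window; the %sys, %irq, and %soft columns show where kernel time is going (the file name and times below are just carried over from this example):

    sar -f /var/log/sa/sa05 -s 10:14:00 -e 10:19:00 -u ALL -P ALL | less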
If you still don't see anything, even after looking at the other sar parameters, try adding this to cron:

    * * * * * { date; /bin/ps -A --sort tty,comm,pid -ww -o pgrp:8,tty:7,pid,c,pmem:5,rss:8,sz:8,size:8=TSIZE,vsz:8,nlwp,lstart,args ;} >>/var/log/procscan

This file will get VERY big, so be sure to rotate it or disable the cron job the next day. But from this output, you might find your culprit.
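One way to keep that file under control is a logrotate rule; this is a minimal sketch, and the file name /etc/logrotate.d/procscan is just a suggestion:

    # /etc/logrotate.d/procscan
    /var/log/procscan {
        daily
        rotate 3
        compress
        missingok
        notifempty
    }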
To solve some of the problems with this cron job, I've created a wrapper script and a set of supporting files and put them on GitHub. You can find them here (link to project on github).