
I'm troubleshooting a recent series of crashes by looking through the files in /var/log. What should I look for in those files if I believe low memory or disk space is to blame? Is there a general term used in Linux error-reporting lingo for faults of this kind? And which system processes, such as the kernel, would be affected by a critical shortage of memory?


Just as background: I was working on a Drupal site hosted on my Fedora 17 sandbox laptop when I experienced these system crashes. I had recently downloaded some rather large files (I've since moved them to other media) and was down to about 1.8G of disk space.

I found some useful posts here about monitoring memory usage with top and current disk usage with du. This question, however, is specifically about log files. While searching for an explanation of FPrintObject I found a similar post at Fedora Forums, which led me to run Memtest, but nothing was reported bad there.

xtian

1 Answer


The information you are looking for is not found in the usual syslog logs. For viewing performance history from the command line, sysstat is an excellent tool.

With sysstat, the sadc command collects system information and writes it to a log file. The log file is in a binary format, but it can be viewed with the sar command.
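If sysstat is not already installed and collecting data, getting it running on Fedora looks roughly like the following. The package name is standard, but whether collection is driven by a sysstat.service unit or by the cron entries in /etc/cron.d/sysstat depends on the packaging, so treat the service commands below as an assumption:

$ sudo yum install sysstat                 # provides sadc, sa1, and sar
$ sudo systemctl enable sysstat.service    # start periodic collection at boot
$ sudo systemctl start sysstat.service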

Here is an example of sar output with no options:

$ sar
09:15:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:05:01 AM     all     77.49      0.37     22.13      0.00      0.00      0.00
10:15:01 AM     all     77.30      0.40     22.29      0.00      0.00      0.00
10:25:01 AM     all     77.19      0.38     22.42      0.00      0.00      0.00
10:35:01 AM     all     39.31      0.35     23.80      0.01      0.00     36.53
10:45:01 AM     all     32.22      0.34     24.26      0.03      0.00     43.15
10:55:01 AM     all     32.80      0.33     23.78      0.01      0.00     43.08
11:05:01 AM     all     32.70      0.33     23.76      0.00      0.00     43.20
Average:        all     63.90      0.39     22.79      0.00      0.00     12.91

The information you see is the same information provided by top, but as historical data. You can also see detailed information about RAM, network, and disk utilization. Here is an example of RAM usage:

$ sar -r
09:15:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:15:01 PM    457076   1357116     74.81    277876    810948    205520      5.40
02:25:01 PM    456836   1357356     74.82    277876    811168    205384      5.40
02:35:01 PM    456976   1357216     74.81    277876    811256    204728      5.38
02:45:01 PM    457036   1357156     74.81    277876    811368    204840      5.38
02:55:01 PM    456588   1357604     74.83    277896    811492    204924      5.38
Average:       332452   1481740     81.67    277720    793953    416953     10.96
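You can also look at disk and network activity, and read back the binary file from the day of the crash rather than today's. As a sketch, assuming the default Fedora layout of one file per day under /var/log/sa/:

$ sar -d                        # per-device disk I/O
$ sar -n DEV                    # per-interface network traffic
$ sar -r -f /var/log/sa/sa24    # memory usage from the 24th of the month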

Outside of running sar locally, there are many monitoring systems that show performance trending data; Munin, Cacti, and Zabbix are some examples. These have the benefit of graphing the data and keeping it for multiple servers in a centralized location.

Update to answer from comments:

The sar output will tell you if you ran out of RAM prior to the crash; this will be obvious because kbbuffers and kbcached will drop dramatically. You can also check dmesg for the OOM (out of memory) killer, but kernel messages are only written to the logs if klogd is installed. You won't see any logs about running out of disk space unless an application specifically reports its failure to write to disk, and if the disk is full, syslog won't be able to write the log to disk either.
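As a rough sketch of those checks, note that the kernel's OOM messages change wording between versions, so treat the grep patterns below as assumptions rather than exact matches:

$ dmesg | grep -iE 'out of memory|oom'                        # kernel ring buffer
$ grep -iE 'out of memory|killed process' /var/log/messages   # persisted logs, if they were written
$ df -h /                                                     # confirm whether the filesystem actually filled up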

jordanm
  • while certainly a useful tool, I'm not sure you've answered the questions about classes or specific terms in a log file related to faults of this kind. This is a real and currently ongoing problem, and even with this info I'm not clear what I'm looking for. – xtian Aug 25 '12 at 11:46
  • @xtian - sar will tell you if you ran out of RAM prior to the crash. This will be obvious as kbbuffers and kbcached will drop dramatically. You can also check dmesg for OOM (out of memory) killer, but dmesg is only written to logs if klogd is installed. You won't see any logs about out of disk space, unless an application specifically reports its failure to write to disk. However, if the disk is full, syslog won't be able to write the log to disk either. – jordanm Aug 25 '12 at 21:06
  • +1 for kbbuffers, kbcached, OOM, klogd, syslog (^^) The system was not completely out of memory; there was something like 700MB left over. abrtd wrote a directory/core dump each time I tried to relaunch Firefox, resulting in 1.1G of core dump files. So the memory loss was not all at once, but incremental as I stubbornly continued to relaunch Firefox (>_<). If you add the comment to the answer I will certainly accept it. – xtian Aug 28 '12 at 15:55
  • @xtian - as another note, you can configure /etc/security/limits.conf to disallow core dumps from being written. – jordanm Aug 28 '12 at 15:57
  • @xtian - Comment added to the answer. – jordanm Aug 28 '12 at 15:58
  • Looking into klogd, I find Fedora has a package, sysklogd, that bundles the syslogd and klogd daemons. Do you know offhand what kind of overhead this creates? (I recently slogged through dependency hell with the horrid tracker package after adding DVD video support.) – xtian Aug 28 '12 at 16:27