4

Sometimes, I have a rogue Java process which takes up 100% of my CPU and makes it jump about 30C in temperature (usually resulting in a crash if not killed).

The problem is that I can never really identify it (it has a long list of parameters and so on) or analyze it, because I have to kill it so quickly.

Is there a sort of log I can look at to see the identity of past processes I have killed? If not, is there a way for me to catch that process next time it shows up?

If it matters, I'm on openSUSE 11.4.

nopcorn
  • Instead of obliterating the process would it be possible to use SIGSTOP to pause it instead? – jw013 Nov 23 '11 at 21:51

2 Answers

7

No, not by default. There is such a thing as too much logging (especially when you start risking logging the action of writing a log entry…).

BSD process accounting, if it is installed and active, records the name of every command that is executed and some basic statistics, but not the arguments. Run lastcomm to read its records.

The audit subsystem is more general and more flexible. Install the audit package and read the SuSE audit guide (mostly the part about rules), or try

auditctl -A exit,always -F path=/usr/bin/java -S execve
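Once a rule like the one above is loaded and auditd is running, the recorded events can be read back with ausearch. The following is only a sketch: the function name show_java_execs is made up for illustration, it assumes the JVM was started through /usr/bin/java as in the rule, and it normally has to run as root.

```shell
# Hypothetical helper: list audited execve calls that involved /usr/bin/java.
# Assumes the audit rule above is loaded and auditd is writing its log.
show_java_execs() {
    # -f   match events that reference this file name
    # -sc  restrict the search to the execve system call
    # -i   interpret numeric fields (uids, syscall numbers) as names
    ausearch -f /usr/bin/java -sc execve -i
}
```

Each matching event should include an EXECVE record listing the arguments (a0, a1, ...), which is the part that process accounting does not keep.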

Or: instead of killing it, kill -STOP it. The STOP suspends the process, no questions asked. You get the option to resume (kill -CONT) or terminate (kill -KILL) later. As long as the process is still around, you can inspect its command line (/proc/12345/cmdline), its memory map (/proc/12345/maps) and so on.
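As a rough sketch of that approach, the function below suspends a process and copies what /proc exposes about it into a directory, so the details survive even if the process is killed afterwards. The name freeze_and_dump and the default output directory under /tmp are assumptions made for this example, not anything standard.

```shell
# Hypothetical helper: suspend a process and save its /proc details.
# Usage: freeze_and_dump PID [OUTDIR]
freeze_and_dump() {
    pid=$1
    out=${2:-/tmp/proc-$pid}          # assumed default location
    kill -STOP "$pid" || return 1     # suspend; the process cannot refuse
    mkdir -p "$out" || return 1
    # cmdline and environ are NUL-separated, so make them readable
    tr '\0' ' '  < "/proc/$pid/cmdline" > "$out/cmdline"
    echo >> "$out/cmdline"
    tr '\0' '\n' < "/proc/$pid/environ" > "$out/environ"
    readlink "/proc/$pid/cwd" > "$out/cwd"
    cat "/proc/$pid/maps" > "$out/maps"
    echo "$out"                       # report where the dump was written
}
```

For example, freeze_and_dump 12345 prints the dump directory; afterwards kill -CONT 12345 resumes the process and kill -KILL 12345 terminates it. Reading environ, cwd and maps requires being the process owner or root.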

Or: attach a debugger to the process and pause it. It's as simple as gdb --pid 12345 (there may be better options for a Java process); attaching a debugger immediately pauses the process (if you exit the debugger, the process receives a SIGCONT and resumes).

Note that all this only catches OS-level processes, not JVM threads. You need to turn to JVM features to debug threads.
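One common way to do that, sketched here under some assumptions: jstack (shipped with the JDK) prints a thread dump in which each thread carries an nid field, and on JVMs of this era that field is the OS thread ID in hexadecimal (newer JDKs may print it differently). The helper names tid_to_nid and hot_java_thread are invented for this example. The JVM must be running, not stopped, for jstack to get an answer, and jstack should run as the same user as the JVM.

```shell
# Hypothetical helper: convert a decimal thread ID to jstack's hex nid form.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Hypothetical helper: show the jstack entry for the thread of JVM $1
# that ps reports with the highest CPU share.
hot_java_thread() {
    pid=$1
    # ps -L prints one line per thread (LWP); sort by the CPU column
    tid=$(ps -L -o lwp=,pcpu= -p "$pid" | sort -k2 -nr | awk 'NR==1 {print $1}')
    [ -n "$tid" ] || return 1
    # the trailing space avoids matching a longer nid with the same prefix
    jstack "$pid" | grep -A 15 "nid=$(tid_to_nid "$tid") "
}
```

Note that the pcpu value from ps is averaged over the thread's lifetime, so top -H -p 12345 may give a better picture of which thread is busy right now; its thread ID can be passed to tid_to_nid by hand.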

0

There is a utility that not only logs but also monitors and manages processes: monit, a very flexible and useful tool. It can stop a process from using 100% of the CPU (or whatever limit you configure, for CPU or other resources) for longer than a period you choose, by automatically restarting the process. It also records such abnormal situations in its own log file or in syslog.

You can find a lot of configuration examples here.

BBK