I am developing a node.js application that frequently crashes my Debian Linux kernel: the computer becomes unresponsive and doesn't even respond to 'ping'.

At this stage, I am not asking anyone to analyze or fix the cause of the crashes; I don't have any information that points to anything specific. The computer just stops responding, and neither /var/log/messages nor dmesg shows any messages.

So my question is: What tools can I use to gather some information regarding the crashes?


Here are some background details:

My node.js application doesn't use the network stack. It just spawns two sub-processes with child_process.spawn and communicates with them by writing files, watching for file changes with fs.watch, and reading the files that have changed. The rest is just data processing.

I have tested this problem on three computers:

  • On the first one (my main dev machine), the system freezes reliably after starting this application a few times.
  • On the other computers (a PC similar to the main dev machine and a DigitalOcean VPS), the application usually runs well, but after a few hundred runs it froze both of them.

It seems that my main dev machine is more prone to this problem, but because the freezes also happen on two unrelated machines, I assume it is not a pure hardware problem restricted to one PC.

Since the computer freezes immediately after starting the app, I am sure the app causes this problem. And since everything stops (including responses to pings), I assume that the Linux Kernel has crashed.

1 Answer

Typically, a Linux kernel crash is visible on the system's console. If it is a kernel crash that for some reason does not show up there, you can confirm it by configuring the system to reboot automatically after a kernel panic, as described here: Configure reboot on Linux kernel panic. If the system ends up rebooting, then it is indeed a kernel crash and you can focus on that investigation path (there are plenty of related answers on Stack Exchange sites).
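
For reference, a minimal sketch of such a setting, assuming a standard sysctl setup. The 10-second delay is an arbitrary value picked for illustration, and the linked answer may differ in its details:

```
# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
# Reboot 10 seconds after a kernel panic.
# 0, the usual default, means the kernel does not reboot on its own.
kernel.panic = 10
```

Running `sysctl -p` as root reloads /etc/sysctl.conf, so the setting should take effect without a reboot.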

But from your description, I think a kernel hang or "too busy" condition is more likely. You could start here: How to investigate cause of total hang?

Finally, since, as you observed, the root cause most likely lies in your application, I would assume it is somehow putting so much load on the system that it becomes unresponsive. You could review your code for any lengthy or infinite loops and try to limit their impact: break out after a certain execution time (maybe using timeout exceptions), after a certain number of iterations, etc. If the system becomes responsive again after a while, you'd get a better idea of which area of your code is at fault and maybe of how it impacts the system.
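
As a rough illustration of that kind of guard, here is a minimal sketch for a synchronous loop. The helper name `runBounded`, its option names, and the default limits are all hypothetical and not taken from your code, so adapt them to whatever your processing loop actually looks like:

```javascript
'use strict';

// Hypothetical helper: calls `step(i)` repeatedly until it returns true,
// and throws if either the iteration budget or the time budget runs out.
function runBounded(step, { maxIterations = 1e6, maxMillis = 5000 } = {}) {
  const start = Date.now();
  for (let i = 0; i < maxIterations; i++) {
    // Check the wall-clock budget before each step.
    if (Date.now() - start > maxMillis) {
      throw new Error(`time budget of ${maxMillis} ms exceeded after ${i} iterations`);
    }
    // A truthy return value means the work is finished.
    if (step(i)) {
      return i + 1; // number of iterations that were run
    }
  }
  throw new Error(`iteration budget of ${maxIterations} exceeded`);
}

// Usage: a loop that would otherwise never end is cut off after 100 ms.
try {
  runBounded(() => false, { maxIterations: Infinity, maxMillis: 100 });
} catch (err) {
  console.error('loop aborted:', err.message);
}
```

The error message tells you which budget was hit, which should help narrow down the part of the code that is running away. Note that this only bounds the loop; while it runs, it still blocks the node.js event loop.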