I am currently deploying computers in my client's house.

I am running the following scripts:

  1. ngrok (an SSH forward-tunneling daemon)
  2. heartbeat.py (a script that sends a heartbeat signal to Loggly to confirm that my computer is alive)
  3. metrics.py (a script that logs environmental data, such as temperature and disk space, to Loggly)

In my tests so far, metrics.py has been somewhat unstable: it crashes occasionally.

Is there a package in *NIX which does the following?

  1. Check every X seconds whether a process is running
  2. If it is not running, start it
  3. Do this for a list of processes
Edward

2 Answers

1

Much as I dislike systemd, I have to admit it can definitely do that.
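As a rough illustration of what that looks like, here is a minimal unit file sketch. It assumes systemd is the init system; the unit name (metrics.service), the interpreter path, and the script path /opt/monitoring/metrics.py are hypothetical placeholders, and the timing values are arbitrary.

```ini
# /etc/systemd/system/metrics.service  (hypothetical name and location)
[Unit]
Description=Environmental metrics logger (metrics.py)
After=network-online.target
Wants=network-online.target
# Give up if the service has to be started more than 5 times in 5 minutes.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
# Hypothetical paths; adjust to where Python and the script actually live.
ExecStart=/usr/bin/python3 /opt/monitoring/metrics.py
# Restart whenever the process exits with an error or is killed by a signal.
Restart=on-failure
# Wait 10 seconds before each restart attempt.
RestartSec=10

[Install]
WantedBy=multi-user.target
```

After saving the file, `systemctl daemon-reload` followed by `systemctl enable --now metrics.service` should start it and keep it supervised. One unit per process (ngrok, heartbeat.py, metrics.py) covers the "list of processes" requirement.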

Not all init systems support automatically restarting failed processes.

However, note that checking whether a process is still "running" is only the most rudimentary health check you can do. It's better if the program's main loop can check for some kind of "are you still alive?" message and reply to it. Then you know it hasn't got stuck in an infinite loop, or stuck waiting for I/O that won't complete.
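To make the difference between "running" and "responsive" concrete, here is a minimal sketch of such a ping/reply check. It uses in-process queues and a thread purely for illustration; the `Worker` class, the `stuck` flag, and `is_responsive` are hypothetical names, and a real deployment would more likely use a socket, pipe, or heartbeat file between separate processes.

```python
import queue
import threading
import time


class Worker:
    """Hypothetical stand-in for a long-running script such as metrics.py."""

    def __init__(self, stuck=False):
        self.pings = queue.Queue()   # supervisor -> worker
        self.pongs = queue.Queue()   # worker -> supervisor
        self.stuck = stuck           # simulate a hang, for demonstration only
        self.stop = threading.Event()
        self.thread = threading.Thread(target=self.main_loop, daemon=True)

    def main_loop(self):
        while not self.stop.is_set():
            if self.stuck:
                # Pretend to be blocked: the thread is alive but never answers.
                time.sleep(0.05)
                continue
            # ... one unit of real work would go here ...
            try:
                token = self.pings.get(timeout=0.05)
            except queue.Empty:
                continue
            self.pongs.put(token)    # echo the token back to prove liveness


def is_responsive(worker, timeout=1.0):
    """Send a ping and report whether the worker replied within `timeout` seconds."""
    token = time.monotonic()
    worker.pings.put(token)
    try:
        return worker.pongs.get(timeout=timeout) == token
    except queue.Empty:
        return False
```

A worker created with `stuck=True` still reports `thread.is_alive()` as true, yet `is_responsive` returns `False` for it, which is exactly the case a plain "is the process running?" check would miss.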

0

A simple script can probably help:

ps aux | grep '[n]grok' >/dev/null 2>&1 || ngrok

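The script above checks whether ngrok is running and, if it is not, runs the command to start it. The brackets in the grep pattern keep grep from matching its own entry in the ps output: the pattern '[n]grok' matches the text "ngrok" but not the literal string "[n]grok" that appears in grep's command line.

Add this line to your crontab so that the check runs periodically. Note that cron cannot schedule a job more often than once per minute.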

NOTE:

You may need to add a delay between the check and the restart. An upper limit on retries is also needed, so that ngrok is not relaunched endlessly when a critical error prevents it from starting.
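The delay and the retry limit are awkward to express in a one-line cron entry, so here is a minimal sketch of the same idea as a small supervising loop instead of a periodic check. The function name `supervise` and its parameters are hypothetical, and it assumes that a non-zero exit code means the process failed and should be restarted.

```python
import subprocess
import time


def supervise(command, max_restarts=5, delay=10.0):
    """Run `command` and restart it after a failure, at most `max_restarts` times.

    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits, then yields its exit code.
        exit_code = subprocess.call(command)
        if exit_code == 0:
            return restarts          # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts          # retry limit reached: give up
        time.sleep(delay)            # back off before the next attempt
        restarts += 1
```

For example, `supervise(["python3", "metrics.py"], max_restarts=5, delay=10)` would relaunch the script up to five times, waiting ten seconds between attempts. To cover several processes, one such loop could be run per command, for instance each in its own thread.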