I'm supervising a script with svscan. The script may run into errors and get stuck. When the script dies, svscan restarts it right away, but it dies again immediately, so svscan keeps restarting it. I can't find any setting or configuration for svscan, such as a retry count. Are you aware of anything similar?
The original Bernstein daemontools has no mechanism for this. There is only a `run` program in the service directory and a fixed auto-restart policy. However, several members of the daemontools family have improved upon this, and have flexible general-purpose mechanisms that can be employed to address such situations.
- Gerrit Pape's runit and Laurent Bercot's s6 both provide the mechanism of a `finish` program.
- Bruce Guenter's daemontools-encore provides the mechanism of a `notify` program.
- Wayne Marshall's perp has the mechanism of invoking the `rc.main` program with the `reset` subcommand.
- My nosh toolset provides the mechanism of a `restart` program.
These are all general-purpose mechanisms that can be used in your situation. I'm just going to discuss nosh service management in further detail here. It should be fairly obvious how to apply this to the others.
auto-restart control in nosh
In nosh service management, the service states are the same extended set (relative to the Bernstein original) that daemontools-encore employs: stopped, starting, started, running, failed, and stopping.
Before a service is started, when it is in the starting state, the nosh service manager runs the `start` program.
When a service terminates while it is still "up", the nosh service manager runs the `restart` program, in the failed state, to determine whether to transition back to the running state or to the stopping (and thence to the stopped) state. The `restart` program makes this determination in a service-specific way, and it is explicitly the place to put decisions about whether a service has restarted "too many times" or has become unrestartable in some other way.
`restart` is passed, as command-line arguments, information about how the main service process terminated: whether it was a normal exit or a signal, and the exit code or the specific signal. The service manager transitions back to the running state if the `restart` program can be run and terminates with a success status, and transitions to the stopping (and thence stopped) state otherwise.
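The decision rule just described can be modelled in a few lines of Python. This is an illustrative sketch, not nosh's actual code: the `State` enum, the `after_termination` helper and the use of `subprocess.run` to invoke the program are assumptions made for the example. It shows only the rule itself: a `restart` program that exits with status 0 sends the service back to running, while one that fails, is killed by a signal, or cannot be run at all sends it toward stopped.

```python
import subprocess
from enum import Enum


class State(Enum):
    # The extended service states listed above.
    STOPPED = "stopped"
    STARTING = "starting"
    STARTED = "started"
    RUNNING = "running"
    FAILED = "failed"
    STOPPING = "stopping"


def after_termination(restart_argv: list[str]) -> State:
    """Hypothetical model of the choice made in the failed state.

    restart_argv is the restart program followed by whatever arguments
    describe how the main process ended (their format is not modelled).
    """
    try:
        result = subprocess.run(restart_argv)
    except OSError:
        # The restart program could not be run at all: do not restart.
        return State.STOPPING
    # Only a success status means "go back to running". A non-zero exit,
    # or death by signal (negative returncode), means "stop".
    return State.RUNNING if result.returncode == 0 else State.STOPPING
```

The point of the model is that the service manager itself never counts anything. All policy lives in the `restart` program, and the manager only looks at its exit status.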
`restart` and `start` can be anything that you like: Perl programs, shell scripts, execline programs, compiled binaries, and so forth. They are fairly trivial exercises in shell scripting, with `case`…`esac` and `if`…`fi`. Some example `restart` programs written in shell script are supplied in the nosh-bundles package, which is available for Debian/Ubuntu and for FreeBSD/PC-BSD/DragonFlyBSD/&c.
So, to decide whether a service has been restarted "too many times", have your `start` program zero-initialize a counter (in a file in the service directory), and have your `restart` program increment that counter and return a success status only if the counter is less than a certain value. Of course, you can make the decision take other factors into account as well.
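The counter scheme can be sketched in Python as two functions, one for the body of a `start` program and one for the body of a `restart` program. The file name `restart-count`, the limit of 5, and passing the service directory in explicitly are all hypothetical choices for illustration; nosh prescribes none of them.

```python
from pathlib import Path

# Hypothetical names and limit, chosen for illustration only.
COUNTER_FILE = "restart-count"
MAX_RESTARTS = 5


def start(service_dir: Path) -> int:
    """Body of a `start` program: zero-initialize the counter."""
    (service_dir / COUNTER_FILE).write_text("0\n")
    return 0


def restart(service_dir: Path) -> int:
    """Body of a `restart` program: increment the counter, then decide."""
    counter = service_dir / COUNTER_FILE
    try:
        count = int(counter.read_text())
    except (FileNotFoundError, ValueError):
        # Missing or unreadable counter: behave as if start had just reset it.
        count = 0
    count += 1
    counter.write_text(f"{count}\n")
    # Success (0) only while the counter is below the limit; any other
    # status tells the service manager to stop instead of restarting.
    return 0 if count < MAX_RESTARTS else 1
```

Each real program would then end with something like `sys.exit(restart(service_dir))`. With `MAX_RESTARTS = 5`, the service is restarted four times after the last `start`, and the fifth consecutive failure leaves it stopped. An administrator's next explicit start resets the count.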
- You might, say, also want to stop auto-restarting if the service crashes or aborts with a signal, and only auto-restart if it exits cleanly or is cleanly terminated by a "clean termination" signal such as `SIGTERM`.
- You might, say, want to stop auto-restarting if you detect that the service has become corrupt (in some service-specific way) to the extent that it cannot ever again start properly without administrator intervention.
- You might, say, want to introduce some sort of rate throttling mechanism, involving a calculated `sleep` period.
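The first and third ideas can be combined in one Python sketch. It assumes that the command-line arguments have already been normalized into a `kind` ("exit" or "signal") and a `value` (exit code or signal name such as "SIGTERM"). That normalized form is a hypothetical convenience, not the argument format that nosh actually passes; check the `service-manager` manual for the real one. The throttling shown is plain exponential backoff, one possible choice among many, and the `sleep` parameter is there only so the delay can be observed in isolation.

```python
import time
from typing import Callable

# Hypothetical policy knobs, chosen for illustration only.
CLEAN_SIGNALS = frozenset({"SIGTERM"})
BASE_DELAY = 1.0   # seconds before the first restart
MAX_DELAY = 60.0   # upper bound on any single delay


def should_restart(kind: str, value: str) -> bool:
    """Restart after a normal exit or a clean-termination signal only."""
    if kind == "exit":
        return True
    if kind == "signal":
        # Crashes and aborts (SIGSEGV, SIGABRT, ...) are not restarted.
        return value in CLEAN_SIGNALS
    return False


def backoff_delay(restarts_so_far: int) -> float:
    """Exponential backoff: 1, 2, 4, 8, ... seconds, capped at MAX_DELAY."""
    # Cap the exponent first so the float multiplication cannot overflow.
    return min(MAX_DELAY, BASE_DELAY * 2 ** min(restarts_so_far, 10))


def restart_decision(
    kind: str,
    value: str,
    restarts_so_far: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Exit status for a `restart` program: 0 to restart, 1 to stop."""
    if not should_restart(kind, value):
        return 1
    # Sleeping here delays the transition back to the running state.
    sleep(backoff_delay(restarts_so_far))
    return 0
```

The restart count could come from the counter file that the previous paragraph describes, so that a service that keeps failing is restarted progressively more slowly instead of in a tight loop.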
Further reading
- Jonathan de Boyne Pollard (2015). The daemontools family. Frequently Given Answers.
- Jonathan de Boyne Pollard. `service-manager`. nosh Guide. The nosh Guide is available as a Debian/Ubuntu package and a FreeBSD/PC-BSD/DragonFlyBSD/&c. package, and the `service-manager` manual is accessible on your machine, without any Internet connection required, via `man service-manager` or `xdg-open /usr/local/share/doc/nosh/service-manager.html`.
- Bruce Guenter. `supervise`. daemontools-encore manual. §8.
- Gerrit Pape. `runsv`. runit manual. §8.
- Wayne Marshall (2013). `perpetrate`. perp manual. §5.
- Laurent Bercot. `s6-supervise`. s6 manual. Skarnet software.
