I configured the service calc_mem.service as follows:

Restart=on-failure
RestartSec=5
StartLimitInterval=400
StartLimitBurst=3

From my understanding, the configuration above should do the following:

the service gets 3 retries when it exits with an error

and before each restart it waits 5 seconds

I also found that Restart can be set to:

Restart=always

I understand the need to restart the service on failure, but what does Restart=always mean?

In which cases should we set Restart=always?

Jeff Schaller
yael

3 Answers

The systemd.service man page has a description of the values Restart= takes, and a table of what options cause a restart when. Always pretty much does what it says on the lid:

If set to always, the service will be restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.
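The table in the man page can be summarised as a small lookup. The Python sketch below is only a model of that table, not systemd code; the names `RESTART_TABLE` and `should_restart`, and the labels used for the exit causes, are made up for illustration.

```python
# Model of the Restart= table in systemd.service(5).
# Keys are (hypothetical) labels for how the process ended; values are the
# Restart= settings under which the service would be restarted.
RESTART_TABLE = {
    "clean-exit": {"always", "on-success"},
    "unclean-exit-code": {"always", "on-failure"},
    "unclean-signal": {"always", "on-failure", "on-abnormal", "on-abort"},
    "timeout": {"always", "on-failure", "on-abnormal"},
    "watchdog": {"always", "on-failure", "on-abnormal", "on-watchdog"},
}


def should_restart(restart_setting: str, exit_cause: str) -> bool:
    """Return True if this Restart= setting restarts the service for this cause."""
    return restart_setting in RESTART_TABLE[exit_cause]


# Compare Restart=always with Restart=on-failure for every exit cause.
for cause in RESTART_TABLE:
    print(cause, should_restart("always", cause), should_restart("on-failure", cause))
```

In this model the only row where always and on-failure differ is the clean exit. The table covers the process ending on its own; a service stopped explicitly with systemctl stop is not restarted under any setting.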

I don't know for sure what situation they had in mind for that feature, but we might hypothesise e.g. a service configured to only run for a fixed period of time, or to serve a fixed number of requests, and to then stop to avoid any possible resource leaks. Having systemd do the restarting makes for a cleaner implementation of the service itself.

In some sense, we might also ask why not include that option in systemd. Since it is capable of restarting services on failure, they might as well include the option of restarting the service always, just in case someone needs it. To provide tools, not policy.

Note also that a "successful exit" here is defined rather broadly:

If set to on-success, it will be restarted only when the service process exits cleanly. In this context, a clean exit means an exit code of 0, or one of the signals SIGHUP, SIGINT, SIGTERM or SIGPIPE, [...]

SIGHUP is a common way of asking a process to restart, but if unhandled, it terminates the process. So having Restart=always (or Restart=on-success) allows using SIGHUP for restarting, even without the service itself supporting that.
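As a rough illustration of that definition, here is a Python sketch that classifies an exit as clean or not. It only models the part of the man page quoted above (exit code 0 plus the four signals) and ignores whatever the elided part of the quote adds; the function name `is_clean_exit` is hypothetical, and the code assumes a Unix-like system where these signal constants exist.

```python
import signal

# Signals that the quoted man page text counts as a clean termination.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def is_clean_exit(exit_code=None, term_signal=None):
    """Classify a process end as clean or not.

    Pass term_signal if the process was killed by a signal,
    otherwise pass the exit code it returned.
    """
    if term_signal is not None:
        return term_signal in CLEAN_SIGNALS
    return exit_code == 0


print(is_clean_exit(exit_code=0))                 # normal successful exit
print(is_clean_exit(exit_code=1))                 # non-zero exit code
print(is_clean_exit(term_signal=signal.SIGHUP))   # unhandled SIGHUP
print(is_clean_exit(term_signal=signal.SIGSEGV))  # crash
```

So a process that dies from an unhandled SIGHUP falls in the "clean" group, which is why on-failure would leave it stopped while always or on-success would bring it back.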

Also, as far as I can read the man page, always doesn't mean it would override the limits set by StartLimitInterval and StartLimitBurst:

Note that service restart is subject to unit start rate limiting configured with StartLimitIntervalSec= and StartLimitBurst=, see systemd.unit(5) for details. A restarted service enters the failed state only after the start limits are reached.
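To see how the settings from the question interact, here is a simplified Python simulation. It is a sketch under loud assumptions, not systemd's actual accounting: the service is assumed to fail immediately on every start, and the limiter is modelled as a window that opens at the first start and allows at most StartLimitBurst starts until the interval has passed. The function name `simulate_starts` and its parameters are hypothetical.

```python
def simulate_starts(restart_sec, limit_interval, limit_burst, max_attempts=10):
    """Return the times (in seconds) at which starts are allowed.

    Assumes the service fails instantly every time, so the next start
    attempt comes restart_sec seconds after the previous one.
    """
    allowed = []
    window_begin = None  # time of the first start in the current window
    count = 0            # starts counted in the current window
    now = 0.0
    for _ in range(max_attempts):
        # Open a new window on the first start or once the interval has passed.
        if window_begin is None or now - window_begin > limit_interval:
            window_begin = now
            count = 0
        if count >= limit_burst:
            break  # start refused: in this model the unit is now "failed"
        count += 1
        allowed.append(now)
        now += restart_sec
    return allowed


# RestartSec=5, StartLimitInterval=400, StartLimitBurst=3 from the question.
print(simulate_starts(5, 400, 3))  # [0.0, 5.0, 10.0]
```

Under this model the limit counts starts, so the question's configuration gives the initial start plus two restarts, and the fourth attempt is refused regardless of whether Restart= is on-failure or always.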

ilkkachu
  • Very nice answer. This helped me find the definition for Restart=on-abort as well. See https://unix.stackexchange.com/questions/564443/what-does-restart-on-abort-mean-in-a-systemd-service/564444#564444 – PatS Jan 27 '20 at 21:36
If set to on-failure, the service will be restarted when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the aforementioned four signals), when an operation (such as service reload) times out, and when the configured watchdog timeout is triggered. [...] If set to always, the service will be restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.

Excerpt from https://www.freedesktop.org/software/systemd/man/systemd.service.html

So if you set on-failure, it won't get restarted on clean exit.

Panki
  • OK, now it's clear. About my configuration: do you see any issue with it, or is this exactly what is needed to retry 3 times on failure? – yael Mar 22 '19 at 09:03
@JdeBP suggested there is another way to look at this question.

Restart=always is simpler. Easier to implement, easier to understand. Why would we ever want to check whether the service terminated with exit code 0 (EXIT_SUCCESS)? There might even have been a weird bug or error in the service that caused it to terminate with exit code 0 when it should not have done so.

Answer 1: There are some units that must not use Restart=always. In particular, services that exit after an idle timeout.

Intriguingly, it would not matter so much if a bug/error causes such a service to exit "successfully", when it should not have done so. Because an idle timeout implies that the service has already been set up to start automatically when a new request is made.

However Restart=on-failure might be used for a service which can exit on idle in some configurations, but not in others. systemd-networkd uses it for this reason.

Answer 2: System administration practices may include killing or messaging service processes to stop them. Sometimes people use a plain kill command, but there are also scripts like apachectl. With Restart=on-failure, a service stopped this way (for example by a plain kill, which sends SIGTERM and so counts as a clean exit) stays stopped. That makes it less risky for systemd to recommend Restart=on-failure (as the man page does).

However systemd is left in a strange position, where they also support Restart=always, and that is what they like to set for the majority of long-running services inside the systemd project... This does not seem very helpful when you are trying to learn about systemd service definitions.

sourcejedi
  • This is somewhat backwards. Unconditional restart is actually the behaviour of long standing, from quite a number of existing service managers over the decades. It's adding all of the conditional restarts, based upon process exit code and status, that is the extra complexity that came later. I speak as one of the people who has implemented such things, several times over. And indeed this was the history of systemd development, too. – JdeBP Mar 22 '19 at 11:50
  • @JdeBP edited. It is not so obvious when you are coming across this afterwards, and just see a list of possible options. Thanks for your perspective :-). – sourcejedi Mar 22 '19 at 12:16
  • @JdeBP my guess that this was about various bus-activated units which terminate after inactivity does not seem to be right though. systemd-timedated.service does not bother with Restart=on-failure... and I guess there is no reason that it should. Hmm, plenty of other services seem to use it though, including systemd-networkd. So I wonder what they have in common. – sourcejedi Mar 22 '19 at 12:24
  • @JdeBP made me smile :-). networkd turns out to be a more complex compromise, I can understand that. But then leaving Restart=always in all the other systemd daemons seems so confusing to me. It's like there's a subtle advantage that systemd strongly cares about, and simultaneously doesn't think is important enough to document. – sourcejedi Mar 25 '19 at 16:07