
I have a systemd user service that can take many minutes to shut down properly. It works on a very large HDF5 database (several GB), and if the process does not stop cleanly, the database is corrupt afterwards.

I've found many threads here, like "How to change systemd service timeout value?", but sadly they didn't help me at all: I still cannot increase the timeout.

Because I had some trouble with the timeouts, I wrote this example:

#!/bin/bash

RUNNING=true
VAR_RUN=${HOME}/.run
PIDFILE=${VAR_RUN}/mondas_ctrl_launcher.pid
LOG_DIR=${HOME}/logs
LOG_FILE=${LOG_DIR}/mondas_ctrl_launcher.log
MONDAS_BASE=${HOME}/src/mondas

on_sigint() {
    RUNNING=false
}

log() {
    now=$(date)
    message="${now}: ${@}"
    echo ${message}
    echo ${message} >> "${LOG_FILE}"
}

# taken from
# https://blog.dhampir.no/content/sleeping-without-a-subprocess-in-bash-and-how-to-sleep-forever
# Execute this with BASH as it uses bash extensions
snore() {
    local IFS
    [[ -n "${_snore_fd:-}" ]] || { exec {_snore_fd}<> <(:); } 2>/dev/null ||
    {
        # workaround for MacOS and similar systems
        local fifo
        fifo=$(mktemp -u)
        mkfifo -m 700 "$fifo"
        exec {_snore_fd}<>"$fifo"
        rm "$fifo"
    }
    read ${1:+-t "$1"} -u $_snore_fd || :
}

_mondas_ctrl() {
    # my true program that starts a lot of
    # processes in a tmux session
    # mondas_ctrl "${@}" >> "${LOG_FILE}" 2>&1
    # doing just "true" for testing purposes
    true
}

mkdir -p "${VAR_RUN}"
mkdir -p "${LOG_DIR}"

case "${1}" in
    start)
        log "Starting mondas PWD: $(pwd)"
        _mondas_ctrl start
        log "mondas_ctrl start executed"
        echo "${$}" > ${PIDFILE}
        # SIGINT
        trap on_sigint 2
        while ${RUNNING} ; do
            snore 1
        done
        log "Exiting sleep loop"
        ;;
    stop)
        log "Stopping mondas"
        _mondas_ctrl stop
        # simulating long shutdown
        snore 275
        log "mondas_ctrl stop executed"
        if test -f "${PIDFILE}" ; then
            kill -2 $(cat "${PIDFILE}")
            rm -rf "${PIDFILE}"
        fi
        ;;
    *)
        echo "usage: $0 start|stop" >&2
        exit 1
        ;;
esac

and my systemd user service:

[Unit]
Description=Mondas
Wants=network.target
After=network.target

[Service]
Type=simple
RemainAfterExit=no
ExecStart=%h/bin/mondas_ctrl_launcher start
ExecStop=%h/bin/mondas_ctrl_launcher stop
TimeoutStartSec=120
TimeoutStopSec=500
Restart=always
RestartSec=1

[Install]
WantedBy=default.target

So I enabled and started it and checked the timeout settings of the service:

$ systemctl --user enable mondas2.service
$ systemctl --user start mondas2.service
$ systemctl --user show mondas2.service  -p TimeoutStopUSec
TimeoutStopUSec=8min 20s

However, if I execute reboot as root, I see this on the console:

[***] A stop job is running for User Manager for UID 1000 (20s / 2min)

After 90 seconds systemd just kills the process, and the log file mondas_ctrl_launcher.log is missing the "mondas_ctrl stop executed" log entry.

I even changed /etc/systemd/system.conf and set

DefaultTimeoutStartSec=300s
DefaultTimeoutStopSec=300s

but when I execute reboot, the console still displays a maximum timeout of 2 minutes, and after 90 seconds the process is just killed. No matter what I do, I cannot change this behaviour.

What am I doing wrong? Did I misinterpret the meaning of TimeoutStopSec? Or could it be that the TimeoutStopSec value in the service file does not affect the real timeout during a reboot or poweroff, and only applies when stopping the service manually via systemctl --user stop? If so, how can I increase the reboot/poweroff timeout?

I'm testing this on a current Debian 10.5 installation.

Pablo
  • One thing that you are doing wrong, which isn't the main part of your problem, is not spotting the difference between service descriptions "Mondas" and "User Manager for UID 1000". The clue to an answer for the main part of your problem is the reason that "rickety" and "dangerous" are associated with the mechanism. (-: – JdeBP Sep 08 '20 at 16:17
  • @JdeBP so that means that the 2min timeout is not the timeout of my user service but of the whole session? But how can I then increase that timeout? I don't understand what you mean by the reason that "rickety" and "dangerous" are associated with the mechanism. – Pablo Sep 08 '20 at 16:32
  • No, it means that you haven't looked for which of your services has the description "User Manager for UID 1000". And https://unix.stackexchange.com/a/590919/5132 leads to further understanding. – JdeBP Sep 08 '20 at 16:44
  • @JdeBP as far as I can see, this is started by user.slice which spawns user-1000.slice from this file /lib/systemd/system/user@.service. – Pablo Sep 08 '20 at 17:01
  • @JdeBP thanks for pointing out that the service being stopped was not my user service but the user manager service. Executing systemctl edit user@1000 created the file /etc/systemd/system/user@1000.service.d/override.conf, where I put TimeoutStopSec=500s, thus increasing the timeout. – Pablo Sep 08 '20 at 17:17
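
The fix described in the last comment can be sketched as a small script. This is only a sketch under assumptions: the helper name write_user_manager_override is made up for illustration, UID 1000 and 500s are the values from this question, and the drop-in is assumed to need a [Service] header for TimeoutStopSec to be picked up. The function only writes the file; the privileged steps are left as comments because they change system configuration.

```shell
#!/bin/bash
# Sketch: write a drop-in that raises the stop timeout of the per-user
# systemd manager (user@UID.service), the unit shown on the console as
# "User Manager for UID 1000".

# write_user_manager_override is a hypothetical helper name.
#   $1  directory holding system unit drop-ins (normally /etc/systemd/system)
#   $2  numeric UID of the user whose manager should get more time
#   $3  timeout in systemd time-span syntax, e.g. 500s
write_user_manager_override() {
    local unit_dir=$1 uid=$2 timeout=$3
    local dropin_dir="${unit_dir}/user@${uid}.service.d"

    # Create the drop-in directory and write the two-line override file.
    mkdir -p "${dropin_dir}" || return 1
    printf '[Service]\nTimeoutStopSec=%s\n' "${timeout}" \
        > "${dropin_dir}/override.conf"
}

# Real use, as root (commented out on purpose):
#   write_user_manager_override /etc/systemd/system 1000 500s
#   systemctl daemon-reload
#   systemctl show user@1000.service -p TimeoutStopUSec
```

Pointing the first argument at a scratch directory is a way to inspect the generated file without touching /etc. The TimeoutStopSec=500 in the user service itself is still needed; the override only stops the user manager from being killed before that service has finished.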
