Systemd shutdown timeout does not honor TimeoutStopUSec

Question

I have a systemd user service that can take many minutes to shut down properly. It manages a very large HDF5 database (several GB), and if the process does not stop cleanly, the database is corrupted afterwards.

I've found many threads here, like "How to change systemd service timeout value?", but sadly they didn't help me at all: I still cannot increase the timeout.

Because I had a little trouble with the timeouts, I wrote this example:

#!/bin/bash
RUNNING=true
VAR_RUN=${HOME}/.run
PIDFILE=${VAR_RUN}/mondas_ctrl_launcher.pid
LOG_DIR=${HOME}/logs
LOG_FILE=${LOG_DIR}/mondas_ctrl_launcher.log
MONDAS_BASE=${HOME}/src/mondas
on_sigint()
{
    RUNNING=false
}
log()
{
    now=$(date)
    message="${now}: ${*}"
    echo "${message}"
    echo "${message}" >> "${LOG_FILE}"
}
# taken from
# https://blog.dhampir.no/content/sleeping-without-a-subprocess-in-bash-and-how-to-sleep-forever
# Execute this with BASH as it uses bash extensions
snore()
{
    local IFS
    [[ -n "${_snore_fd:-}" ]] || { exec {_snore_fd}<> <(:); } 2>/dev/null ||
    {
        # workaround for MacOS and similar systems
        local fifo
        fifo=$(mktemp -u)
        mkfifo -m 700 "$fifo"
        exec {_snore_fd}<>"$fifo"
        rm "$fifo"
    }
    read ${1:+-t "$1"} -u $_snore_fd || :
}
_mondas_ctrl()
{
    # my true program that starts a lot of
    # processes in a tmux session
    # mondas_ctrl "${@}" >> "${LOG_FILE}" 2>&1
    # doing just "true" for testing purposes
    true
}
mkdir -p "${VAR_RUN}"
mkdir -p "${LOG_DIR}"
case "${1}" in
    start)
        log "Starting mondas PWD: $(pwd)"
        _mondas_ctrl start
        log "mondas_ctrl start executed"
        echo "${$}" > ${PIDFILE}
        # SIGINT
        trap on_sigint 2
        while ${RUNNING} ; do
          snore 1
        done
        log "Exiting sleep loop"
        ;;
    stop)
        log "Stopping mondas"
        _mondas_ctrl stop
        # simulating long shutdown
        snore 275
        log "mondas_ctrl stop executed"
        if test -f "${PIDFILE}" ; then
            kill -2 $(cat "${PIDFILE}")
            rm -rf "${PIDFILE}"
        fi
        ;;
    *)
        echo "usage: $0 start|stop" >&2
        exit 1
        ;;
esac

and my systemd user service:

[Unit]
Description=Mondas
Wants=network.target
After=network.target
[Service]
Type=simple
RemainAfterExit=no
ExecStart=%h/bin/mondas_ctrl_launcher start
ExecStop =%h/bin/mondas_ctrl_launcher stop
TimeoutStartSec=120
TimeoutStopSec=500
Restart=always
RestartSec=1
[Install]
WantedBy=default.target

So I enabled and started it and checked the timeout settings of the daemon

$ systemctl --user enable mondas2.service
$ systemctl --user start mondas2.service
$ systemctl --user show mondas2.service  -p TimeoutStopUSec
TimeoutStopUSec=8min 20s
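
The reported value matches the unit file: 8min 20s is 8 * 60 + 20 = 500 seconds. A minimal sketch of that conversion, useful when comparing reported spans against configured seconds. `span_to_seconds` is a hypothetical helper, not part of systemd, and it assumes the span only uses the h, min and s units:

```shell
#!/bin/bash
# Hypothetical helper: convert a time span as printed by "systemctl show"
# (for example "8min 20s") into whole seconds.
# Assumption: only the h, min and s units appear; anything else is rejected.
span_to_seconds() {
    local total=0 token
    # Rely on word splitting to walk over the space-separated parts
    for token in $1; do
        case "${token}" in
            *ms|*us) echo "unsupported unit: ${token}" >&2; return 1 ;;
            *min)    total=$(( total + ${token%min} * 60 )) ;;
            *h)      total=$(( total + ${token%h} * 3600 )) ;;
            *s)      total=$(( total + ${token%s} )) ;;
            *)       echo "unsupported unit: ${token}" >&2; return 1 ;;
        esac
    done
    echo "${total}"
}
```

For example, `span_to_seconds "8min 20s"` prints 500, the TimeoutStopSec value from the unit file.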

However if I execute reboot as root, on the console I can see

[***] A stop job is running for User Manager for UID 1000 (20s / 2min)

and after 90 seconds systemd just kills the process. The log file mondas_ctrl_launcher.log is missing the "mondas_ctrl stop executed" log entry.

I even changed /etc/systemd/system.conf and set

DefaultTimeoutStartSec=300s
DefaultTimeoutStopSec=300s

but when I execute reboot the console still displays a max. timeout of 2 minutes and after 90 seconds the process is just killed. No matter what I do, I cannot change this behaviour.

What am I doing wrong? Or did I just misinterpret the meaning of TimeoutStopSec? Or could it be that the TimeoutStopSec value in the service file does not affect the real timeout during a reboot or poweroff, and only applies when stopping the service manually via systemctl --user stop? If so, how can I increase the reboot/poweroff timeout?

I'm testing this on a current Debian 10.5 installation.

One thing that you are doing wrong, which isn't the main part of your problem, is not spotting the difference between service descriptions "Mondas" and "User Manager for UID 1000". The clue to an answer for the main part of your problem is the reason that "rickety" and "dangerous" are associated with the mechanism. (-: — JdeBP, Sep 08 '20 at 16:17
@JdeBP so that means that the 2min timeout is not the timeout of my user service but of the whole session? But then how can I increase that timeout? I don't understand what you mean by the reason that "rickety" and "dangerous" are associated with the mechanism. — Pablo, Sep 08 '20 at 16:32
No, it means that you haven't looked for which of your services has the description "User Manager for UID 1000". And https://unix.stackexchange.com/a/590919/5132 leads to further understanding. — JdeBP, Sep 08 '20 at 16:44
@JdeBP as far as I can see, this is started by user.slice which spawns user-1000.slice from this file /lib/systemd/system/user@.service. — Pablo, Sep 08 '20 at 17:01
@JdeBP thanks for pointing out that the service being stopped was not my user service but the user manager service. Executing systemctl edit user@1000 created the file /etc/systemd/system/user@1000.service.d/override.conf where I put TimeoutStopSec=500s, thus increasing the timeout. — Pablo, Sep 08 '20 at 17:17
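
The fix described in the last comment can be sketched as a script. This is a minimal sketch, assuming UID 1000 and the 500s value from the comment. `write_manager_override` and its arguments are hypothetical names used for illustration only. The first argument exists so that the function can be tried against a scratch directory instead of /etc/systemd/system.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that raises the stop timeout of the
# per-user manager unit user@UID.service (not of the user service itself).
#   $1  systemd configuration directory, e.g. /etc/systemd/system
#   $2  numeric UID of the user, e.g. 1000
#   $3  timeout value, e.g. 500s
write_manager_override() {
    local unit_dir="$1" uid="$2" timeout="$3"
    local dropin_dir="${unit_dir}/user@${uid}.service.d"

    mkdir -p "${dropin_dir}" || return 1
    # TimeoutStopSec is a [Service] setting, so the section header is required
    printf '[Service]\nTimeoutStopSec=%s\n' "${timeout}" \
        > "${dropin_dir}/override.conf"
}
```

On the real system this would be run as root, for example `write_manager_override /etc/systemd/system 1000 500s`, followed by `systemctl daemon-reload` because the file was written by hand rather than through `systemctl edit`. Afterwards `systemctl show user@1000.service -p TimeoutStopUSec` should report the new value for the user manager, which is the unit the console message refers to.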
