Why doesn't SSH -t wait for background processes?

Question

Why is it that ssh -t doesn't wait for background jobs to finish?

Example:

ssh user@example 'sleep 2 &'

This works as expected, since ssh returns after 2 seconds, whereas

ssh user@example -t 'sleep 2 &'

does not wait for sleep to finish and returns immediately.

Can anyone explain the reason behind this? Is there a way to let ssh -t wait for all background processes to finish before returning?

My use case is that I start a script with ssh -t, and this script starts several background jobs that should stay alive after the main script finishes. With ssh -t this is not possible so far.

Stéphane Chazelas · Accepted Answer · 2023-10-02T12:07:06.287

26

Without -t, sshd gets the stdout of the remote shell (and children like sleep) and stderr via two pipes (and also sends the client's input via another pipe).

sshd does wait for the process in which it started the user's login shell, but after that process has terminated, it also waits for eof on the stdout pipe (not the stderr pipe in the case of openssh at least).

And eof happens when there's no file descriptor by any process open on the writing end of the pipe, which typically only happens when all the processes that didn't have their stdout redirected to something else are gone.
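The pipe behaviour can be reproduced locally without ssh. This is only an analogy, not what sshd runs: command substitution also reads the child's stdout through a pipe and returns at eof, so it stands in for sshd here. The variable names are made up for the illustration.

```shell
#!/bin/sh
# Local analogy only (no ssh involved): $(...) reads the child's stdout
# through a pipe and returns when it sees eof on that pipe.

start=$(date +%s)
# The background sleep inherits the pipe as its stdout, so eof is delayed
# until sleep exits, even though the inner shell exits right away.
held=$(sh -c 'sleep 2 & echo started')
held_elapsed=$(( $(date +%s) - start ))

start=$(date +%s)
# With sleep's output redirected, nothing holds the pipe open once the
# inner shell has exited, so the substitution returns immediately.
released=$(sh -c 'sleep 2 >/dev/null 2>&1 & echo started')
released_elapsed=$(( $(date +%s) - start ))

echo "inherited stdout: ${held_elapsed}s, redirected stdout: ${released_elapsed}s"
```

The first timing should be about two seconds and the second close to zero, which mirrors why ssh without -t appears to wait for sleep 2 &.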

When you use -t, sshd doesn't use pipes. Instead, all the interaction (stdin, stdout, stderr) with the remote shell and its children is done using one pseudo-terminal pair.

With a pseudo-terminal pair, sshd interacts with the master side, and there is no similar eof handling. At least some systems provide alternative ways to know whether there are still processes with fds open to the slave side of the pseudo-terminal (see @JdeBP's comment below), but sshd doesn't use them. So it just waits for the termination of the process in which it executed the login shell of the remote user and then exits.

Upon that exit, the master side of the pty pair is closed ~~which means the pty is destroyed, so processes controlled by the slave will receive a SIGHUP (which by default would terminate them).~~

Edit: that last part was incorrect, though the end result is the same. See @UNIX.root's answer for a correct description of what exactly happens.

edited Oct 02 '23 at 12:07

answered Feb 21 '17 at 13:10

Stéphane Chazelas

thanks for the thorough answer! another thing i'd like to know: do all background processes terminate once the pseudo terminal exits? the script i'm starting starts a service, which works well with ssh. but when using ssh -t, the service is not started. it seems that the service gets shut down once ssh returns. – Philipp Murry Feb 21 '17 at 14:59
Actually, there is a way for the master side of a pseudo-terminal to know when all slave file descriptors have been closed. It's the same mechanism triggered by a real terminal when all file descriptors to it have been closed, in fact. – JdeBP Feb 21 '17 at 15:32
@Philipp, see What happens to a continuing operation if we do ssh and then disconnect? See also How to terminate remotely called "tail -f" when connection is closed? – Stéphane Chazelas Feb 21 '17 at 16:42
@JdeBP, would you care to expand? I'm not sure what you mean. AFAICT terminal emulators (xterm and gnome-terminal at least) don't care about processes still having fds opened to the slave when the process they executed the shell in dies – Stéphane Chazelas Feb 21 '17 at 16:43
@PhilippMurry You can use nohup to keep a script running like that. (You might also consider starting long-running jobs inside of tmux so you can monitor their progress interactively, but a log file works fine.) – jpaugh Feb 21 '17 at 17:27
See https://lists.freedesktop.org/archives/systemd-devel/2013-December/015502.html . Irrespective of whether sshd or xterm use it, the mechanism is there and there is a way to know this. At least one terminal emulator, mine, does use it and terminates when the line discipline drops virtual DTR. – JdeBP Feb 21 '17 at 18:19
@JdeBP, thanks. Makes sense. Do you know how portable that is outside of Linux? Any idea why other terminal emulators or sshd don't do it (in the case of terminal emulators, I can imagine it being annoying if you've run firefox &! and that prevents a terminal emulator from closing) – Stéphane Chazelas Feb 21 '17 at 21:17
"not the stderr pipe in the case of openssh at least" – Please verify. I suspect something has changed since the answer was posted. – Kamil Maciorowski Mar 16 '23 at 21:52

UNIX.root · Answer 2 · 2021-12-06T06:34:01.560

(Moved comments here to include more information.)

The SIGHUP part in the accepted answer is not correct.

Upon that exit, the master side of the pty pair is closed which means the pty is destroyed, so processes controlled by the slave will receive a SIGHUP.

This is not the case. According to POSIX, "If a modem disconnect is detected by the terminal interface for a controlling terminal [...] the SIGHUP signal shall be sent to the controlling process." For ssh -t 'sleep 2 &', it is the controlling process exiting that causes the tty disconnect, so SIGHUP cannot be sent to the controlling process: it is already dead. The sleep is actually killed by SIGHUP because, when the session leader exits, "the SIGHUP signal shall be sent to each process in the foreground process group".

The confusing part is sleep 2 &. Yes, it's a command running in the background, but it's not part of a background process group. Background process groups are tied to job control, which is disabled by default in a non-interactive shell (as in ssh ... 'sleep 2 &'). The sleep 2 & actually runs in the foreground process group. For example:

$ ssh -t localhost 'sleep 2 & ps jt'
  PPID    PID   PGID    SID TTY       TPGID STAT   UID   TIME COMMAND
 88819  88825  88825  88825 pts/36    88825 Ss+      0   0:00 bash -c sleep 2 & ps jt
 88825  88826  88825  88825 pts/36    88825 S+       0   0:00 sleep 2
 88825  88827  88825  88825 pts/36    88825 R+       0   0:00 ps jt

As we can see, every process has the same PGID (88825), which is the PID of the bash shell, and TPGID is also 88825. That is to say, the background process sleep 2 & is also in the foreground process group.
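The shared process group can also be checked from a plain script, with no ssh or terminal involved. A minimal sketch, assuming a ps that accepts the POSIX options -o pgid= and -p; the variable names are only for illustration.

```shell
#!/bin/sh
# A script is a non-interactive shell, so job control is off by default
# and a job started with & stays in the shell's own process group.
sleep 2 >/dev/null 2>&1 &
job_pid=$!

# "pgid=" prints the process group ID without a header line;
# tr strips the column padding.
shell_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
job_pgid=$(ps -o pgid= -p "$job_pid" | tr -d ' ')

echo "shell pgid=$shell_pgid, background job pgid=$job_pgid"
wait "$job_pid"
```

Both values should be identical here. They should differ only once job control is enabled, as the set -m comparison in this answer shows.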

For comparison, see

$ pgrep -af sleep
$ ssh -t localhost 'set -m; sleep 123 & ps jt'
  PPID    PID   PGID   SID TTY   TPGID STAT UID TIME COMMAND
 89002  89008  89008 89008 pts/3 89010 Ss     0 0:00 bash -c set -m; sleep 123 & ps jt
 89008  89009  89009 89008 pts/3 89010 S      0 0:00 sleep 123
 89008  89010  89010 89008 pts/3 89010 R+     0 0:00 ps jt
Connection to localhost closed.
$ ps j 89009
  PPID    PID   PGID    SID TTY       TPGID STAT   UID   TIME COMMAND
     1  89009  89009  89008 ?            -1 S        0   0:00 sleep 123
$

As we can see, with job control enabled (set -m), sleep 123 & is running in its own process group (PGID 89009), which is a background process group. And after ssh terminates, the sleep is still running.

(See a similar scenario for more discussion: Expect + "ssh -f" does not work)

Thanks for the correction. Note that regardless of whether sleep is killed by SIGHUP or not (like when it's not in the foreground process group or ignores SIGHUP), ssh returns straight away. The main reason is that sshd has no way to know that there's still a process running with a fd open to the pty device and exits when the shell exits. — Stéphane Chazelas, Sep 22 '20 at 16:04
And on macOS, fcntl(O_NONBLOCK) on pty/master would fail before the child opens the pty/slave. In sexpect it waits for the pty/master to be writable before fcntl(O_NONBLOCK). — UNIX.root, Sep 23 '20 at 02:40

score 3 · Answer 3 · edited Feb 23 '17 at 03:35

Use wait:

ssh user@example -t 'sleep 2 & wait'
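Why this helps: wait keeps the shell itself alive until its background children have exited, and the shell is the one process sshd does wait for. The effect can be seen locally without ssh; the sketch below is only an illustration, and its variable names are made up.

```shell
#!/bin/sh
# Local illustration (no ssh involved): the caller only waits for the
# inner shell, so the inner shell must itself wait for its children.

start=$(date +%s)
# With "wait", the inner shell does not exit until sleep has finished.
sh -c 'sleep 2 >/dev/null 2>&1 & wait'
with_wait=$(( $(date +%s) - start ))

start=$(date +%s)
# Without "wait", the inner shell exits at once and sleep is left behind.
sh -c 'sleep 2 >/dev/null 2>&1 &'
without_wait=$(( $(date +%s) - start ))

echo "with wait: ${with_wait}s, without wait: ${without_wait}s"
```

Note that wait only covers children of that shell. Jobs started by a nested script that has already exited are not waited for.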

edited Feb 23 '17 at 03:35

heemayl

answered Feb 21 '17 at 12:51

Ipor Sircer

The question focuses on "why?" rather than on finding a workaround. – Kusalananda Oct 02 '23 at 13:12
