
I have the following script:

suspense_cleanup () {
  echo "Suspense clean up..."
}

int_cleanup () {
  echo "Int clean up..."
  exit 0
}

trap 'suspense_cleanup' SIGTSTP
trap 'int_cleanup' SIGINT

sleep 600

If I run it and press Ctrl-C, Int clean up... is shown and it exits.

But if I press Ctrl-Z, the ^Z characters are displayed on the screen and then it hangs.

How can I:

  • Run some cleanup code on Ctrl-Z, maybe even echoing something, and
  • proceed with the suspension afterwards?

Randomly reading through the glibc documentation, I found this:

Applications that disable the normal interpretation of the SUSP character should provide some other mechanism for the user to stop the job. When the user invokes this mechanism, the program should send a SIGTSTP signal to the process group of the process, not just to the process itself.

But I'm not sure if that's applicable here, and in any case it doesn't seem to work.

Context: I'm trying to make an interactive shell script which supports all the suspense-related features that Vim/Neovim supports. Namely:

  1. Ability to suspend programmatically (with :suspend, instead of just letting the user press Ctrl-Z)
  2. Ability to perform an action before suspending (autosave in Vim)
  3. Ability to perform an action on resuming (VimResume in Neovim)

Edit: Changing sleep 600 to for x in {1..100}; do sleep 6; done also doesn't work.

Edit 2: It works when replacing sleep 600 with sleep 600 & wait. I'm not at all sure why or how that works, or what the limitations of something like this are.



Signal handling on Linux and other UNIX-like systems is a very complex subject with many actors at play: the kernel terminal driver, the parent -> child process relation, process groups, the controlling terminal, the shell's handling of signals with job control enabled or disabled, signal handlers in individual processes, and possibly more.

First, the Control-C and Control-Z key bindings are not handled by the shell but by the kernel's terminal driver. You can see the default definitions with stty -a:

$ stty -a
speed 38400 baud; rows 64; columns 212; line = 0;
intr = ^C; quit = ^\; erase = ^H; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

Here we see intr = ^C and susp = ^Z. stty in turn gets this information from the kernel with the TCGETS ioctl:

$ strace stty -a |& grep TCGETS
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
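A script that wants to know which key currently suspends it (like the Vim-like script from the question), rather than assuming ^Z, can parse the same stty -a output. Below is a minimal bash sketch. It assumes the GNU/Linux "name = value;" format shown above, and stty_cc is a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value that `stty -a` reports for one
# control-character setting (intr, susp, ...). It reads the stty output from
# stdin, so it works on live output as well as on a saved sample.
# Assumes the GNU/Linux "name = value;" format.
stty_cc() {
  local name=$1 field
  # Settings are separated by ';' and wrapped over several lines:
  # turn every ';' and newline into a newline, giving one setting per line.
  tr ';\n' '\n\n' | while IFS= read -r field; do
    field=${field#"${field%%[![:space:]]*}"}   # strip leading whitespace
    if [[ $field == "$name = "* ]]; then
      printf '%s\n' "${field#"$name = "}"
      break
    fi
  done
}
```

For example, running stty -a | stty_cc susp with the terminal on standard input should print ^Z under the default settings.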

The default keybindings are defined in Linux kernel code in

#define INIT_C_CC {         \
        [VINTR] = 'C'-0x40, \
        [VQUIT] = '\\'-0x40,        \
        [VERASE] = '\177',  \
        [VKILL] = 'U'-0x40, \
        [VEOF] = 'D'-0x40,  \
        [VSTART] = 'Q'-0x40,        \
        [VSTOP] = 'S'-0x40, \
        [VSUSP] = 'Z'-0x40, \
        [VREPRINT] = 'R'-0x40,      \
        [VDISCARD] = 'O'-0x40,      \
        [VWERASE] = 'W'-0x40,       \
        [VLNEXT] = 'V'-0x40,        \
        INIT_C_CC_VDSUSP_EXTRA      \
        [VMIN] = 1 }
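The 'X'-0x40 entries are plain ASCII arithmetic. Subtracting 0x40 from an upper-case letter gives the byte the terminal sends for Ctrl plus that letter. So ^Z is 'Z' (90) minus 0x40 (64), which is byte 26 (0x1A). Here is a tiny bash sketch of the same calculation (ctrl_code is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the byte the kernel table stores for Ctrl+<char>,
# mirroring the 'Z'-0x40 style entries of INIT_C_CC.
ctrl_code() {
  local ascii
  ascii=$(printf '%d' "'$1")        # a leading quote makes printf print the character code
  printf '%d\n' $(( ascii - 0x40 ))
}
```

ctrl_code Z prints 26 and ctrl_code C prints 3.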

The default actions are also defined:

static void n_tty_receive_char_special(struct tty_struct *tty, unsigned char c,
                                       bool lookahead_done)
{
        struct n_tty_data *ldata = tty->disc_data;

        if (I_IXON(tty) && n_tty_receive_char_flow_ctrl(tty, c, lookahead_done))
                return;

        if (L_ISIG(tty)) {
                if (c == INTR_CHAR(tty)) {
                        n_tty_receive_signal_char(tty, SIGINT, c);
                        return;
                } else if (c == QUIT_CHAR(tty)) {
                        n_tty_receive_signal_char(tty, SIGQUIT, c);
                        return;
                } else if (c == SUSP_CHAR(tty)) {
                        n_tty_receive_signal_char(tty, SIGTSTP, c);
                        return;
                }

The signal finally reaches __kill_pgrp_info(), which carries this comment:

/*
 * __kill_pgrp_info() sends a signal to a process group: this is what the tty
 * control characters do (^C, ^Z etc)
 * - the caller must hold at least a readlock on tasklist_lock
 */

That's important for our story: the signal generated by Control-C or Control-Z is sent to the foreground process group. The parent interactive shell created that group for the newly started script, and the script is its leader. The script and its children belong to this one group.
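This group membership can be checked directly. The Linux-only sketch below reads the process-group ID from field 5 of /proc/PID/stat (pgid_of is a hypothetical helper). Run from a script, the script and its background sleep should report the same value, because a shell without job control does not move its children into new groups:

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print the process-group ID of a PID,
# i.e. field 5 of /proc/PID/stat.
pgid_of() {
  local stat state ppid pgid rest
  stat=$(< "/proc/$1/stat") || return 1
  stat=${stat##*") "}    # drop "PID (comm) "; comm itself may contain spaces
  read -r state ppid pgid rest <<< "$stat"
  printf '%s\n' "$pgid"
}
```

For example, sleep 600 & followed by echo "$(pgid_of $$) $(pgid_of $!)" inside a script should print the same number twice.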

Therefore, as correctly noted in the comments by user Kamil Maciorowski, when you press Control-Z after starting your script, SIGTSTP is received by both the script and sleep, because a signal sent to a process group is received by all processes in the group. This is easy to see if you run your code in this form (BTW, always add a shebang, https://en.wikipedia.org/wiki/shebang_(unix); it's not defined what should happen if there is none):

#!/usr/bin/env bash

suspense_cleanup () {
  echo "Suspense clean up..."
}

int_cleanup () {
  echo "Int clean up..."
  exit 0
}

trap 'suspense_cleanup' SIGTSTP
trap 'int_cleanup' SIGINT

sleep 600

Run it (I named it sigtstp.sh) and stop it:

$ ./sigtstp.sh
^Z
[1]+  Stopped                 ./sigtstp.sh
$ ps aux | grep -e '[s]leep 600' -e '[s]igtstp.sh'
ja       27062  0.0  0.0   6908  3144 pts/25   T    23:50   0:00 sh ./sigstop.sh
ja       27063  0.0  0.0   2960  1664 pts/25   T    23:50   0:00 sleep 600

ja is my username; yours will be different, and so will the PIDs. What matters is that both processes are in the stopped state, as indicated by the letter 'T'. From man ps:

PROCESS STATE CODES
(...)
T    stopped by job control signal

That means that both processes got the SIGTSTP signal. Now, if both processes, including sigtstp.sh, get the signal, why isn't the suspense_cleanup() handler run? Because Bash does not execute it until sleep 600 terminates. This is a requirement imposed by POSIX:

When a signal for which a trap has been set is received while the shell is waiting for the completion of a utility executing a foreground command, the trap associated with that signal shall not be executed until after the foreground command has completed.
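This deferral can be reproduced without a terminal. The sketch below (trap_delay is a made-up name) starts a child bash that traps a signal. The child either runs sleep 2 in the foreground, or runs it in the background followed by wait. Half a second later, the parent signals the child. SIGUSR1 stands in for SIGTSTP so that nothing can actually get stopped. The sketch assumes a sleep that accepts fractional seconds (GNU coreutils and BusyBox do), and that the child installs its trap within those 0.5 s:

```shell
#!/usr/bin/env bash
# Sketch: how soon does a trap run when its signal arrives while the shell is
# (fg) waiting for a foreground command, or (bg) blocked in the wait builtin?
# SIGUSR1 stands in for SIGTSTP so the demo can never stop anything.
trap_delay() {
  local mode=$1 pid
  bash -c '
    trap "echo \"trap ran after \$SECONDS s\"; exit 0" USR1
    if [ "$1" = fg ]; then
      sleep 2                      # foreground: the trap is deferred
    else
      sleep 2 >/dev/null & wait    # background + wait: the trap runs at once
    fi
  ' _ "$mode" &
  pid=$!
  sleep 0.5                        # assumption: enough for the child to set its trap
  kill -USR1 "$pid"
  wait "$pid"
}
```

trap_delay fg typically reports about 2 s, because the trap waits for sleep to finish. trap_delay bg reports 0 or 1 s, since $SECONDS only has one-second resolution.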

(Notice, though, that in the open-source world, and in IT in general, a standard is just a collection of guidelines: there is no legal requirement forcing anyone to follow it.) It wouldn't help to sleep less, say 3 seconds, because the sleep process would be stopped as well, so it would never complete. For suspense_cleanup() to be called immediately, we have to run sleep in the background and then call wait, as the same POSIX section also explains:

#!/usr/bin/env bash

suspense_cleanup () {
  echo "Suspense clean up..."
}

int_cleanup () {
  echo "Int clean up..."
  exit 0
}

trap 'suspense_cleanup' SIGTSTP
trap 'int_cleanup' SIGINT

sleep 600 & wait

Run it and stop it:

$ ./sigstop.sh
^ZSuspense clean up...

Notice that both sleep 600 and sigtstp.sh are now gone:

$ ps aux | grep -e '[s]leep 600' -e '[s]igtstp.sh'
$

It's clear why sigtstp.sh is gone: wait was interrupted by the signal, and since wait is the last line of the script, the script exits. It's even more surprising when you realize that if you sent SIGINT instead, sleep would keep running even after sigtstp.sh died:

$ ./sigtstp.sh
^CInt clean up...
$ ps aux | grep -e '[s]leep 600' -e '[s]igtstp.sh'
ja       32354  0.0  0.0   2960  1632 pts/25   S    00:12   0:00 sleep 600

But, because its parent died, it is adopted by init:

$ grep PPid /proc/32354/status
PPid:   1

The reason is that when the shell runs a child in the background, it replaces that child's default SIGINT action, which is to terminate the process (signal(7)), with "ignore" (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html):

If job control is disabled (see the description of set -m) when the shell executes an asynchronous list, the commands in the list shall inherit from the shell a signal action of ignored (SIG_IGN) for the SIGINT and SIGQUIT signals. In all other cases, commands executed by the shell shall inherit the same signal actions as those inherited by the shell from its parent unless a signal action is modified by the trap special built-in (see trap)
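On Linux, the effect of this rule is visible in /proc/PID/status. Its SigIgn line is a hexadecimal mask of ignored signals, where bit N-1 stands for signal N. A sketch, with ignores_signal as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): succeed if PID ignores signal number
# SIGNUM, according to the SigIgn mask in /proc/PID/status
# (hexadecimal; bit N-1 set means signal N is ignored).
ignores_signal() {
  local pid=$1 signum=$2 key value mask=
  [ -r "/proc/$pid/status" ] || return 2
  while read -r key value; do
    if [ "$key" = SigIgn: ]; then mask=$value; fi
  done < "/proc/$pid/status"
  [ -n "$mask" ] || return 2
  (( (16#$mask >> (signum - 1)) & 1 ))
}
```

In a script, sleep 600 & followed by ignores_signal $! 2 (2 is SIGINT on Linux) should succeed once the child has exec'd sleep. Right after the fork, the child may not have set SIG_IGN yet.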

Some SO references: https://stackoverflow.com/questions/46061694/bash-why-cant-i-set-a-trap-for-sigint-in-a-background-shell/46061734#46061734, https://stackoverflow.com/questions/45106725/why-do-shells-ignore-sigint-and-sigquit-in-backgrounded-processes. If you want to kill all children after receiving SIGINT, you have to do it manually in the trap handler. Notice, however, that SIGINT is still delivered to all children, only ignored. If instead of sleep you ran a command that installs its own SIGINT handler, that handler would run (try tcpdump, for example)! The glibc manual says:

Note that if a given signal was previously set to be ignored, this code avoids altering that setting. This is because non-job-control shells often ignore certain signals when starting children, and it is important for the children to respect this.
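Because background children ignore SIGINT, the SIGINT trap has to take them down itself. One way to sketch that in bash is shown below. It assumes all children were started as background jobs of the script, and it relies on bash listing those jobs with jobs -p, including inside $(...). kill_children is a hypothetical helper, and int_cleanup is the question's handler extended with it:

```shell
#!/usr/bin/env bash
# Sketch: background children ignore SIGINT, so the SIGINT trap terminates
# them itself. kill_children is a hypothetical helper name.
kill_children() {
  local pids
  pids=$(jobs -p)               # PIDs of this shell's background jobs
  [ -n "$pids" ] || return 0
  kill $pids 2>/dev/null        # unquoted on purpose: one PID per word
  wait $pids 2>/dev/null        # reap them so none is left running
  return 0
}

int_cleanup () {
  echo "Int clean up..."
  kill_children
  exit 0
}

trap 'int_cleanup' SIGINT
```

kill sends the default SIGTERM, which the background children do not ignore, unlike SIGINT.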

But why is sleep dead after receiving SIGTSTP, when we didn't kill it ourselves and SIGTSTP should only stop it, not kill it? Because every stopped process in a newly orphaned process group gets SIGHUP from the kernel:

If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal shall be sent to each process in the newly-orphaned process group.

SIGHUP terminates the process if no custom handler for it was installed (signal(7)):

Signal      Standard   Action   Comment
SIGHUP       P1990      Term    Hangup detected on controlling terminal
                                or death of controlling process

(Notice that if you ran sleep under strace, things would get even more complex.)
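The kernel sends SIGCONT right after SIGHUP because a stopped process does not act on the SIGHUP: the signal stays pending until the process is continued. Below is a Linux-only sketch demonstrating this (proc_state, wait_state and demo_pending_hup are made-up names). It uses SIGSTOP instead of SIGTSTP, because the kernel discards SIGTSTP for a process in an orphaned process group, whereas SIGSTOP always stops it. It also assumes SIGHUP is not ignored in the environment it runs in (for example, not under nohup):

```shell
#!/usr/bin/env bash
# Sketch (Linux only): a stopped process keeps a terminating signal pending
# and only dies once SIGCONT resumes it. Helper names are hypothetical.

# Print the one-letter state of PID (field 3 of /proc/PID/stat).
proc_state() {
  local stat
  stat=$(< "/proc/$1/stat") || return 1
  stat=${stat##*") "}          # drop "PID (comm) "; comm may contain spaces
  printf '%s\n' "${stat%% *}"
}

# Poll for up to ~5 s until PID $1 is in state $2.
wait_state() {
  local i
  for (( i = 0; i < 50; i++ )); do
    [ "$(proc_state "$1")" = "$2" ] && return 0
    sleep 0.1
  done
  return 1
}

demo_pending_hup() {
  local pid status
  sleep 30 &
  pid=$!
  kill -STOP "$pid"            # SIGSTOP is never discarded, unlike SIGTSTP
  wait_state "$pid" T || { kill -KILL "$pid"; return 1; }
  kill -HUP "$pid"             # stays pending while the process is stopped
  sleep 0.2
  echo "after SIGHUP: $(proc_state "$pid")"
  kill -CONT "$pid"            # once resumed, the pending SIGHUP terminates it
  wait "$pid"
  status=$?
  echo "exit status: $status"  # 128 + 1 (SIGHUP) when killed by SIGHUP
}
```

demo_pending_hup should print after SIGHUP: T, and then exit status: 129 once SIGCONT lets the pending SIGHUP through.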

OK, so let's come back to your original question:

How can I:

Run some cleanup code on Ctrl-Z, maybe even echoing something, and proceed with the suspension afterwards?

The way I would do it is:

#!/usr/bin/env bash

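suspense_cleanup () {
  echo "Suspense clean up..."
  trap - SIGTSTP                    # restore the default action (stop)
  kill -TSTP $$                     # stop this script; execution resumes here after fg/bg
  trap 'suspense_cleanup' SIGTSTP   # re-arm the trap for the next Ctrl-Z
}

int_cleanup () {
  echo "Int clean up..."
  exit 0
}

trap 'suspense_cleanup' SIGTSTP
trap 'int_cleanup' SIGINT

sleep 600 &

# wait returns a status > 128 when a trapped signal interrupts it,
# and 0 once all background children have finished.
while true
do
  if wait
  then
    echo child died, exiting
    exit 0
  fi
done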

Now suspense_cleanup() will be called before stopping the process:

$ ./sigtstp.sh
^ZSuspense clean up...

[1]+  Stopped                 ./sigtstp.sh
$ ps aux | grep -e '[s]leep 600' -e '[s]igtstp.sh'
ja        4129  0.0  0.0   6920  3196 pts/25   T    00:29   0:00 bash ./sigtstp.sh
ja        4130  0.0  0.0   2960  1660 pts/25   T    00:29   0:00 sleep 600
$ fg
./sigtstp.sh
^ZSuspense clean up...

[1]+  Stopped                 ./sigtstp.sh
$ fg
./sigtstp.sh
^CInt clean up...
$ ps aux | grep -e '[s]leep 600' -e '[s]igtstp.sh'
ja        4130  0.0  0.0   2960  1660 pts/25   S    00:29   0:00 sleep 600
$ grep PPid /proc/4130/status
PPid:   1
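The same idea can be stretched to cover the three Vim-like features listed in the question. This sketch makes a few assumptions. The hook and function names are hypothetical. The script's main loop waits with cmd & wait as above, so the trap runs promptly. The command that really stops the script is kept in an array (stop_cmd), so it can be replaced, for example in a test:

```shell
#!/usr/bin/env bash
# Sketch of Vim-like suspend support (all names are hypothetical):
#   suspend_self   - programmatic suspend (like :suspend), also the TSTP trap
#   before_suspend - hook run before stopping (like autosave)
#   after_resume   - hook run once SIGCONT resumes the script (like VimResume)

before_suspend() { echo "saving state..."; }
after_resume()   { echo "resumed"; }

# The command that really stops the script. By default only this shell is
# stopped; on Ctrl-Z the terminal has already stopped the rest of the job.
stop_cmd=(kill -TSTP "$$")

suspend_self() {
  before_suspend
  trap - TSTP                 # default action again, so the signal stops us
  "${stop_cmd[@]}"            # execution pauses here until fg/bg sends SIGCONT
  trap 'suspend_self' TSTP    # re-arm for the next Ctrl-Z
  after_resume
}

trap 'suspend_self' TSTP
```

A hypothetical :suspend command would simply call suspend_self. If the script has running children that should also stop on a programmatic suspend, the glibc advice quoted in the question points to signalling the whole process group instead, e.g. stop_cmd=(kill -TSTP 0). Keep in mind that this reaches every process in the script's current process group.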

And you can use a shorter sleep, say 10 seconds, and see that the script exits once sleep finishes:

#!/usr/bin/env bash

suspense_cleanup () {
  echo "Suspense clean up..."
  trap - SIGTSTP
  kill -TSTP $$
  trap 'suspense_cleanup' SIGTSTP
}

int_cleanup () {
  echo "Int clean up..."
  exit 0
}

trap 'suspense_cleanup' SIGTSTP
trap 'int_cleanup' SIGINT

sleep 10 &

while true
do
  if wait
  then
    echo child died, exiting
    exit 0
  fi
done

Run it:

$ time ./sigtstp.sh
child died, exiting

real    0m10.007s
user    0m0.003s
sys     0m0.004s


It appears that Bash job control interferes with the normal tty special characters. SIGTSTP may be sent to Bash, but not to the process being run.

From https://www.gnu.org/software/bash/manual/bash.html#Job-Control-Basics:

If the operating system on which Bash is running supports job control, Bash contains facilities to use it. Typing the suspend character (typically ‘^Z’, Control-Z) while a process is running causes that process to be stopped and returns control to Bash.

Continue (Ctrl-Q, called start in stty) does not work under Bash job control: you need to fg or bg the process to un-stop it.

Paul_Pedant
  • Not really, no. I can send TSTP and CONT to processes. See Why did my trap not trigger? – Chris Davies Jul 19 '23 at 22:55
  • That's a slightly different issue: you can send signals from another terminal, and they will get deferred until the system sees fit to pass them to the process. My point is that Bash Ctrl-Z itself stops the process indefinitely: a stopped process cannot receive any signals. It cannot even be killed by SIGTERM until it is also restarted by Bash job control. It cannot be resumed by Ctrl-Q, but (special case) it can be resumed by SIGCONT. – Paul_Pedant Jul 20 '23 at 09:07

On reading man 7 signal, or https://www.man7.org/linux/man-pages/man7/signal.7.html, one sees that:

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored

So, you cannot.

waltinator