
I'm digging through different sources, but can't find a good description of the anatomy of child reaping. This is a simple case of what I would like to understand.

$ cat <( sleep 100 & wait ) &
[1] 14247
$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 S pts/17   00:00:00 bash
14249 12126 S pts/17   00:00:00 sleep 100
14251 14250 S pts/17   00:00:00 grep --color=auto 12126
$ kill -2 14248

$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 Z pts/17   00:00:00 [bash] <defunct>
14249 12126 S pts/17   00:00:00 sleep 100
14255 14254 S pts/17   00:00:00 grep --color=auto 12126

Why is the zombie waiting for the kid?

Can you explain this one? Do I need to know C and read Bash source code to get a wider understanding of this or is there any documentation? I've already consulted:

GNU bash, version 4.3.42(1)-release (x86_64-pc-linux-gnu)

Linux 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  • Should note that this really has nothing to do with bash (other than the fact that if you choose to use bash as your shell, a lot of processes will be started by it). Other shells (tcsh, ksh, zsh, &c) all start processes, and run essentially the same OS functions to deal with them. – jamesqf Aug 15 '16 at 01:34
  • @jamesqf Interesting. If you would like to expand your comment into a fully-fledged answer, that would be great. –  Aug 15 '16 at 01:39
  • Except that it's not really an answer, just pointing out that you've been looking for the answer in the wrong places :-) Any good book on *nix systems programming should provide a much better answer than I could write. – jamesqf Aug 16 '16 at 19:40

2 Answers


The zombie isn't waiting for its child. Like any zombie process, it stays around until its parent collects it.

You should display all the processes involved to understand what's going on, and look at the PPID as well. Use this command line:

ps -t $(tty) -O ppid,pgid

The parent of the process you're killing is cat. What happens is that bash runs the background command cat <( sleep 100 & wait ) in a subshell. Since the only thing this subshell does is to set up some redirection and then run an external command, this subshell is replaced by the external command. Here's the rundown:

  • The original bash (12126) calls fork to execute the background command cat <( sleep 100 & wait ) in a child (14247).
    • The child (14247) calls pipe to create a pipe, then fork to create a child to run the process substitution sleep 100 & wait.
      • The grandchild (14248) calls fork to run sleep 100 in the background. Since the grandchild isn't interactive, the background process doesn't run in a separate process group. Then the grandchild waits for sleep to exit.
    • The child (14247) calls setpgid (it's a background job in an interactive shell so it gets its own process group), then execve to run cat. (I'm a bit surprised that the process substitution isn't happening in the background process group.)
  • You kill the grandchild (14248). Its parent is running cat, which knows nothing about any child process and has no business calling wait. Since the grandchild's parent doesn't reap it, the grandchild stays behind as a zombie.
  • Eventually, cat exits, either because you kill it or because sleep returns and closes the pipe so cat sees the end of its input. At that point the zombie's parent has died, so the zombie is adopted by init, which reaps it.

If you change the command to

{ cat <( sleep 100 & wait ); echo done; } &

then cat runs in a separate process, not in the child of the original bash process: the first child has to stay behind to run echo done. In this case, if you kill the grandchild, it doesn't stay on as a zombie, because the child (which is still running bash at that point) reaps it.

See also "How does linux handles zombie process" and "Can a zombie have orphans? Will the orphan children be disturbed by reaping the zombie?"

  • I was surprised at the process group thing too. It looks like it was a bug and it's now fixed in the bash master branch. – Petr Skocik Aug 14 '16 at 23:06
  • "The original bash waits for its child (14247)." Why or in what way? The child is supposed to run in the background and there's no explicit call. What's the difference between the original bash (14246) waiting for 14247 and 14247 (which is running cat) not waiting for 14248 (waiting for sleep)? Is there some memory of who waits for whom, which the child (14247) lost and the original bash (14246) didn't, or maybe a list of signals like SIGCHLD of who should be called and 14247 (now running bash) unsubscribed from with regards to 14248? –  Aug 14 '16 at 23:30
  • @tomas I meant that the original bash calls wait on its child, i.e. it reaps it. I can see how this would be confusing, I've removed that sentence which wasn't even at the right point chronologically speaking. The information that a process has died goes to that process's parent, a process can't “subscribe” to receive information about the death of some other process. – Gilles 'SO- stop being evil' Aug 15 '16 at 09:02

A zombie is not waiting for its child. A zombie is a process that has already died (on its own, or because it was killed, as in your example): its code, data and stack have been deallocated, and all that remains is its exit status, waiting for its parent to call wait(2) to retrieve it (which finally removes the process entry from the process table).

In your example, when sleep finishes (or is killed), the parent will read the exit statuses and reap the zombies. See the above-mentioned wait(2) for details.

Matija Nalis