
I'm learning about process groups, which are new to me. (Among other things, I'm trying to follow this answer: Why is SIGINT not propagated to child process when sent to its parent process?) I tried to kill a process group but can't, because its ID seems to keep running away.

$ sleep 1000 &
[1] 6468
$ ps ax -O tpgid | grep sleep
 6468  6511 S pts/4    00:00:00 sleep 1000
 6512  6511 S pts/4    00:00:00 grep --color=auto sleep
$ kill -9 -6511
bash: kill: (-6511) - No such process
$ ps ax -O tpgid | grep sleep
 6468  6515 S pts/4    00:00:00 sleep 1000
 6516  6515 S pts/4    00:00:00 grep --color=auto sleep
$ ps ax -O tpgid | grep sleep
 6468  6517 S pts/4    00:00:00 sleep 1000
 6518  6517 S pts/4    00:00:00 grep --color=auto sleep

Why is this so, and how can I catch and kill it? What am I getting and doing wrong?

GNU bash, version 4.3.42(1)-release (x86_64-pc-linux-gnu)


That's because you're not printing the process group ID (PGID); you're printing the "controlling tty process group ID", tpgid. As explained in man ps:

   tpgid       TPGID     ID of the foreground process group on the tty
                         (terminal) that the process is connected to, or
                         -1 if the process is not connected to a tty.

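So, what you're seeing is the ID of the terminal's foreground process group. When you run ps ax -O tpgid | grep sleep, that pipeline is the foreground job, and bash makes its first process, ps, the group leader, so the value shown is simply the PID of the ps program: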

$ sleep 1000 &
[1] 6745
$ ps ax -O tpgid | grep -E 'sleep|ps a'
 6745  7136 S pts/1    00:00:00 sleep 1000
 7136  7136 R pts/1    00:00:00 ps ax -O tpgid
 7137  7136 S pts/1    00:00:00 grep --color -E sleep|ps a

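As you can see above, the tpgid value printed is the PID of the ps process. That is also why your kill -9 -6511 failed: by the time you ran it, that ps pipeline had already exited, so there was no process group 6511 left to signal. What you're looking for is pgid, not tpgid: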

   pgid        PGID      process group ID or, equivalently, the process ID
                         of the process group leader.  (alias pgrp).


$ ps ax -O pgid | grep -E 'sleep|ps a'
 8414  8414 S pts/1    00:00:00 sleep 1000
 8656  8656 R pts/1    00:00:00 ps ax -O pgid
 8657  8656 S pts/1    00:00:00 grep --color -E sleep|ps a
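ps takes both values from the kernel. On Linux, proc(5) documents them as field 5 (pgrp) and field 8 (tpgid) of /proc/PID/stat. A minimal sketch that reads them directly and compares the result with ps, assuming a Linux /proc; show_ids is a hypothetical helper name, and the current shell ($$) is just a convenient example process:

```shell
#!/bin/bash
# show_ids is a hypothetical helper, named only for this illustration.
# It reads /proc/PID/stat (Linux only) and prints the PID, PGID and TPGID.
show_ids() {
    local pid=$1 stat rest state ppid pgrp session tty_nr tpgid
    stat=$(< "/proc/$pid/stat")
    # Field 2 (comm) is in parentheses and may contain spaces, so drop
    # everything up to the last ") " before splitting the remaining fields.
    rest=${stat##*') '}
    # Fields 3..8 are: state ppid pgrp session tty_nr tpgid
    read -r state ppid pgrp session tty_nr tpgid _ <<< "$rest"
    printf 'pid=%s pgid=%s tpgid=%s\n' "$pid" "$pgrp" "$tpgid"
}

show_ids "$$"                      # the shell running this script
ps -o pid=,pgid=,tpgid= -p "$$"    # the same values as reported by ps
```

When the process has no controlling terminal (started from cron, for example), tpgid comes out as -1, as the man page says.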

Of course, since you started sleep as its own job from an interactive shell, bash (which has job control enabled) put it in a new process group with sleep as the leader, so the PGID for sleep is the same as its PID. Nevertheless, you can actually kill it that way if you like:

$ kill -9 -8414
$ ps ax -O pgid | grep -E 'sleep|ps a'
10065 10065 R pts/1    00:00:00 ps ax -O pgid
10066 10065 S pts/1    00:00:00 grep --color -E sleep|ps a
[1]+  Killed                  sleep 1000
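Why does sleep lead its own group here? With job control on, as in an interactive bash, each job is put in a new process group whose ID is the PID of the job's first process. A script normally runs with job control off, so its background children stay in the script's own group. A minimal sketch comparing the two in one script, assuming bash and procps ps; set -m and set +m toggle job control, and pgid_of is a hypothetical helper name:

```shell
#!/bin/bash
# pgid_of (hypothetical helper): print the PGID of a PID.
# The "=" after pgid suppresses the ps header line.
pgid_of() { ps -o pgid= -p "$1" | tr -d ' '; }

set +m                   # job control off: the default for a script
sleep 1000 &
plain=$!

set -m                   # job control on, as in an interactive shell
sleep 1000 &
job=$!
set +m

script_pgid=$(pgid_of "$$")
plain_pgid=$(pgid_of "$plain")
job_pgid=$(pgid_of "$job")

echo "script:            pgid=$script_pgid"
echo "sleep without -m:  pid=$plain pgid=$plain_pgid"   # same group as the script
echo "sleep with -m:     pid=$job pgid=$job_pgid"       # its own group: pgid == pid

kill "$plain" "$job"     # clean up both sleeps
wait 2>/dev/null
```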

A more informative example would be to run a script like this:

#!/bin/bash

sleep 1000 &
sleep 1000 &
sleep 1000 &

sleep 1000

If I save that as foo.sh and run it, the various sleep commands will all have the same PGID:

$ foo.sh &
[1] 13555
$ ps ax -O pgid | grep -P '[s]leep|[f]oo.sh'
13555 13555 S pts/1    00:00:00 /bin/bash /home/terdon/scripts/foo.sh
13556 13555 S pts/1    00:00:00 sleep 1000
13557 13555 S pts/1    00:00:00 sleep 1000
13558 13555 S pts/1    00:00:00 sleep 1000
13559 13555 S pts/1    00:00:00 sleep 1000

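So, each child process is in the process group of the parent, foo.sh: the script runs without job control, so its background sleeps are not moved into groups of their own. If we now kill the process group, all processes will exit: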

$ kill -9 -13555
$ ps ax -O pgid | grep -P '[s]leep|[f]oo.sh'
[1]+  Killed                  foo.sh
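To do the same thing without reading PIDs off the screen, a script can look up the PGID of any PID it knows and signal the whole group. A minimal sketch, assuming bash, procps ps and awk; the bash -c command is a stand-in for foo.sh, and set -m is used only so that the stand-in gets a process group of its own (otherwise it would share this script's group, and the kill would hit the script too):

```shell
#!/bin/bash
# Stand-in for foo.sh: one bash parent with four sleep children.
# set -m puts that parent in a new process group of its own, as running
# "foo.sh &" from an interactive shell would.
set -m
bash -c 'sleep 1000 & sleep 1000 & sleep 1000 & sleep 1000' &
leader=$!
set +m
sleep 0.5                      # give the parent time to start its children

# Find the group from a member's PID, then list everything in that group.
pgid=$(ps -o pgid= -p "$leader" | tr -d ' ')
members=$(ps -eo pid=,pgid= | awk -v g="$pgid" '$2 == g { print $1 }')
echo "process group $pgid:" $members

# Safety check: never signal the group this script itself belongs to.
if [ "$pgid" = "$(ps -o pgid= -p "$$" | tr -d ' ')" ]; then
    echo "refusing to kill our own process group" >&2
    exit 1
fi

# A negative PID makes kill signal every process in the group; "--" keeps
# the negative number from being parsed as an option. SIGTERM is sent here
# instead of -9 so the processes get a chance to clean up.
kill -TERM -- "-$pgid"
wait "$leader" 2>/dev/null
status=$?                      # 143 = 128 + 15, i.e. killed by SIGTERM
echo "leader exit status: $status"
```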
terdon
  • Why is sleep (6745) connected to ps ax -O tpgid (7136) in the second snippet? –  Aug 14 '16 at 14:27
    @tomas it isn't. As I explained (or tried to) in the answer, the tpgid is not the process group ID of sleep. It is the ID of the process group currently running in the foreground on that terminal. Since the foreground job while ps was running is, of course, the ps command itself (PID 7136, the leader of its pipeline), that's what is shown as tpgid. You might want to ping me (@terdon) in /dev/chat to discuss this if you're still confused. – terdon Aug 14 '16 at 14:30
  • The definition in the first snippet suggests it is, doesn't it? "tpgid - ID of the foreground process group on the tty (terminal) that the process is connected to ...". (Other people might benefit in the future if you explain this here.) –  Aug 14 '16 at 14:38
  • @tomas no, the definition states that it is the ID of the terminal's foreground process group, not of the group the listed process belongs to. I don't really see what else I can add to that. – terdon Aug 14 '16 at 14:39
  • I get it now. Thanks & sorry for my dumbness. –  Aug 14 '16 at 14:45
    @tomas heh, if misunderstanding a man page meant someone is dumb, we would all be completely stupid. Everyone has failed to understand a man page at some point or another :) – terdon Aug 14 '16 at 14:46