120

I'm writing an application. It has the ability to spawn various external processes. When the application closes, I want any processes it has spawned to be killed.

Sounds easy enough, right? Look up my PID, and recursively walk the process tree, killing everything in sight, bottom-up style.

Except that this doesn't work. In one specific case, I spawn foo, but foo just spawns bar and then immediately exits, leaving bar running. There is now no record of the fact that bar was once part of the application's process tree. And hence, the application has no way of knowing that it should kill bar.

I'm pretty sure I can't be the first person on Earth to try to do this. So what's the standard solution? I guess really I'm looking for some way to "tag" a process in such a way that any process it spawns will unconditionally inherit the same tag.

(So far, the best I can come up with is running the application as a different user. That way, you can just indiscriminately kill all processes belonging to that user. But this has all sorts of access permission problems...)

5 Answers

107

Update

This is one of those cases where I clearly should have read the question more carefully (though seemingly this is true of most answers to this question). I have left the original answer intact because it gives some good information, even though it clearly misses the point of the question.

Using SID

I think the most general, robust approach here (at least for Linux) is to use the SID (session ID) rather than the PPID or PGID. This is much less likely to be changed by child processes and, in the case of a shell script, the setsid command can be used to start a new session. Outside of the shell, the setsid system call can be used.

For a shell that is a session leader, you can kill all the other processes in the session by doing (the shell won't kill itself):

kill $(ps -s $$ -o pid=)

Note: The trailing equals sign in argument pid= removes the PID column header.

Otherwise, using system calls, calling getsid for each process seems like the only way.
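
From a shell, the per-process lookup can be approximated with ps instead of calling getsid directly. This is a minimal sketch, assuming a procps-style ps that accepts the sid output field; the helper names sid_of and pids_in_session are hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helpers; they assume a ps that supports the "sid" output
# field (procps on Linux does; this is not guaranteed elsewhere).

# Print the session ID of the process with PID $1, without padding.
sid_of() {
    ps -o sid= -p "$1" | tr -d ' '
}

# Print the PID of every process whose session ID equals $1.
pids_in_session() {
    ps -e -o pid= -o sid= | awk -v sid="$1" '$2 == sid { print $1 }'
}

# Example: list everything in the current shell's session.
pids_in_session "$(sid_of "$$")"
```

To clean up a session started with setsid, the output of pids_in_session could be handed to kill, taking care to exclude the calling process if it belongs to the same session.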

Using a PID namespace

This is the most robust approach; however, the downsides are that it is Linux-only and that it needs root privileges. Also, the shell tools (if used) are very new and not widely available.

For a more detailed discussion of PID namespaces, please see this question: Reliable way to jail child processes using `nsenter`. The basic approach here is that you can create a new PID namespace by using the CLONE_NEWPID flag with the clone system call (or via the unshare command).

When a process in a PID namespace is orphaned (i.e. when its parent process finishes), it is re-parented to the top-level process of the PID namespace rather than to init. This means that you can always identify all the descendants of the top-level process by walking the process tree. In the case of a shell script, the PPID approach below would then reliably kill all descendants.
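
As a rough illustration, the unshare command can wrap a program in its own PID namespace. This is a sketch under assumptions, not a tested recipe: it assumes util-linux 2.32 or later (for --kill-child) and a kernel that permits unprivileged user namespaces, and the function name run_in_pid_namespace is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command as PID 1 of a fresh PID namespace.
# Assumes util-linux >= 2.32 (for --kill-child) and a kernel that allows
# unprivileged user namespaces; when running as root, the
# --user --map-root-user pair can be dropped.
run_in_pid_namespace() {
    unshare --user --map-root-user \
            --pid --fork --kill-child --mount-proc \
            -- "$@"
}

# Example (not run here):
#   run_in_pid_namespace sh -c 'sleep 1000 & sleep 1000'
# Killing the unshare process is then expected to take down both sleeps,
# because the namespace's PID 1 is signalled and the kernel kills the
# remaining processes in the namespace when it exits.
```

The --fork flag matters here: without it, unshare itself is not placed in the new namespace as PID 1, so there is no single top-level process to kill.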


Original Answer

Killing child processes

The easy way to do this in a shell script, provided pkill is available, is:

pkill -P $$

This kills all children of the given process ($$ expands to the PID of the current shell).

Killing all descendant processes

Another situation is that you may want to kill all the descendants of the current shell process, not just the direct children. In this case you can use the recursive shell function below to list all the descendant PIDs, before passing them as arguments to kill:

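list_descendants ()
{
  # Direct children of the given PID (--ppid is a procps extension)
  local children=$(ps -o pid= --ppid "$1")

  # Recurse into each child so that deeper descendants are listed first
  for pid in $children
  do
    list_descendants "$pid"
  done

  echo "$children"
}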

kill $(list_descendants $$)
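
As a hedged alternative, the same list can be built from a single ps snapshot using only options specified by POSIX (-e and -o), which avoids the procps-specific --ppid. This is a sketch rather than the answer's own method, and the name list_descendants_posix is hypothetical:

```shell
#!/bin/sh
# Print every descendant of PID $1, one per line, from one ps snapshot.
list_descendants_posix() {
    ps -e -o pid= -o ppid= | awk -v root="$1" '
        { parent[$1] = $2 }             # remember each PID -> PPID pair
        END {
            for (pid in parent) {
                # Walk up the ancestry chain until we reach root or
                # fall off the top of the snapshot.
                p = parent[pid]
                while (p != root && (p in parent))
                    p = parent[p]
                if (p == root)
                    print pid
            }
        }'
}
```

It would be used the same way, for example kill $(list_descendants_posix $$). The ps and awk processes of the pipeline may themselves appear in the snapshot; they have exited by the time kill runs, so kill may report that those PIDs no longer exist. Like the recursive version, this only finds processes that are still linked to the starting PID through living parents.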

Double forks

One thing to beware of, which might prevent the above from working as expected, is the double fork() technique. This is commonly used when daemonising a process. As the name suggests, the process that is to be started runs in the second fork of the original process. Once the process is started, the first fork exits, meaning that the process becomes orphaned.

In this case it will become a child of the init process instead of the original process that it was started from. There is no robust way to identify which process was the original parent, so if this is the case, you can't expect to be able to kill it without having some other means of identification (a PID file for example). However, if this technique has been used, you shouldn't try to kill the process without good reason.
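
The re-parenting is easy to observe. The following is a minimal sketch in which a throwaway sh plays the short-lived intermediate process and a sleep stands in for the long-running one; the variable names are illustrative only:

```shell
#!/bin/sh
# "foo": starts "bar" (a sleep) in the background, reports both PIDs, exits.
# bar's output is redirected so the command substitution does not wait for it.
out=$(sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
foo_pid=${out% *}
bar_pid=${out#* }

# foo is gone by now, so bar has been re-parented (typically to PID 1,
# or to a subreaper such as a per-user service manager).
bar_ppid=$(ps -o ppid= -p "$bar_pid" | tr -d ' ')
echo "foo=$foo_pid bar=$bar_pid, bar's parent is now $bar_ppid"

kill "$bar_pid"   # clean up the demo process
```

After foo exits, nothing in bar's PPID points back to the shell that started foo, which is why walking the process tree from that shell cannot find it.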


Graeme
  • 34,027
  • The descendant process needn't double fork - it can fork once, but if it does so it will be orphaned if its parent process later forks. This occurs all of the time, and is not necessarily related to intentional daemonization at all. – mikeserv Apr 10 '14 at 19:55
  • But we've all already answered how to kill all child processes in various ways - the question is not how to ensure one process becomes and remains the child of another. Can you get in testing? Besides, I already did it: http://unix.stackexchange.com/questions/124162/reliable-way-to-jail-child-processes-using-nsenter – mikeserv Apr 10 '14 at 20:49
  • @mikserv, ok, I'll have a go. If Gilles doesn't get there first :) – Graeme Apr 10 '14 at 20:55
  • This misses all cases where a child dies before the grandchild. – Gilles 'SO- stop being evil' Apr 10 '14 at 22:21
  • What is supposed to happen when you kill the process running as pid 1 via unshare -p? I had hoped that this would reliably kill all processes below it, so that you can "clean up" by killing a single process/cgroup root. But it seems that this results in all child processes to be reparented to the outer init. – nh2 Sep 19 '17 at 14:15
  • Turns out it works with unshare -pf but not with unshare -p. I've asked a separate question to find out why. – nh2 Sep 19 '17 at 14:42
  • @Graeme This is now possible with util-linux 2.32. Using unshare -fp --kill-child -- yourprogram, if you kill unshare, all child processes will be killed. I implemented this upstream. Would you mind updating your answer? – nh2 Jun 17 '18 at 02:58
  • pkill -P $$ worked on my workstation but not on travis CI -- for whatever reason it kills the shell's process as well. I also did not have luck with kill $(ps...). Yet another way to do this is: jobs -p | xargs -n 1 pkill -P. (This enumerates the pids of the child jobs and passes each to pkill -P.) – Doug Fawley Oct 21 '19 at 17:23
  • Be careful copying $(ps -p $$ -o ssid=) as ps may sometimes add leading whitespace to the result. To remove all whitespace, use $(ps -p $$ --no-headers -o ssid:1). See here. Doesn't affect this answer but important if you want to store SID into a variable. – szmoore Dec 24 '20 at 01:57
  • The answer states that "Using a [Linux] PID namspace ... needs root privileges." This is incorrect. On appropriately compiled recent Linux kernels, a non-root user can create a user namespace, and a user namespace can include a PID namespace. See man 7 user_namespaces. And possibly also man 1 unshare. – mpb Jan 13 '21 at 20:00
  • How do you get the SID of a process? Man pages talk about getsid, but it's not available in Arch Linux 5.11.2-arch1-1 or Ubuntu 20.10. Tried searching for getsid online, but only found a few man pages including from Ubuntu. My polybar isn't cleanly closing through autorandr and all the child processes have different PPID and PGRP, but has the same SID. – gavsiu Feb 28 '21 at 18:44
  • @gavsiu it should show up with ps -j or ps -o sess or ps -o sid. – Graeme Feb 28 '21 at 22:10
  • Thanks. Those commands do work as intended, but killing using SID is not the solution to my problem because apparently it's not just polybar using the same SID. Back to the drawing board. – gavsiu Mar 01 '21 at 03:04
  • ps --ppid does not appear to be specified by POSIX, unless I'm missing something? – Adrian Günter May 29 '23 at 03:57
  • @AdrianGünter, well spotted. Took that part out of the answer – Graeme Jun 14 '23 at 16:13
  • @Graeme that's unfortunate, as I was specifically on the hunt for a POSIX-compatible method. :P Unfortunately it doesn't seem to be easily doable, although I remember taking the pains once a few years back and arriving at a workable solution. Alas... – Adrian Günter Jun 16 '23 at 02:01
  • @AdrianGünter, you can't just do something along the lines of ps -o ppid -o pid | grep "^$$" | cut -f 2 ? – Graeme Jun 16 '23 at 11:48
  • @Graeme Perhaps, I'll have to give it a try! – Adrian Günter Jun 19 '23 at 23:37
30

You can use:

kill -TERM -- -XXX

where XXX is the process group ID (PGID) of the process group you want to kill. You can check it using:

 $ ps x -o  "%p %r %c"
 PID   PGID COMMAND
 2416  1272 gnome-keyring-d
 2427  2427 gnome-session
 2459  2427 lightdm-session <defunct>
 2467  2467 ssh-agent
 2470  2427 dbus-launch
 2471  2471 dbus-daemon
 2484  2427 gnome-settings-
 2489  2471 gvfsd
 2491  2471 gvfs-fuse-daemo
 2499  2427 compiz
 2502  2471 gconfd-2
 2508  2427 syndaemon
 2513  2512 pulseaudio
 2517  2512 gconf-helper
 2519  2471 gvfsd-metadata
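
The lookup and the kill can be combined in a small helper. This is a sketch, not part of any standard tool: the name kill_group_of is hypothetical, and the check against the caller's own group is an added safety assumption, since signalling your own process group would also terminate the calling script:

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM to the whole process group of PID $1.
kill_group_of() {
    pgid=$(ps -o pgid= -p "$1" | tr -d ' ')
    own=$(ps -o pgid= -p "$$" | tr -d ' ')
    [ -n "$pgid" ] || return 1          # no such process
    if [ "$pgid" = "$own" ]; then
        # Refuse to signal our own group; that would kill this script too.
        echo "PID $1 shares this shell's process group" >&2
        return 1
    fi
    kill -TERM -- "-$pgid"              # a negative number selects a process group
}
```

This only helps if the spawned processes were put into their own process group in the first place (for example by a shell with job control, or by calling setpgid or setsid before exec).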

For more details about process group IDs, see man setpgid:

DESCRIPTION
       All  of  these interfaces are available on Linux, and are used for get‐
       ting and setting the process group ID (PGID) of a  process.   The  pre‐
       ferred,  POSIX.1-specified  ways  of doing this are: getpgrp(void), for
       retrieving the calling process's PGID; and  setpgid(),  for  setting  a
       process's PGID.

       setpgid()  sets  the  PGID of the process specified by pid to pgid.  If
       pid is zero, then the process ID of the calling process  is  used.   If
       pgid is zero, then the PGID of the process specified by pid is made the
       same as its process ID.  If setpgid() is used to move  a  process  from
       one  process  group to another (as is done by some shells when creating
       pipelines), both process groups must be part of the same  session  (see
       setsid(2)  and  credentials(7)).   In  this case, the pgid specifies an
       existing process group to be joined and the session ID  of  that  group
       must match the session ID of the joining process.
cuonglm
  • 153,898
  • How do process groups work? Most of the tasks in your example seem to be in one of two groups... – MathematicalOrchid Apr 10 '14 at 15:58
  • It's a result of forking from shell command line or server startup. It's only short example. – cuonglm Apr 10 '14 at 16:01
  • The process group looks like it might be the thing I'm looking for, but I still want to understand exactly when two processes will have the same group ID, and when it will be different. – MathematicalOrchid Apr 10 '14 at 16:14
  • Updated my answer. – cuonglm Apr 10 '14 at 16:34
  • @MathematicalOrchid You should probably do man nsenter and have a look here: http://lwn.net/Articles/531114/ – mikeserv Apr 10 '14 at 18:18
  • PGID is not a reliable way to identify children of a particular process since setpgid() can be called on any process. PPID is a more robust way since this is more difficult to alter. – Graeme Apr 10 '14 at 19:15
  • @Graeme - PPID is pretty easy to alter, and often changes accidentally - for instance a parent process execs itself incrementally with new input to avoid argument length issues. If it does not do a very thorough job of cleaning up after itself - which it needn't do for this very reason - all of its children are dumped on init, which will clean them up eventually. namespaces are simple, secure ways to avoid that kind of problem. – mikeserv Apr 10 '14 at 19:32
21

If you know the parent process's PID, you can do this using pkill.

Example

$ pkill -TERM -P 27888

Where the PPID is 27888.

Excerpt from the pkill man page:

   -P, --parent ppid,...
          Only match processes whose parent process ID is listed.

What's my PID in a script?

This is probably your next question. Inside a Bash script, $$ expands to the script's own PID, so you can use it anywhere in the script.

Example

Say I have this script:

$ more somescript.bash 
#!/bin/bash

echo "top: $$"
sleep 5
echo "bottom: $$"

Now I run it, backgrounded:

$ ./somescript.bash &
[2] 28007
top: 28007

Peeking at it with pgrep shows we've got the right PID:

$ pgrep somescript.bash
28007
$ bottom: 28007

[2]+  Done                    ./somescript.bash

Using a process' PGID

You can use the ps command shown below to find a process's PGID, and then kill by PGID instead.

Now using this script, killies.bash:

$ more killies.bash 
#!/bin/bash

sleep 1000 &
sleep 1000 &
sleep 1000 &

sleep 100

We run it like so:

$ killies.bash &

Checking in on it:

$ ps x -o  "%p %r %c"
  PID  PGID COMMAND
28367 28367 killies.bash
28368 28367 sleep
28369 28367 sleep
28370 28367 sleep
28371 28367 sleep

Now we kill the PGID:

$ pkill -TERM -g 28367
[1]+  Terminated              ./killies.bash

Additional methods

If you take a look at this SO Q&A you'll find still more methods for doing what you want:


slm
  • 369,824
  • 1. Does this kill only immediate children, or all descendants? 2. Does this address the issue where one of the intermediate descendants exits, breaking the link in the process tree? – MathematicalOrchid Apr 10 '14 at 15:58
  • This kills every process whose parent is your script. – slm Apr 10 '14 at 15:59
  • This worked for me. This terminates the parent process and all its child processes. – Rajaraman Subramanian Oct 25 '21 at 09:11