What if 'kill -9' does not work?

Question

I have a process I can't kill with kill -9 <pid>. What's the problem in such a case, especially since I am the owner of that process? I thought nothing could evade that kill option.

Stuck in a system call into the kernel, cannot be interrupted. I have a problem with Xorg going into that state, likely because of a GPU driver bug. Trying to attach any debugger (gdb, strace) just causes the debugger to hang as well (but at least they still respond to SIGKILL). Bad kernel design (blocking uninterruptible system calls) combined with buggy code here and there and you have a deadlock. Usually only a reboot can fix it, and that certainly seems to be the case with my Xorg problem as well. — Tronic, Jul 01 '21 at 15:17
If anyone is still designing new operating systems, avoid blocking system calls at all cost and go full async. You should have syscalls to initiate operations but that return immediately to user space, and then a poll syscall for the process to sleep until some response arrives (a single syscall for polling also allows waiting for many things simultaneously). Then the only syscall that needs to be interruptible is poll, instead of the zillion I/O operations that different drivers and subsystems implement. — Tronic, Jul 01 '21 at 15:24

Gilles 'SO- stop being evil' · Accepted Answer · 2023-02-07T20:02:12.380

719

kill -9 (SIGKILL) always works, provided you have the permission to kill the process. Basically either the process must be started by you and not be setuid or setgid, or you must be root. There is one exception: even root cannot send a fatal signal to PID 1 (the init process).

However kill -9 is not guaranteed to work immediately. All signals, including SIGKILL, are delivered asynchronously: the kernel may take its time to deliver them. Usually, delivering a signal takes at most a few microseconds, just the time it takes for the target to get a time slice. However, if the target has blocked the signal, the signal will be queued until the target unblocks it.
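To illustrate the asynchronous delivery described above, here is a minimal sketch that sends SIGKILL once and then polls until the process is gone. The function name kill_and_wait and the default timeout are made up for this example. Note that kill -0 also succeeds for a zombie that nobody has reaped yet, so this only reports success once the process table entry is gone.

```shell
# Send SIGKILL once, then poll until the PID no longer exists.
# Returns 0 when the process is gone, 1 on timeout, 2 if the signal
# could not be sent (no such process, or no permission).
kill_and_wait() {
    pid=$1
    timeout=${2:-10}          # seconds to wait; hypothetical default
    kill -KILL "$pid" 2>/dev/null || return 2
    elapsed=0
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A return value of 1 means the signal is still pending, which is the situation discussed in the rest of this answer; sending it again would not change anything.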

Normally, processes cannot block SIGKILL. But kernel code can, and processes execute kernel code when they call system calls. Kernel code blocks all signals when interrupting the system call would result in a badly formed data structure somewhere in the kernel, or more generally in some kernel invariant being violated. So if (due to a bug or misdesign) a system call blocks indefinitely, there may effectively be no way to kill the process. (But the process will be killed if it ever completes the system call.)

A process blocked in a system call is in uninterruptible sleep. The ps or top command will (on most unices) show it in state D (originally for “disk”, I think).
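As a rough sketch of how to spot such processes, the filter below keeps only the lines of ps output whose state column starts with D. The function name is made up, and it assumes the state is in the second column, as in the ps invocation shown after the block.

```shell
# Reads ps-style lines on stdin (PID in column 1, state in column 2)
# and prints only those whose state starts with D.
filter_uninterruptible() {
    awk '$2 ~ /^D/ { print }'
}
```

On Linux with procps, something like ps -eo pid,stat,wchan:32,comm | filter_uninterruptible should also show the kernel function each process is waiting in (the wchan:32 width syntax is a procps feature and may not exist on other systems).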

A classical case of long uninterruptible sleep is processes accessing files over NFS when the server is not responding; modern implementations tend not to impose uninterruptible sleep (e.g. under Linux, since kernel 2.6.25, SIGKILL does interrupt processes blocked on an NFS access).

If a process remains in uninterruptible sleep for a long time, you can get information about what it's doing by attaching a debugger to it, by running a diagnostic tool such as strace or dtrace (or similar tools, depending on your unix flavor), or with other diagnostic mechanisms such as /proc/PID/syscall under Linux. See Can't kill wget process with `kill -9` for more discussion of how to investigate a process in uninterruptible sleep.
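For the /proc route on Linux, a small helper along these lines may be enough to get started. The name inspect_task and the optional second argument (an alternative proc root) are assumptions for this sketch. The stack file usually requires root and a kernel built with stack tracing, so it is simply skipped when unreadable.

```shell
# Print the state of a task plus whatever the kernel exposes about
# where it is blocked. Usage: inspect_task PID [PROC_ROOT]
inspect_task() {
    pid=$1
    proc_root=${2:-/proc}
    dir=$proc_root/$pid
    [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }
    grep '^State:' "$dir/status"
    for f in wchan syscall stack; do
        # Only show files that exist and that we are allowed to read.
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f" 2>/dev/null)"
        fi
    done
}
```

The first number in the syscall line is the system call number, which can be looked up in the syscall table for your architecture.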

You may sometimes see entries marked Z (or H under Linux, I don't know what the distinction is) in the ps or top output. These are technically not processes, they are zombie processes, which are nothing more than an entry in the process table, kept around so that the parent process can be notified of the death of its child. They will go away when the parent process pays attention (or dies).
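Since a zombie only goes away when its parent pays attention, it helps to know who the parent is. The sketch below (function name made up) expects PID, PPID and state in the first three columns and prints the parent of every zombie.

```shell
# Reads "PID PPID STAT COMMAND" lines on stdin and prints, for every
# zombie, its PID and the PID of the parent that has not reaped it yet.
zombie_parents() {
    awk '$3 ~ /^Z/ { printf "zombie %s parent %s\n", $1, $2 }'
}
```

A typical invocation would be ps -eo pid,ppid,stat,comm | zombie_parents. The parent PID it reports is the process to look at, not the zombie itself.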

edited Feb 07 '23 at 20:02

answered Jan 10 '11 at 20:08

Gilles 'SO- stop being evil'

829,060

132

Your reply looks self-contradictory. You start by saying SIGKILL always works but end by citing the uninterruptible sleep case, where SIGKILL might never work short of shutting down the kernel. There are also two cases where SIGKILL doesn't work: with zombies, obviously, as you can't kill already dead processes, and with init, which by design ignores SIGKILL signals. – jlliagre Jan 11 '11 at 12:27
53

@jlliagre: Killing a zombie doesn't make sense, it's not alive to begin with. And killing a process in interruptible sleep does work, it's just (as with other signals) asynchronous. I've tried to clarify this in my edit. – Gilles 'SO- stop being evil' Jan 11 '11 at 20:07
5

I also wrote that killing a zombie doesn't make sense, but that doesn't prevent many people from trying it and complaining. Killing a process in interruptible sleep indeed works by design, but I was talking about killing a process in uninterruptible sleep, which can fail if the system call never wakes up. – jlliagre Jan 11 '11 at 21:39
16

man 5 nfs: "The intr/nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels." – Martin Schröder Aug 07 '12 at 20:05
3

I had problems killing an ls process accessing an sshfs mount when the remote server had become unreachable. Is there a mount option for FUSE or sshfs which I could use in future to avoid such situations? 2.6.30 kernel – imz -- Ivan Zakharyaschev Mar 28 '13 at 14:55
7

@imz--IvanZakharyaschev Not that I know of (but I might not know). With sshfs, as a last resort, you can kill the sshfs process (and likewise with any other FUSE filesystem: you can always force-unmount this way). – Gilles 'SO- stop being evil' Mar 28 '13 at 19:42
1

@Gilles Thanks, your advice (what to do with any FUSE fs) helped. I haven't been able to get rid of those hanging sshfs mounts for months, and some GUIs listing the filesystem tree were unusable simply because there were such mountpoints in my home directory (Thunar, "open file" and "save as" dialogs in most programs). Now I was able to simply kill the sshfs processes, and everything is fine again! So, in a sense, user-space FSs are superior to kernel-space FSs in terms of the usability/convenience of the system for the user! – imz -- Ivan Zakharyaschev Mar 30 '13 at 12:34
2

@imz--IvanZakharyaschev: Heh, users of microkernels have known this convenience for a long time. Disk totally stuck? Kill and respawn disk server. NFS stuck? Kill and respawn nfs daemon. Since everything is a process, it is very hard to really hang a microkernel OS. – nneonneo Aug 10 '13 at 02:06
"kill -9 (SIGKILL) always works, provided you have the permission..." - I'm not sure that's correct. It depends on the process state in the kernel. There's a couple of states the process can be in such that it can't be killed. I wish I could find the post that discusses it.... – Mar 13 '16 at 06:20
2

@jww There are process states where the process can't be killed, but the signal is queued up and can't be cancelled. As I explain, SIGKILL always works, but it doesn't always work immediately. – Gilles 'SO- stop being evil' Mar 13 '16 at 18:48
Something that worked for me is for i in $(seq 1 1000); do sudo kill -9 <pid>; done;. I just did it out of frustration but it actually worked. – GreenRaccoon23 Jun 15 '16 at 04:35
2

@GreenRaccoon23 What worked wasn't sending the signal multiple times, sending it once would have had exactly the same effect. What worked was waiting long enough for the signal to be processed. – Gilles 'SO- stop being evil' Jun 15 '16 at 11:03
@Tshepang .. lot of options here. If you know the port.. kill $(lsof -t -i:) Or else pkill .. or if nothing else works you should go ahead with killing tty :) – Shashank Vyas Dec 22 '16 at 17:05
As indicated here, the uninterruptible sleep may be resolved by unmounting/remounting/reconnecting the media/link: https://www.redhat.com/archives/rhl-list/2004-January/msg04543.html
Unplugging a USB device (appearing as a serial port and mass-storage) fixed my issue of sublime-text hanging
– nmz787 Mar 18 '17 at 07:24
Any chance that "killing the kernel" (rebooting?) during this blockage could permanently mess something up on the bootup disk or in a peripheral? – sudo Sep 08 '17 at 00:24
2

@sudo Rebooting won't increase the likelihood of messing something up. It isn't mathematically impossible, but the reboot wouldn't be the cause of the mess. If the reason for the effectively unkillable process is a buggy driver or hardware, then the buggy driver or hardware could cause a mess, but rebooting won't make it worse. – Gilles 'SO- stop being evil' Sep 08 '17 at 17:13
1

The process can also be blocked if it was accessing sshfs – Temak Sep 14 '17 at 20:24
We recently had a case where a daemon using the bluez API would stop responding to the KILL signal. – Ayberk Özgür Jun 27 '18 at 12:47
@AyberkÖzgür That's presumably a bug in a Bluetooth driver. – Gilles 'SO- stop being evil' Jun 27 '18 at 14:36
@Gilles I suspect the same. – Ayberk Özgür Jun 27 '18 at 14:53
@Gilles zombie processes can consume resources. I was in the python debugger and the python program started another process. I hit "ctrl" + 'c' which left a zombie that used 1/2 of a CPU. I owned it, but kill -9 didn't kill it, ever. Exiting the debugger killed it though. – VectorVortec Sep 22 '18 at 23:37
@VectorVortec If it used CPU time, it wasn't a zombie. – Gilles 'SO- stop being evil' Sep 23 '18 at 18:57
@Gilles I agree "Killing a zombie doesn't make sense" --- but "what if zombie come to life" :-)))))) Happy Christmas – Bruno Dec 24 '18 at 04:09
Kill -9 ALWAYS works Except when it doesn't. How do you kill a zombie process? I have a FM that's hung, and I can't open new instances of it, but I can't kill it either. In a case like this, what options do you have other than restarting? – Douglas Gaskell Jan 21 '19 at 07:26
1

@DouglasGaskell See the last paragraph of my answer. A zombie process is already dead. If you can't open new instances, it isn't because of the zombie. It may be because the program died without deleting some lock file, or because of some completely different bug in the program. – Gilles 'SO- stop being evil' Jan 21 '19 at 08:23
I had a process in the state of waiting for a system call, and eventually it exited, but it was in a Z, not D state...thanks for the tip! – rogerdpack Apr 30 '19 at 15:05
The idea is there (sort of) that a process being suspended while executing kernel code cannot be killed, but the devil is in the details. Where shall I start? – Eric Mar 23 '20 at 21:17
1

It is June 2020, and under Ubuntu 19.10 I have a process (gVim) hung on gvfs (webdav connection) forever (the timeout looks like ∞) in 'D' state. It is sad to see that even in this modern age there is no means to kill that process (and its associated X window on the screen) and that the timeout is not well implemented. – Hans Deragon Jun 06 '20 at 13:26
@HansDeragon If the filesystem is provided via FUSE, killing the process that provides the filesystem (not the process that's hung waiting for the filesystem) will unblock the system call. – Gilles 'SO- stop being evil' Jun 06 '20 at 19:11
So why is my computer stuck before rebooting after "sending sigkill to process htop... Waiting for process: htop" (which was stuck before I tried to reboot) – Jean-Michaël Celerier Sep 07 '21 at 09:02
@Jean-MichaëlCelerier Either a kernel bug has corrupted memory, or a hardware failure has corrupted memory. – Gilles 'SO- stop being evil' Sep 07 '21 at 09:55
so kill -9 only works in the non-buggy case... but in general when you want to kill -9 something it's because there's a bug somewhere ... – Jean-Michaël Celerier Sep 08 '21 at 13:46
@Jean-MichaëlCelerier kill -9 is guaranteed to work unless there's a kernel or hardware bug. The usual reason to use it is when there's an application bug, and in that case it's guaranteed. – Gilles 'SO- stop being evil' Sep 08 '21 at 13:52
1

I have a bash command running under os.system() in a python process. Killing the process doesn't seem to kill the bash command running under os.system() inside that python file. – hafiz031 Sep 29 '22 at 05:55

score 127 · Answer 2 · edited May 29 '12 at 21:14

Sometimes a process exists and cannot be killed because it is:

a zombie, i.e. a process whose parent did not read its exit status. Such a process does not consume any resources except a PID entry. In top it is marked Z.
in erroneous uninterruptible sleep. This should not happen, but with a combination of buggy kernel code and/or buggy hardware it sometimes does. The only remedy is to reboot or wait. In top it is marked D.

edited May 29 '12 at 21:14

Kevin

40,767

answered Jan 10 '11 at 20:03

Maja Piechotka

16,676

5

Zombie doesn't consume resource ? – Luc M Jan 11 '11 at 04:17
8

@Luc M: AFAIK no (at least on Linux), with the exception of the entry in the process table (i.e. the PID along with such information as owner, exit status etc.). It is just a process which waits for acknowledgement from its parent that it terminated. – Maja Piechotka Jan 11 '11 at 05:16
It's init's job to clean up Zombie's. – xenoterracide Jan 11 '11 at 09:46
20

@xenoterracide: Eventually yes, but if the parent process still lives (for example it is gnome-session or something which fulfills a similar role) you may still have zombies. Technically it is the parent's job to clean up, but if a zombie is orphaned, init cleans up after it (terminology is the reason why the unix classes are held behind closed doors: anyone hearing about orphans, zombies and killing in one sentence may get the wrong impression). – Maja Piechotka Jan 11 '11 at 10:13
9

"...only method is to reboot or wait. " Wait how long? Five months have gone by and my zombies are still there. – DarenW Apr 14 '15 at 04:10
3

@DarenW until the parent acknowledges the death of children. For details please ask the author of the program. – Maja Piechotka Jan 16 '16 at 05:35
The second case is exactly what happened to me: broken hard disk controller caused Midnight Commander to kind of die and become unkillable. – hayavuk Aug 09 '16 at 08:57
There are processes which are hard to kill (having parents or still writing data to the disk). Sometimes you are even unable to kill the parent and need to wait until the process finishes and writes its own data to the disk. So unfortunately 'sudo reboot' is what resolves these situations. Of course you can wait for the process to shut down, but you never know if it is a matter of seconds, minutes or years. And that's all assuming your hardware is perfectly fine... – Bart Dec 27 '22 at 00:19

Josh · Answer 3 · 2013-09-23T15:38:23.107

38

It sounds like you might have a zombie process. This is harmless: the only resource a zombie process consumes is an entry in the process table. It will go away when the parent process dies or reacts to the death of its child.

You can see if the process is a zombie by using top or the following command:

ps aux | awk '$8 ~ /^Z/ {print $2}'

edited Sep 23 '13 at 15:38

answered Jan 10 '11 at 20:02

Josh

8,449

18

Umm, I always dislike this kind of "hard" field names with ps. Who can be sure that the required field will always be the 8th, with all implementations of ps in all Unices? – syntaxerror Feb 07 '15 at 17:15
1

POSIX does not define stat field for ps so there's no way to write a command that works with all Unices. The closest you can do is ps -o pid,stat | awk '/Z/ {print $1}' – Mikko Rantalainen May 14 '20 at 18:09

score 26 · Answer 4 · answered Jan 11 '11 at 14:41

Check your /var/log/kern.log and /var/log/dmesg (or equivalents) for any clues. In my experience this has happened to me only when an NFS mount's network connection has suddenly dropped or a device driver crashed. Could happen if a hard drive crashes as well, I believe.

You can use lsof to see what device files the process has open.
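As one example of a clue to look for, Linux kernels with the hung-task detector enabled log a line when a task has been stuck in uninterruptible sleep for a long time. The filter below is only a sketch (the function name is made up) and it assumes the usual "blocked for more than N seconds" wording of that message.

```shell
# Reads kernel log text on stdin and prints only the hung-task reports,
# e.g. "INFO: task ls:4321 blocked for more than 120 seconds."
hung_tasks() {
    grep -E 'task .+:[0-9]+ blocked for more than [0-9]+ seconds'
}
```

You could feed it with dmesg | hung_tasks or hung_tasks < /var/log/kern.log. The lines following each report in the full log normally contain a stack trace showing where the task is stuck.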

answered Jan 11 '11 at 14:41

LawrenceC

10,992

6

+1 for mention of NFS. A few years back this happened to me every couple of months-- if the NFS server crashed, NFS clients on all (patched) RHEL boxes would hang. kill -9 usually didn't work, even after waiting 60 minutes. The only solution was to reboot. – Stefan Lasiewski Jan 11 '11 at 17:02

score 17 · Answer 5 · edited Apr 13 '17 at 12:36

If @Maciej's and @Gilles's answers don't solve your problem, and you don't recognize the process (and asking what it is with your distro doesn't turn up answers), check for rootkits and any other signs that you've been owned. A rootkit is more than capable of preventing you from killing the process. In fact many are capable of preventing you from seeing them. But if they forget to modify one small program they might be spotted (e.g. they modified top, but not htop). Most likely this is not the case, but better safe than sorry.

edited Apr 13 '17 at 12:36

Community

1

answered Jan 11 '11 at 09:57

xenoterracide

59,188

I guess many rootkits insert themselves into the kernel to make things simpler (no need to guess what the user has and to download MBs of patched programs). However it is still worth checking (++vote). – Maja Piechotka Jan 12 '11 at 22:12

lepe · Answer 6 · 2018-12-04T03:18:44.107

16

First, check if it's a zombie process (which is very possible):

ps -Al

You will see something like:

0 Z  1000 24589     1  0  80   0 -     0 exit   ?        00:00:00 soffice.bin <defunct>

(Note the "Z" on the left)

If the 5th column is not 1, then it means it has a parent process. Try killing that parent process id.

If its PPID = 1, DON'T KILL IT!! Think about which other devices or processes may be related to it.

For example, if you were using a mounted device or samba, try to unmount it. That may release the Zombie process.

NOTE: If ps -Al (or top) shows a "D" instead of "Z", it could be related to remote mount (like NFS). In my experience, rebooting is the only way to go there, but you may check the other answers which cover that case in more detail.

edited Dec 04 '18 at 03:18

answered Oct 08 '13 at 08:32

lepe

391

1

Sending SIGCHLD to the parent process may cause the parent to recognize the process has died. This should work even when the PPID = 1. This is normally sent by the kernel, but can also be sent to the parent via kill (kill -17 on Linux, check the manpages on other *nix). This usage of kill will not actually "kill" the parent, but rather (re)informs it that a child has died and needs to be cleaned up. Note that SIGCHLD has to be sent to the parent of the zombie, not the zombie itself. – Stephanie Jan 21 '14 at 11:21
Even if PPID is not 1, it's still worth checking what the parent process is and what other processes it's ancestral to, with ps -ef | grep N where N is the PPID in question. This is a good idea in general, more so since the introduction of subreapers (back in 2012), and most of all if you're using a distribution that uses systemd. – Chris Henry Mar 05 '20 at 16:53

DeveloperChris · Answer 7 · 2016-02-10T02:38:35.720

Kill actually means "send a signal". There are multiple signals you can send; kill -9 sends a special one.

When you send a signal, the application deals with it; if it doesn't, the kernel does. So you can trap a signal in your application.

But I said kill -9 was special. It is special in that the application doesn't get it: it goes straight to the kernel, which then truly kills the application at the first possible opportunity. In other words, it kills it dead.

kill -15 sends the signal SIGTERM, which stands for SIGNAL TERMINATE; in other words, it tells the application to quit. This is the friendly way to tell an application it is time to shut down. But if the application is not responding, kill -9 will kill it.

If kill -9 doesn't work, it probably means your kernel is out of whack and a reboot is in order. I can't recall that ever happening.
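The difference between a trappable signal and SIGKILL can be seen with a small experiment. This is only an illustrative sketch: the child here is a plain sleep that has been told to ignore SIGTERM, and the one-second pauses are arbitrary.

```shell
# Start a child that ignores SIGTERM; the ignored disposition survives exec.
sh -c 'trap "" TERM; exec sleep 30' &
child=$!
sleep 1                      # give the child time to install its trap

kill -TERM "$child"          # the child ignores this
sleep 1
if kill -0 "$child" 2>/dev/null; then
    survived_term=yes
else
    survived_term=no
fi

kill -KILL "$child"          # cannot be trapped or ignored by the child
kill_status=0
wait "$child" || kill_status=$?
# A status of 137 (128 + 9) means the child was terminated by SIGKILL.
echo "survived SIGTERM: $survived_term, exit status after SIGKILL: $kill_status"
```

If the child had died from SIGTERM instead, the status would have been 143 (128 + 15).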

15 is SIGTERM (friendly kill), not SIGHUP. SIGHUP is for the controlling terminal being closed or the communication channel being lost — JoelFan, Jan 11 '11 at 05:53

jlliagre · Answer 8 · 2018-08-28T09:11:07.723

10

The init process is immune to SIGKILL.

This is also true for kernel threads, i.e. "processes" with a PPID equal to 0.

edited Aug 28 '18 at 09:11

answered Jan 11 '11 at 21:42

jlliagre

61,204

1

Kernel tasks can also be immune to SIGKILL. This happens often enough with Btrfs. – Tobu Feb 28 '13 at 10:37

score 10 · Answer 9 · 2013-03-28T05:58:55.293

As others have mentioned, a process in uninterruptible sleep cannot be killed immediately (or, in some cases, at all). It's worth noting that another process state, TASK_KILLABLE, was added to solve this problem in certain scenarios, particularly the common case where the process is waiting on NFS. See http://lwn.net/Articles/288056/

Unfortunately I don't believe this is used anywhere in the kernel but NFS.

edited Mar 28 '13 at 05:58

answered Mar 28 '13 at 05:51

I had problems killing an ls process accessing an sshfs mount when the remote server had become unreachable. Is there a solution for FUSE or sshfs which I could use in future to avoid such situations? 2.6.30 kernel – imz -- Ivan Zakharyaschev Mar 28 '13 at 14:57
@imz An advice from Gilles (to kill sshfs) is there -- http://unix.stackexchange.com/a/5648/4319 . – imz -- Ivan Zakharyaschev Mar 30 '13 at 12:36

score 6 · Answer 10 · edited Dec 04 '18 at 06:11

I made a little script that helped me a lot. Take a look!

You can use it to kill any process with a given name in its path (pay attention to this!!), or you can kill any process of a given user using the "-u username" parameter.

#!/bin/bash
# Assumes this script is saved as "killbyname", so that it can filter itself out.

if [ "$1" = "-u" ] ; then
        # ps -u selects by user name directly; skip this script's own PID
        processes=$(ps -u "$2" -o pid= | grep -vw "$$")
        echo "############# Killing all processes of user: $2 ############################"
else
        echo "############# Killing processes by name: $1 ############################"
        processes=$(ps aux | grep -- "$1" | grep -Ev "killbyname|grep" | awk '{ print $2 }')
fi


for process in $processes ; do
        # "command" stores the entire command line of the process that will be killed;
        # it may be useful to show it, but in some cases it is counter-productive
        #command=$(ps -o args= -p "$process")
        echo "Killing process: $process"
        echo ""
        kill -9 "$process"
done

Instead of just linking to it, can you instead post the code here. — tshepang, Mar 27 '13 at 20:53
Add a bit of description with (or at least instead) of the code... — vonbrand, Mar 27 '13 at 21:23
Yup but the "$name" is more aggregating... it will kill any process with "$name" in its running path. Can be very useful when you have these huge command lines and you don't know what the process name is. — user36035, Apr 01 '13 at 17:43

score 5 · Answer 11 · answered Mar 18 '17 at 07:30

from here originally:

check if strace shows anything

strace -p <PID>

try attaching to the process with gdb

gdb <path to binary> <PID>

if the process was interacting with a device that you can unmount, remove the kernel module for, or physically disconnect/unplug... then try that.

answered Mar 18 '17 at 07:30

nmz787

190

Worked for me! (unplugging the USB device, which was hanging sublime-text) – nmz787 Mar 19 '17 at 18:18

score 5 · Answer 12 · answered May 29 '18 at 13:16

I had kind of this issue. This was a program that I had launched with strace and interrupted with Ctrl+C. It ended up in a T (traced or stopped) state. I don't know how it happened exactly, but it was not killable with SIGKILL.

Long story short, I succeeded in killing it with gdb:

gdb -p <PID>
> kill
Kill the program being debugged? (y or n) y
> quit

score 4 · Answer 13 · edited May 29 '12 at 21:15

There are cases where even if you send a kill -9 to a process, that pid will stop, but the process restarts automatically (for instance, if you try it with gnome-panel, it will restart): could that be the case here?

edited May 29 '12 at 21:15

Kevin

40,767

answered Jan 11 '11 at 23:16

dag729

241

8

When something like this happens, the PID actually changes. So I would have noticed. – tshepang Jan 11 '11 at 23:19

score 1 · Answer 14 · answered Aug 14 '22 at 22:26

I have this more frequently with not well-behaved FUSE filesystems. Those processes cannot be killed and suspend will not work anymore because those processes also cannot be frozen: Freezing of tasks failed after 20 seconds (2 tasks refusing to freeze, wq_busy=0):.

Sometimes the process hangs because of a faulty file system. In that case, try forced umount:

sudo umount -f /path-to-problematic-mount-like-sshfs

I already tried to lazily unmount it and had lost my mount point. The solution that worked for me was to use the FUSE control file system to abort the problematic connections. Quote from the link:

waiting The number of requests which are waiting to be transferred to userspace or being processed by the filesystem daemon. If there is no filesystem activity and ‘waiting’ is non-zero, then the filesystem is hung or deadlocked.

abort Writing anything into this file will abort the filesystem connection. This means that all waiting requests will be aborted an error returned for all aborted and new requests.

Finding the correct connection is cumbersome. In my case it was easy because all other FUSE file systems had no activity and one of the FUSE connections had 2 waiting requests no matter how often I polled. I aborted that connection and after that the process exited as desired.

for fuseConnection in /sys/fs/fuse/connections/*/; do
    # Only discard error messages; the file's content must reach the variable
    waiting="$( cat -- "$fuseConnection/waiting" 2> /dev/null )"
    if [ -n "$waiting" ] && [ "$waiting" != 0 ]; then
       echo "$fuseConnection has waiting requests."
    fi
done

If you are sure that you got the correct connection, then you can abort it by writing anything to the special abort file. Note that this might interrupt file transfers and such when done on the wrong connection. I'm not yet aware of a better method to find out which connection belongs to which mount.

echo 1 > /sys/fs/fuse/connections/1234567890/abort
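One possible way to match a connection to a mount, as far as I understand it: the directory name under /sys/fs/fuse/connections appears to be the device number of the mount, which for FUSE is the minor number shown as 0:N in the third field of /proc/self/mountinfo. Treat that as an assumption to verify on your system. The function name below is made up.

```shell
# Print the mount point(s) whose device is 0:ID in a mountinfo file.
# Usage: fuse_mount_for_connection ID [MOUNTINFO_FILE]
fuse_mount_for_connection() {
    awk -v id="$1" '{
        split($3, dev, ":")              # third field is major:minor
        if (dev[1] == 0 && dev[2] == id) print $5   # fifth field is the mount point
    }' "${2:-/proc/self/mountinfo}"
}
```

For the example above, fuse_mount_for_connection 1234567890 should print the mount point to double-check before writing to the abort file.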

rogerdpack · Answer 15 · 2019-07-08T19:07:53.607

Based on a clue from Gilles' answer, I had a process marked "Z" in top (<defunct> in ps) that was using system resources; it even had a port open that was LISTEN'ing, and you could connect to that port. This was after executing a kill -9 on it. Its parent was "1" (i.e. init), so theoretically it should just be reaped and disappear. But it wasn't: it was sticking around, though not running, and "not dying".

So in my case it was zombie but still consuming resources...FWIW.

And it was not killable by any number of kill -9's

And its parent was init but it wasn't being reaped (cleaned up). I.e. init had a zombie child.

And a reboot was not necessary to fix the problem. A reboot "would have worked" around the problem and made the shutdown faster, just not gracefully, and a graceful exit was still possible.

And it was a LISTEN port owned by a zombie process (and a few other ports too, like CLOSE_WAIT status connected localhost to localhost). And it still even accepted connections. Even as a zombie. I guess it hadn't gotten around to cleaning up the ports yet, so incoming connections were still added to the tcp listening port's backlog, though they had no chance of being accepted.

Many of the above are stated as "impossible" on various places in the interwebs.

Turns out that I had an internal thread within it that was executing a "system call" (ioctl in this instance) that was taking a few hours to return (this was expected behavior). Apparently the system cannot kill the process "all the way" until it returns from the ioctl call, guess it enters kernel land. After a few hours it returned, things cleared up and the sockets were all automatically closed, etc. as expected. That's some languishing time on death row! The kernel was patiently waiting to kill it.

So to answer the OP, sometimes you have to wait. A long time. Then the kill will finally take.

Also check dmesg to see if there was a kernel panic (i.e. kernel bug).

This seems to be you describing your own specific scenario rather than an answer to the question. In your case the process fixed itself on its own because of a long running operation, something not mentioned in the question. You are welcome however to raise a new question and provide the answer to it as well. Though I fear that question might get closed as "not reproducible", since the result is specific to your implementation. — Centimane, Jul 08 '19 at 18:54
True, I added how it answers OP, since it...could, in some cases. — rogerdpack, Jul 08 '19 at 19:08
