I am trying to kill a du -mh command that has been stuck running for 18 hours. I have tried several signals (kill -15, kill -2, kill -9) as root, with no luck. Are there any other techniques to kill this process?

Note that the process is in the R (running) state, yet it does not appear to respond to my kill signals.

Screenshot of top command

 ps -Z 31806
 LABEL                             PID TTY      STAT   TIME COMMAND
 unconfined                      31806 ?        RN   1137:41 du
  • You can't kill a zombie; it's already dead. I have no idea what those many SE answers are talking about. You're probably thinking of processes stuck in an uninterruptible sleep (D, not R state). In your case, maybe you don't have permission to kill it? Try strace -e trace=kill kill PID, and if that shows that the kill() is successful, attach to the du process with strace -p PID, kill it, and see how it reacts to the signal. –  Aug 06 '19 at 11:11
  • @mosvy strace -e trace=kill kill PID returns exit code 0, and strace -p PID shows attached, but no output following kill, kill -9 command – StackEng2010 Aug 06 '19 at 11:18
  • @mosvy you're right, I've removed the mention of zombie state from my question. – StackEng2010 Aug 06 '19 at 11:26
  • will kill -TSTP PID stop it? –  Aug 06 '19 at 11:33
  • A process cannot catch SIGKILL (signal 9), so if sending that doesn't help, the only other possibility I can see is that you are not allowed to send signals to the process at all. As the top output shows that it's owned by root, you need to be root to send it signals. – Henrik supports the community Aug 06 '19 at 11:33
  • @Henrik I am sending signals as root. @mosvy kill -TSTP PID doesn't stop it either. – StackEng2010 Aug 06 '19 at 11:35
  • Could you attach to it with gdb -p PID? Anyways, please add the output of ps -Z PID (instead of the top screenshot, which may not be accessible to some people). Also mention if you're using any kind of virtualization/containerization. –  Aug 06 '19 at 11:50
  • The last output of gdb -p PID is "Attaching to process 31806" – StackEng2010 Aug 06 '19 at 11:58
  • Which means that it cannot attach to it (strace succeeds because it uses PTRACE_SEIZE, which doesn't stop the process). This may be some kind of kernel bug; also see this old question. –  Aug 06 '19 at 12:58
  • @mosvy I think you're right, that this is some kernel bug. Will restart the server later to resolve. Thank you – StackEng2010 Aug 06 '19 at 16:05
  • Server restarted, a soft shutdown was stuck too. As this was an AWS instance, I had to stop the instance and start it back up again. Problem resolved, assumed kernel loop / bug. – StackEng2010 Aug 06 '19 at 22:23
  • Do/did you have any NFS mounts on that VM? If so, see nfs(5) and read the info on the soft and hard mount options (which control timeout & retry behaviour, effectively giving you a choice between hangs like this or the risk of data corruption). I ask because I've seen hangs on df or du many times over the years when an NFS server doesn't respond to an NFS client. – cas Aug 06 '19 at 23:38
  • @cas I have no NFS mounts, but I think the du hang may be related to docker containers hosted on this server. – StackEng2010 Aug 08 '19 at 17:28
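
The basic checks from the comments above (may I signal the process at all, what state is it in, does it go away after SIGKILL) can be sketched as a small shell script. This is only a rough sketch, assuming Linux with /proc mounted; proc_state and try_kill are hypothetical helper names, not existing tools, and the script does not replace the strace and gdb steps.

```shell
#!/bin/sh
# Sketch only: proc_state and try_kill are hypothetical helper names.

# Print the one-letter state (R, S, D, Z, T, ...) from /proc/PID/status.
proc_state() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Send SIGKILL and report whether the process actually went away.
try_kill() {
    pid=$1
    # Signal 0 delivers nothing; it only checks existence and permission.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal $pid: no such process or no permission"
        return 2
    fi
    echo "state before: $(proc_state "$pid")"
    kill -KILL "$pid"
    tries=0
    while [ "$tries" -lt 5 ]; do
        state=$(proc_state "$pid") || state=''
        # No /proc entry, or a zombie waiting to be reaped, means it died.
        case "$state" in
            ''|Z) echo "process $pid is gone"; return 0 ;;
        esac
        sleep 1
        tries=$((tries + 1))
    done
    echo "process $pid survived SIGKILL, state: $state"
    return 1
}
```

Called as try_kill 31806 (as root), a process like the du in the question would presumably end in the "survived SIGKILL" branch, which suggests the problem lies in the kernel rather than in permissions or signal handling.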

1 Answer

You could use:

kill -KILL PID

This is a very common command for force-killing a process.
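
For reference, the mapping between the signal name used here and its number can be checked from the shell. A quick check, assuming a POSIX-style shell whose kill accepts -l followed by a number:

```shell
# Print the name of signal number 9 (without the SIG prefix).
kill -l 9
```

This should print KILL, meaning the command above sends the same signal as the kill -9 already tried in the question.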