An apache2 process got stuck on my server and caused problems with other services. (The original problem was a kernel oops after a USB hardware disconnect.)

root@server:~# ps aux | grep apache2 | grep -v grep
www-data 12917  0.0  0.1 412148 16156 ?        D    Jun27   0:00 /usr/sbin/apache2 -k start
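The D in the STAT column above is what makes this process special: it is in uninterruptible sleep. Below is a minimal sketch, assuming Linux and a POSIX shell, that lists every process currently in that state together with its kernel wait channel. It reads /proc directly. The helper names stat_state and list_d_state are hypothetical.

```shell
#!/bin/sh
# Extract the state letter from one line of /proc/<pid>/stat.
# Field 2 (the command name) is wrapped in parentheses and may itself
# contain spaces or ")", so strip everything up to the LAST ") " first.
stat_state() {
  rest=${1##*") "}
  printf '%s\n' "${rest%% *}"
}

# List processes in uninterruptible sleep ("D"): PID, wait channel, name.
# Uses /proc/<pid>/comm rather than /proc/<pid>/cmdline, because reading
# cmdline needs the target's memory map and can itself block on a stuck task.
list_d_state() {
  for stat in /proc/[0-9]*/stat; do
    line=$(cat "$stat" 2>/dev/null) || continue   # process may have exited
    [ "$(stat_state "$line")" = D ] || continue
    pid=${stat#/proc/}
    pid=${pid%/stat}
    printf '%s\t%s\t%s\n' "$pid" \
      "$(cat "/proc/$pid/wchan" 2>/dev/null)" \
      "$(cat "/proc/$pid/comm" 2>/dev/null)"
  done
}
```

Run it as root, because /proc/<pid>/wchan is often hidden from other users. When ps itself still responds, ps -eo pid,stat,wchan:32,comm gives similar information.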

Naturally, I killed it. It's still alive. So I killed it with kill -9. It's still "alive".

Now this is where the question gets Server Fault / Unix & Linux-worthy: is there a way to get port 443 back without doing the obvious thing, rebooting? iptables is installed.
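To see which process still holds port 443 without relying on lsof, which can itself hang, here is a sketch that reads /proc directly, assuming Linux. It finds the inode of the listening socket in /proc/net/tcp and /proc/net/tcp6, then searches every process's file descriptors for that inode. The helper names are hypothetical.

```shell
#!/bin/sh
# Print inodes of sockets in LISTEN state (st == 0A) on TCP port $1, read from
# a /proc/net/tcp-style table: file $2, default /proc/net/tcp.
listen_inodes() {
  hexport=$(printf '%04X' "$1")            # 443 -> 01BB
  awk -v p=":$hexport" '
    NR > 1 && $4 == "0A" && substr($2, length($2) - 4) == p { print $10 }
  ' "${2:-/proc/net/tcp}"
}

# Print the PIDs whose fd table contains socket inode $1
# (run as root to see other users' processes).
pids_for_inode() {
  for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$1]" ]; then
      pid=${fd#/proc/}
      printf '%s\n' "${pid%%/*}"
    fi
  done | sort -un
}

# Report IPv4 and IPv6 listeners on port $1 (default 443) and their holders.
who_holds_port() {
  port=${1:-443}
  for table in /proc/net/tcp /proc/net/tcp6; do
    [ -r "$table" ] || continue
    for inode in $(listen_inodes "$port" "$table"); do
      echo "inode $inode: PIDs $(pids_for_inode "$inode" | tr '\n' ' ')"
    done
  done
}
```

Apache children inherit the listening socket from the parent. The port most likely stays bound as long as any of them still has it open, including the child stuck in D state.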

Update: I could not resolve the problem without rebooting. The general approach described here and in the "duplicate" (use lsof or /proc/$PID/fd to find out which drive the process is stuck on, then get rid of that drive) could well have worked if not for additional, probably hardware, defects.

  • It is probably stuck on some network file system – Basile Starynkevitch Jun 29 '15 at 17:48
  • @Basile, mount says that besides tmpfs/proc/sysfs there is only one other file system, and both / and the storage are local and responsive. I am pretty sure it was about an (already disconnected) USB thumb drive. –  Jun 29 '15 at 17:54
  • Do you have out-of-band (i.e. non-SSH) access? ifdown/ifup / restart networking? – Andrew Jun 30 '15 at 07:36
  • You may want to run lsof -p PID to see what the process is accessing right now; it could help you find the bottleneck. If not, you can also attach a debugger to it (e.g. gdb, the GNU debugger) and see exactly what it is currently doing. – Olivier Dulac Jun 30 '15 at 10:40
  • On Linux you can check the opened and mapped files here: /proc/$PID/fd/ and /proc/$PID/maps. – pabouk - Ukraine stay strong Jun 30 '15 at 13:05
  • @olivier It hung in a disk-write-related syscall, that's what it did. @pabouk Since lsof hung as well, /proc/$PID/fd/ was very helpful. I cannot reproduce the exact situation any more, though. –  Jun 30 '15 at 13:24
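Following the /proc suggestion in the comments, this sketch collects the /proc entries that are useful for a hung process in one pass, assuming Linux. It reads the fd symlinks and does not stat() the files themselves, which is typically what makes lsof block on a dead device. /proc/<pid>/syscall and /proc/<pid>/stack depend on kernel configuration and usually need root, so the function (hypothetical name inspect_stuck) tolerates their absence.

```shell
#!/bin/sh
# Dump what a (possibly stuck) process is doing, using only /proc.
inspect_stuck() {
  pid=$1
  [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
  echo "== state"
  grep '^State:' "/proc/$pid/status"
  echo "== wait channel"
  cat "/proc/$pid/wchan" 2>/dev/null; echo
  echo "== current syscall (number and arguments)"
  cat "/proc/$pid/syscall" 2>/dev/null || echo "(unavailable)"
  echo "== open files"
  for fd in "/proc/$pid/fd/"*; do
    [ -L "$fd" ] || continue               # fd dir unreadable or empty
    printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd" 2>/dev/null)"
  done
  echo "== kernel stack (root only)"
  cat "/proc/$pid/stack" 2>/dev/null || echo "(unavailable)"
}
```

For the process in the question this would be inspect_stuck 12917. A kernel stack inside file system or block-layer code, together with an fd pointing at the vanished USB device, would support the diagnosis.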

1 Answer

The "D" state (uninterruptible sleep) is unkillable. A process only acts on signals, including SIGKILL, when it is in user space, running its own code. When it makes a system call (most commonly the issue is input/output operations), the kernel takes over until the system call returns. Most kernel waits are interruptible: a signal wakes the process and the call returns early. In the "D" state, however, the process is waiting in kernel mode on something the kernel has chosen not to abandon, so the signal simply stays pending. Aborting kernel code is dangerous for the entire system, and it's also a philosophical question whether a process in kernel mode should listen to signals at all, given that it's not in control.
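You can observe this on a live Linux system. After kill -9, the signal is not lost. It stays in the pending set until the process returns from kernel mode. The sketch below checks the SigPnd/ShdPnd masks in /proc/<pid>/status for SIGKILL (signal 9, bit 0x100). The function names are hypothetical.

```shell
#!/bin/sh
# Succeed if the hex signal mask $1 (a SigPnd/ShdPnd value from
# /proc/<pid>/status) has bit 0x100 set, i.e. signal 9 (SIGKILL) is pending.
has_sigkill_bit() {
  mask=$1
  low=${mask#"${mask%????}"}       # keep only the last 4 hex digits,
  [ $((0x$low & 0x100)) -ne 0 ]    # which avoids 64-bit overflow
}

# Print the state and pending-signal lines of PID $1, then succeed (0) if a
# SIGKILL is queued, fail (1) if not, or return 2 if the PID is gone.
sigkill_pending() {
  file=/proc/$1/status
  [ -r "$file" ] || return 2
  grep -E '^(State|SigPnd|ShdPnd):' "$file"
  for mask in $(awk '$1 == "SigPnd:" || $1 == "ShdPnd:" { print $2 }' "$file"); do
    has_sigkill_bit "$mask" && return 0
  done
  return 1
}
```

For the stuck apache2 child, kill -9 12917 followed by sigkill_pending 12917 would be expected to show State: D (disk sleep) next to a mask with the 0x100 bit set. The kill was accepted and takes effect as soon as the blocking call returns. This is also why stuck processes tend to die once the device or mount they wait on goes away.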

So the only way out of that kernel wait is a timeout or abort in the kernel code itself. I/O operations on network drives are usually overprotective and refuse to give up, to avoid data loss. If your network drive is unreachable (or any other I/O or device access fails), your process can wait almost indefinitely in this "disk zombie" state.

If the offending drive is forcibly unmounted, the processes usually die.
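The following sketch outlines that procedure, assuming Linux and a standard umount. First, list which PIDs still have files or their working directory under the mount point. This goes through /proc again, since lsof may hang. Then try umount -f and fall back to umount -l. A lazy unmount detaches the mount point immediately and cleans up once nothing uses it. The function names are hypothetical. Even umount can end up blocked in D state when the underlying device is wedged, as the asker reports.

```shell
#!/bin/sh
# Print PIDs that have an open file, or their working directory, under the
# mount point $1. Only /proc symlinks are read, which should not touch the
# (possibly dead) mount itself. Run as root to see every process.
pids_using_path() {
  for link in /proc/[0-9]*/fd/* /proc/[0-9]*/cwd; do
    target=$(readlink "$link" 2>/dev/null) || continue
    case $target in
      "$1" | "$1"/*)
        pid=${link#/proc/}
        printf '%s\n' "${pid%%/*}"
        ;;
    esac
  done | sort -un
}

# Try a forced unmount first; if that fails, fall back to a lazy unmount,
# which detaches the mount point immediately and finishes the cleanup once
# the file system is no longer busy. Requires root.
force_unmount() {
  umount -f "$1" || umount -l "$1"
}
```

With a hypothetical mount point /media/usb, you would run pids_using_path /media/usb first and force_unmount /media/usb after that.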

orion
  • Dropping the offending drive sounds reasonable, although in my case it was not successful. I unmounted the only remaining non-root drive, and umount itself ended up hung in D state (unkillable) in call_rwsem_down_write_failed as well. –  Jun 30 '15 at 00:46
  • Did you use the force? umount -f should not hang. – orion Jun 30 '15 at 12:56