I have a running task that blocks pm-hibernate (on Linux 4.0.7-2). When I run pm-hibernate, it fails with the error message "Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):" followed by the offending task.
The process is dead; it was killed hours earlier. Why can root not simply remove it from the kernel? This feels like being on Windows!
I have seen related questions such as "How to kill a process which can't be killed without rebooting?", but none of them seems to have a satisfactory answer.
Some info (31207 is the pid):
# cat /proc/31207/syscall
11 0x7fe482a47000 0x25fce 0x7fe481d4eb78 0x1 0x7fe482a6e700 0x25f2d30 0x7ffca8d8c278 0x7fe481a95ae7
# ps -l -p 31207
F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
0 D  1001 31207     1  0  80   0 -     5035 lock_e pts/9    00:00:00 a.out
# ps -lnp 31207
F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
0 D  1001 31207     1  0  80   0 -     5035 ffffff pts/9        0:00 /tmp/a.out
# ps opid,wchan:42,cmd -p 31207
  PID WCHAN                                      CMD
31207 lock_extent_bits                           /tmp/a.out
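The same two fields (state and wait channel) can also be read directly from /proc, which is handy for scripting the check. This is only a minimal sketch: the function names stat_state and describe_task are made up for illustration, and it assumes a Linux /proc in which /proc/<pid>/wchan is readable (on some kernels it is empty or shows 0).

```shell
# Hypothetical helpers; the names are illustrative, not from any package.

# Print the one-letter task state from the text of /proc/<pid>/stat.
# The command name (field 2) is wrapped in parentheses and may itself
# contain spaces or ")", so drop everything up to the last ") " first.
stat_state() {
    rest=${1##*') '}
    printf '%s\n' "${rest%% *}"
}

# Print state and wait channel for one PID, e.g.: describe_task 31207
describe_task() {
    pid=$1
    stat=$(cat "/proc/$pid/stat") || return 1
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    printf 'pid=%s state=%s wchan=%s\n' \
        "$pid" "$(stat_state "$stat")" "${wchan:-unknown}"
}
```

Splitting on the last ") " rather than on whitespace keeps the state field correct even when the command name contains spaces.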
So why can I not just stop it? Suspending it would suffice!
I am using no network FS and the task was a simple one accessing the network. If you can read this, the network is still up.
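For context, a task that ps shows in state D is in uninterruptible sleep and normally does not act on signals, SIGKILL included, until the kernel operation it is waiting on returns. To see whether other tasks are stuck the same way, something like the following could be used. It is a sketch only: filter_dstate is a hypothetical helper name, and the column layout assumes procps ps with the format shown in the usage comment.

```shell
# Hypothetical helper: keep the header line plus every row whose
# second column (STAT) begins with D, i.e. uninterruptible sleep.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (assumes procps ps):
#   ps -eo pid,stat,wchan:42,cmd | filter_dstate
```

The filter reads from standard input, so it can also be applied to saved ps output.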
ps -l -p 31207 to see what the "WCHAN" column holds. Beyond that, the question you reference has the answer: if you can't kill a process, it's waiting on a network filesystem, there's a kernel bug, or there's a hardware problem. – Jul 09 '15 at 21:25
lock_e. What does that tell us? There is no network filesystem, no mounts from remote servers, nothing. So - a kernel bug? – Ned64 Jul 09 '15 at 21:54
lock_extent_bits, which points to a possible (!) cause: https://bugzilla.kernel.org/show_bug.cgi?id=76421 – Ned64 Jul 09 '15 at 22:25
kill -SEGV pid. – ott-- Sep 12 '15 at 21:27