
I own some old processes on a shared compute server. They consume a lot of CPU, and according to htop they're in the running state (R):

PID   USER   PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21420 <user> 20   0  0.278t  48776  34012 R  53.3  0.0  22254:28 extract_image_f

where <user> is my user name.
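The state letter htop shows in its S column can also be read directly from /proc. Here is a minimal sketch, assuming a Linux /proc with the proc(5) layout. `process_state` and `STATE_NAMES` are hypothetical names chosen for illustration.

```python
import os

# One-letter process states as listed in proc(5); htop's "S" column shows these.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (waiting inside the kernel)",
    "Z": "zombie",
    "T": "stopped",
    "t": "tracing stop",
    "X": "dead",
    "I": "idle",
}


def process_state(pid: int) -> str:
    """Return the one-letter state of `pid`, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 is the command name in parentheses; it may contain spaces or ')',
    # so the state field is located after the *last* ')'.
    after_comm = data[data.rindex(")") + 1:]
    return after_comm.split()[0]
```

For the process above, process_state(21420) should return "R", matching htop. R only means the task is runnable. It does not say whether the CPU time is spent in user space or inside a kernel call, so it does not rule out a process stuck in the kernel.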

I tried kill -9 21420, but it had no effect (the exit status is 0). I've read that kill -9 always works on running processes as long as I have sufficient permission, and that it may take a while to take effect; however, I've now waited 4 weeks. I'm pretty sure I'm allowed to kill my own processes, though I didn't check (how do I?). I'm not the admin of the server.
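On the "how do I?" part: an exit status of 0 from kill already indicates that the permission check passed, because kill(2) fails with EPERM when the caller may not signal the target. The same check can be made explicitly with signal 0 (kill -0 21420 in the shell). Signal 0 tests existence and permission without delivering anything. Below is a Python sketch, assuming Linux and /proc. `can_signal` and `owner_uids` are hypothetical helper names.

```python
import os


def can_signal(pid: int) -> bool:
    """Return True if the caller may send signals to `pid`.

    Signal 0 runs kill(2)'s existence and permission checks without
    delivering a signal. ProcessLookupError propagates if `pid` does not exist.
    """
    try:
        os.kill(pid, 0)
    except PermissionError:
        # EPERM: the process exists but we are not allowed to signal it
        return False
    return True


def owner_uids(pid: int) -> tuple:
    """Return (real UID, saved set-user-ID) of `pid` from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Uid:"):
                # Field order: real, effective, saved set, filesystem
                fields = line.split()[1:]
                return int(fields[0]), int(fields[2])
    raise ValueError(f"no Uid: line for pid {pid}")
```

The UID comparison mirrors the rule in kill(2): an unprivileged sender's real or effective UID must equal the target's real or saved set-user-ID. So for one's own processes, both values returned by owner_uids should equal os.getuid().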

What can I do if restarting is not an option? What's going on here?

Edit: Long ago, the process used the GPU (CUDA via Keras). Maybe something went wrong there? According to nvidia-smi, the process no longer uses the GPU.

pasbi
  • Did you try kill -15 21420 – dubis Aug 22 '17 at 07:25
  • Try also kill -TERM -1 ; then tell the system administrator – Basile Starynkevitch Aug 22 '17 at 07:27
  • Both commands return 0 and don't change the state of the process. – pasbi Aug 22 '17 at 07:31
  • The admin eventually replied: "These processes are stuck in the kernel. Only a reboot can solve this". – pasbi Aug 22 '17 at 08:30
  • There are a few things that can cause a process to refuse to die, the most common of which is a loss of disk access, usually over an NFS share or something similar. The kernel will halt the process and wait forever for the handle to become available again; the only solution is to reboot. – HostFission Aug 24 '17 at 05:29
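The admin's diagnosis can be checked without root access. A return value of 0 from kill(2) means SIGKILL was queued. A signal is acted upon on the way back from the kernel to user space (interruptible and killable waits are woken so they can get there). A task that never leaves its kernel call therefore keeps SIGKILL pending indefinitely. Linux exposes the pending sets as hex masks in the SigPnd (thread-private) and ShdPnd (process-wide) lines of /proc/<pid>/status. Below is a minimal sketch, assuming that proc(5) format. The function names are hypothetical.

```python
import signal


def decode_sigmask(hex_mask: str) -> set:
    """Decode a hex signal mask from /proc/<pid>/status; bit n-1 stands for signal n."""
    value = int(hex_mask, 16)
    return {n for n in range(1, 65) if value & (1 << (n - 1))}


def pending_signals(pid: int) -> set:
    """Signals pending for `pid`: thread-private (SigPnd) plus process-wide (ShdPnd)."""
    pending = set()
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("SigPnd", "ShdPnd"):
                pending |= decode_sigmask(value.strip())
    return pending


def sigkill_pending(pid: int) -> bool:
    """True if SIGKILL is queued for `pid` but has not yet terminated it."""
    return int(signal.SIGKILL) in pending_signals(pid)
```

If sigkill_pending(21420) still returns True weeks after kill -9, the signal was delivered and is simply waiting. Further kill attempts will not change that. On kernels that provide it, root can read /proc/21420/stack to see which kernel call the process is stuck in.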
