
We have Kafka machines in a Hadoop cluster.

The script that stops the Kafka process does the following:

kill PID

However, we noticed that the script does not actually kill the process.

So we killed it manually with:

kill -9 PID

In which cases does a process have to be killed with -9 (instead of a plain kill PID)?

Example from the script:

function kafkaKill {
   local localPID=$1
   kill $localPID || return 1
   for ((i=0; i<MAX_WAIT_TIME; i++)); do
      kafkaIsRunning $localPID
      if [ $? -eq 0 ]; then return 0; fi
      sleep 1
   done

   kill -s KILL $localPID || return 1
   for ((i=0; i<MAX_WAIT_TIME; i++)); do
      kafkaIsRunning $localPID
      if [ $? -eq 0 ]; then return 0; fi
      sleep 1
   done

   return 1
}
yael
  • This might help you out (SIGTERM is what kill sends by default): https://unix.stackexchange.com/questions/195998/why-wont-this-process-die-after-sigterm – Panki Jan 07 '19 at 15:27
  • I would prefer a clear answer. If you think your answer fits my question, please post it – yael Jan 07 '19 at 15:30

1 Answer

Running kill on a process without specifying a signal sends SIGTERM by default (according to Wikipedia). This notifies the process that it should shut down. It is the nice way of dealing with the process, and it goes like this:

  • Process registers signal handler for SIGTERM
  • You want to kill the process
  • You send SIGTERM through kill
  • The signal handler is called; this is the chance for the process to
    • Close files that it has open
    • Write out any buffers
    • Shut down any child threads
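
As a rough illustration of the sequence above, here is a minimal bash sketch (my own example, not taken from Kafka). A child process registers a SIGTERM handler with trap, and the parent sends it a plain kill. The temp file and the "flushed" marker are only placeholders for real cleanup work.

```shell
#!/usr/bin/env bash
# Placeholder for whatever state a real daemon would flush on shutdown.
tmp=$(mktemp)

(
    # Handler: do the "cleanup", then exit voluntarily.
    trap 'echo "flushed" > "$tmp"; exit 0' TERM
    while :; do sleep 1; done
) &
pid=$!

sleep 1          # give the child time to install its handler
kill "$pid"      # plain kill sends SIGTERM
wait "$pid"
status=$?        # exit status chosen by the handler

result=$(cat "$tmp")
rm -f "$tmp"
echo "handler wrote: $result, exit status: $status"
```

The process exits only because the handler itself calls exit. Remove that call and the loop simply keeps running after the handler returns.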

There's nothing in sending a SIGTERM that forces a process to exit. It can completely ignore it or it can behave however it wants.

kill -9 sends SIGKILL. A process cannot register a handler for SIGKILL, nor can it block or ignore it, so the kernel itself terminates the process and none of the process's own code runs. In this case the process gets no chance to do any of the above: it is torn down immediately and its memory and other resources are reclaimed. This can clearly cause issues if it was halfway through writing to a file.
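
The difference between the two signals can be seen with a small bash sketch. This is an illustrative stand-in, assuming a child that ignores SIGTERM (here just a sleep started with the signal ignored); it is not how Kafka itself behaves.

```shell
#!/usr/bin/env bash
# Child that ignores SIGTERM; the ignored disposition survives the exec.
( trap '' TERM; exec sleep 30 ) &
pid=$!
sleep 1                      # let the child set up before signalling it

kill "$pid"                  # SIGTERM: silently ignored by the child
sleep 1
if kill -0 "$pid" 2>/dev/null; then   # kill -0 only tests for existence
    after_term="alive"
else
    after_term="gone"
fi

kill -9 "$pid"               # SIGKILL: cannot be caught or ignored
wait "$pid" 2>/dev/null
status=$?                    # bash reports 128 + signal number
echo "after SIGTERM: $after_term, status after SIGKILL: $status"
```

This is the same escalation the kafkaKill function in the question performs: try SIGTERM first, and fall back to SIGKILL only if the process is still there.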

Some processes will take multiple SIGTERM signals before they shut down - have you tried that? The process may also document which signals you can send it to shut it down cleanly.

A process that is in a bad state may not get the chance to run its signal handler, even if it has one registered. There are points where a signal cannot be handled: the process has blocked the signal, it is already handling another signal with this one masked, or it is stuck in an uninterruptible sleep inside a system call (state D, typically waiting on I/O). If your process is stuck (for whatever reason) at one of these points, the SIGTERM handler will never run, no matter how many times you send the signal. The only solution here is SIGKILL. However, I've even seen cases where that appears to be ignored, because a process in uninterruptible sleep cannot be killed until the system call returns, in which case a system reboot may be necessary.
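
To check whether a process is stuck like this, you can look at its state. The sketch below assumes Linux with /proc mounted, and proc_state is a hypothetical helper name of mine. `ps -o stat= -p PID` gives the same information.

```shell
#!/usr/bin/env bash
# proc_state PID: print the one-letter state of a process (Linux only).
# R = running, S = interruptible sleep, D = uninterruptible sleep,
# T = stopped, Z = zombie.
proc_state() {
    awk '$1 == "State:" { print $2 }' "/proc/$1/status" 2>/dev/null
}

echo "state of this shell: $(proc_state $$)"
```

If this prints D for the Kafka PID, no signal will take effect until the blocking system call returns.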

Actual answer

To answer your question (in which cases is a plain kill ignored, so that the process has to be killed with -9):

  • The process has registered a SIGTERM handler that deliberately does not terminate the process, or it ignores SIGTERM altogether (note: the default action for SIGTERM does terminate the process)
  • The process is stuck in a signal-blocked state where a SIGTERM handler cannot be run
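
To find out which of these cases applies to a given PID, one option on Linux is to read the signal masks in /proc/PID/status. The following is a sketch under that assumption. sigterm_disposition is a hypothetical helper name, and it relies on SIGTERM being signal 15 (bit 14 of the hex masks).

```shell
#!/usr/bin/env bash
# sigterm_disposition PID: print which masks contain SIGTERM (Linux only).
#   SigBlk = blocked, SigIgn = ignored, SigCgt = caught by a handler.
# Prints nothing if the default action (terminate) is in effect.
sigterm_disposition() {
    local pid=$1 field mask
    for field in SigBlk SigIgn SigCgt; do
        mask=$(awk -v f="$field:" '$1 == f { print $2 }' \
                   "/proc/$pid/status" 2>/dev/null) || return 1
        [[ -n $mask ]] || return 1
        # Only the low 32 bits are needed to test bit 14.
        if (( (16#${mask: -8} >> 14) & 1 )); then
            echo "$field"
        fi
    done
}

sigterm_disposition $$
```

For a JVM-based service such as Kafka, I would expect SigCgt to show up, since the JVM normally installs its own SIGTERM handler to run shutdown hooks. In that case a plain kill only requests a shutdown, and how long it takes is up to the process.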