436

I am always very hesitant to run kill -9, but I see other admins do it almost routinely.

I figure there is probably a sensible middle ground, so:

  1. When and why should kill -9 be used? When and why not?
  2. What should be tried before doing it?
  3. What kind of debugging of a "hung" process could cause further problems?
Mikel
  • 57,299
  • 15
  • 134
  • 153

9 Answers

403

Generally, you should use kill (short for kill -s TERM, or on most systems kill -15) before kill -9 (kill -s KILL) to give the target process a chance to clean up after itself. (Processes can't catch or ignore SIGKILL, but they can and often do catch SIGTERM.) If you don't give the process a chance to finish what it's doing and clean up, it may leave corrupted files (or other state) around that it won't be able to understand once restarted.
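As a sketch, the escalation looks like this (the detached throwaway `sleep` stands in for the real target; the 2-second grace period is an arbitrary choice you should tune to your application):

```shell
#!/bin/sh
# Graceful first: SIGTERM, a grace period, then SIGKILL only if needed.
# The victim here is a throwaway `sleep`; substitute your real PID.
pid=$(sh -c 'sleep 60 >/dev/null 2>&1 & echo $!')

kill "$pid"                      # SIGTERM (15): catchable, allows cleanup
sleep 2                          # grace period; tune to the application
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"               # SIGKILL: cannot be caught or ignored
fi
```

A production helper would also want to check that the PID has not been reused by an unrelated process before escalating.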

strace/truss, ltrace and gdb are generally good ideas for looking at why a stuck process is stuck. (truss -u on Solaris is particularly helpful; I find ltrace too often presents arguments to library calls in an unusable format.) Solaris also has useful /proc-based tools, some of which have been ported to Linux. (pstack is often helpful).
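For instance, on Linux a first look usually goes through `ps` and `/proc` before attaching a tracer (the throwaway `sleep` below stands in for the stuck process; attaching with strace or gdb needs appropriate ptrace permissions):

```shell
#!/bin/sh
# Inspect before you kill: where is the process stuck?
pid=$(sh -c 'sleep 5 >/dev/null 2>&1 & echo $!')

ps -o pid,stat,wchan:25,args -p "$pid"      # STAT "D" = uninterruptible sleep
grep -E '^(Name|State):' /proc/"$pid"/status
# strace -p "$pid"   # attach to see the blocked syscall (Ctrl-C detaches)
# gdb -p "$pid"      # attach, then `bt` for a userland backtrace

kill "$pid"                                 # clean up the demo process
```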

geekosaur
  • 32,047
  • 1
    Thanks. I agree. I guess I'm looking for a more compelling reason to not do it that is easy to convey to other sysadmins, and perhaps also some detailed recipes for debugging. – Mikel Mar 09 '11 at 11:34
  • 86
    the compelling reason is that if you get in the habit of sending SIGKILL, then when you get to a program which will, for example, corrupt an important database for you or your company, you'll really regret it. kill -9 has its use, as a last-resort terminator, emphasis on last-resort; admins that use it before the last-resort a) do not understand being an admin too well, and b) shouldn't be on a production system. – Arcege Mar 09 '11 at 12:39
  • 10
    @Mikel Another thing to throw in: sometimes it's best to trick an app into cleaning itself up with a signal like SIGQUIT or SIGSEGV if it won't respond to SIGINT/SIGTERM. For example, a full-screen 3-D app or even Xorg. With SIGKILL it won't have a chance to clean up anything, but tricking it into thinking a segmentation fault happened makes it feel it has no choice but to clean up and exit. – penguin359 Apr 03 '11 at 11:10
  • 18
    @Arcege Do you think that using a database that corrupts data if killed with -9 is a database worth using after all? iirc, mysql, bdb, pg, etc... all behave well when killed with -9. – dhruvbird Jan 28 '14 at 06:52
  • 14
    killall -9 java ftw – dmourati Jan 28 '14 at 07:10
  • 3
    @dhruvbird With file descriptors opened with O_SYNC, they would be safe with a SIGKILL. Anything else risks a file being closed before buffers are written to it. Is the data corrupted beyond repair? Possibly not, but you can't say, with SIGKILL, that it won't be corrupted. Behavior notwithstanding, the risk is still there. To answer your question: SIGKILL is for extreme circumstances and corruption is an expected risk, so yes, anything that corrupts on SIGKILL is acceptable. – Arcege Jan 28 '14 at 15:36
  • 2
    (Disclaimer: Some of this may be Linux specific.)

    My progression is usually SIGINT (i.e. ctrl-C if there's a console), SIGQUIT (i.e. ctrl-\), SIGTERM, and only then SIGKILL.

    Note that some processes won't die, even with SIGKILL. If it's in a syscall that will never exit, it'll never receive that signal. Check top for a D (uninterruptible sleep) state. Also, check dmesg. You may find that you've bumped into a kernel bug, and then all bets are off.

    The only time I go straight for SIGKILL is when the process has tons of memory in swap, then SIGKILL will free it without waiting to swap it all back in.

    – Jayson Jan 28 '14 at 17:50
  • 4
    @Arcege I meant that all the databases (including sqlite) maintain separate journals and maintain the validity of these journals so that even if the db file is corrupted (say due to an incomplete read), the journal can be used to repair it. – dhruvbird Jan 28 '14 at 20:18
  • 1
    @dhruvbird But it would still be effort and possible downtime until the db file is repaired/restored (that's assuming that there are sufficient notifications to let you know of the corruption). Why deal with this risk by using something that should not be used? SIGKILL is not for terminating processes. – Arcege Jan 28 '14 at 20:36
  • 2
    @Arcege The db file repair isn't a separate process, but a part of the startup routine when the process is restarted. Usually you'd have daemontools or some such tool constantly monitoring your daemons. Additionally, repairing the file doesn't take much time since it's a journal replay, and not something expensive like an fsck. In fact, there's a discussion about this on HN (dbs are supposed to handle -9 and recover fast after that). – dhruvbird Jan 29 '14 at 00:30
  • 33
    @dhruvbird: just because your DBs are supposed to come equipped with bullet-proof vests doesn't mean you should shoot them if you don't need to. While you may be right that it's not as risky as Arcege seems to say, I think his point still stands that it's risky and should be a last resort. – iconoclast Jan 30 '14 at 15:34
  • @dhruvbird So I guess when the process is killed, files get closed, memory gets deallocated, file descriptor tables are freed, and all other system resources go back to the OS? Or not? And if the process leaves zombies or orphans, is it only the PIDs that are stuck somewhere in a kernel structure? Am I correct? – Hend Jan 16 '21 at 11:04
243

Randal Schwartz used to frequently post "Useless use of (x)" on lists. One such post was about kill -9. It includes reasons and a recipe to follow. Here is a reconstructed version (quoted below).


No no no. Don't use kill -9.

It doesn't give the process a chance to cleanly:

1) shut down socket connections

2) clean up temp files

3) inform its children that it is going away

4) reset its terminal characteristics

and so on and so on and so on.

Generally, send 15, and wait a second or two, and if that doesn't work, send 2, and if that doesn't work, send 1. If that doesn't, REMOVE THE BINARY because the program is badly behaved!

Don't use kill -9. Don't bring out the combine harvester just to tidy up the flower pot.

Just another Useless Use of Usenet,

(.signature)
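The recipe in the quote can be sketched as a small shell helper (signal order per the quote; the 2-second pause and the function name `nicekill` are my own choices, not part of the original post):

```shell
#!/bin/sh
# nicekill: send TERM, then INT, then HUP, pausing after each one.
# Deliberately no SIGKILL: per the recipe, a program that survives
# all three is misbehaving.
nicekill() {
    for sig in TERM INT HUP; do
        kill -s "$sig" "$1" 2>/dev/null || return 0  # already gone
        sleep 2
        kill -0 "$1" 2>/dev/null || return 0         # it exited
    done
    echo "process $1 survived TERM/INT/HUP" >&2
    return 1
}
```

Usage: `nicekill 1234`.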

Calimo
  • 280
Shawn J. Goff
  • 46,081
  • 16
    Won't the operating system close any open file descriptors (including sockets) when the process terminates? – Rag Jan 28 '14 at 05:10
  • 8
    Yes it will. But suppose you are killing a server process with clients connected, then the clients won't notice that the server is gone before timeouts. – Stand with Gaza Jan 28 '14 at 08:48
  • 55
    Ah yes the old "if it is in any way imperfect you are stupid to use it" argument. – Timmmm Jan 28 '14 at 19:17
  • 3
    Or stupid to use it if the process in question is your company's production – Warren P Jan 29 '14 at 03:24
  • 2
    That link is no longer valid; looks like it was taken over by shady content squatting on the old URL reputation. – Nathan Kidd May 28 '15 at 01:11
  • 5
    If a process is killed, then the socket will send RST to the peer, whereas if the process calls close or shutdown on the socket, then the socket sends FIN. There is no timeout needed. A timeout situation will only occur if the power is dropped or the network cable is removed. – ctrl-alt-delor May 31 '16 at 22:41
  • 3
    @BjörnLindqvist What you are saying is not true. When the last file descriptor referencing a TCP socket is closed, the kernel will send a packet telling the other end that the connection has been closed. That happens regardless of whether the last file descriptor was closed using the close system call, by killing the process, or in some other way. – kasperd Jul 04 '17 at 23:31
  • @Timmmm I'd interpret this differently: it's just about motivating people to vote with their feet. – wardw Oct 04 '20 at 13:45
  • @Timmmm In this case, I'm using dd and it's hanging. I'm not prepared to boycott dd... – lmat - Reinstate Monica Nov 11 '20 at 20:35
  • @kasperd can you point me to a good URL what would be the consequences for the system if processes are killed using -9 not term? you mention sockets will get terminated regardless and peers will receive RST, and FIN? is that correct? even if the process is killed? – Hend Jan 16 '21 at 11:21
85

From a programmer's point of view, it should always be OK to do kill -9, just like it should always be OK to shut down by pulling the power cable. It may be anti-social, and leave some recovery to do, but it ought to work, and is a power tool for the impatient.

I say this as someone who will try plain kill (15) first, because it does give a program a chance to do some cleanup -- perhaps just writing to a log "exiting on sig 15". But I won't accept any complaint about ill-behaviour on a kill -9.

The reason:

  • You can not prevent customers from doing silly things.
  • Random kill -9 testing is a good and fair test scenario.
  • If your system doesn't handle it, your system is broken.

However, not all the software we use is ideal.

Furthermore, if you use kill -9 there is, in any case, always a risk of losing data, regardless of how robust the code is.
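The "random kill -9 testing" idea above can be sketched with a toy worker. Note this is a hypothetical illustration: a real test would assert the application's own recovery (journal replay, stale-lock cleanup), not just that a restarted worker makes progress:

```shell
#!/bin/sh
# Toy crash test: SIGKILL a worker mid-flight, restart it, and check
# that work resumes. The log file stands in for application state.
log=$(mktemp)
start_worker() {
    sh -c "while :; do echo tick >> '$log'; sleep 0.1; done" &
    worker=$!
}

start_worker
sleep 1
kill -9 "$worker"                 # simulate a crash at an arbitrary moment
wait "$worker" 2>/dev/null

before=$(wc -l < "$log")
start_worker                      # "recovery": restart the worker
sleep 1
kill "$worker"; wait "$worker" 2>/dev/null
after=$(wc -l < "$log")
rm -f "$log"

[ "$after" -gt "$before" ]        # progress resumed after the kill
```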

AdminBee
  • 22,803
dbrower
  • 1,017
  • 2
    How do you test for "random kill -9"? When you get kill -9, you are done and finished. – Karel Bílek Jan 28 '14 at 07:28
  • 19
    @Karel: You test whether your system can recover afterwards, and clean up any mangled transactions that were being processed at the time of SIGKILL. – Tadeusz A. Kadłubowski Jan 28 '14 at 08:09
  • 9
    It is not OK to do a kill -9, just like it is not OK to pull the plug. While of course there are situations where you have no choice, this should be a last-resort action. Of course, pulling the power cable or kill -9 shouldn't have adverse effects like preventing the application or the OS from restarting properly, if at all, but shit happens, and using the recommended ways (kill [-15]) or a regular shutdown will help you avoid the mess that might occur if you routinely interrupt programs and OSes that way. In any case, there is always a risk of losing data, regardless of the code's robustness. – jlliagre Jan 28 '14 at 12:51
  • 8
    I suspect what Michael meant by 'OK' is that your program should deal with this situation gracefully, and be able to do some form of cleanup on restart. For instance, cleaning up PID files and so forth, rather than just throwing its toys out of the pram and refusing to start. – gerryk Jan 28 '14 at 22:58
  • 3
    @gerryk They should indeed but the issue is some people will take that answer as a "license to kill -9" whatever the situation and the environment. It is an irresponsible attitude. – jlliagre Jan 29 '14 at 07:24
  • 1
    If you only use "crash-only" software (https://lwn.net/Articles/191059/) you can always send kill -9 safely. Unfortunately, most software ever written is not "crash-only". – Mikko Rantalainen Aug 28 '18 at 08:46
  • "It should always be OK to do kill -9" is analogous to "an application should not do any internal write caching ever", which is... a bit simplistic. Database engines perform quite advanced write caching for performance, so you should not kill -9 a database engine or any other application that contains persistent state and has built-in write caching. Applications which rely on OS's write cache may be safe to kill -9; only applications which explicitly sync their state to disk at each appropriate point can be assured to be safe... but that will cost in performance. – telcoM Jan 10 '22 at 10:09
45

I use kill -9 in much the same way that I throw kitchen implements in the dishwasher: if a kitchen implement is ruined by the dishwasher then I don't want it.

The same goes for most programs (even databases): if I can't kill them without things going haywire, I don't really want to use them. (And if you happen to use one of these non-databases that encourages you to pretend they have persisted data when they haven't: well, I guess it is time you start thinking about what you are doing).

Because in the real world stuff can go down at any time for any reason.

People should write software that is tolerant to crashes. In particular on servers. You should learn how to design software that assumes that things will break, crash etc.

The same goes for desktop software. When I want to shut down my browser, it usually takes AGES to shut down. There is nothing my browser needs to do that should take more than at most a couple of seconds. When I ask it to shut down, it should manage to do that immediately. When it doesn't, well, then we pull out kill -9 and force the issue.

borud
  • 559
  • 6
    I agree that a process should be written to be tolerant to such a failure, but I think it is still bad practice to do this. A database will recover but it might detect the rude abort and then trigger significant recovery checking when restarted. And what about the requests a process is serving? They will all be severed instantly, the clients might have bugs and fail too? – Daniel James Bryars May 24 '14 at 09:40
  • 4
    A database that can't be killed at any time isn't a properly reliable database. This is a pretty basic requirement if you require consistency.

    As for the clients: if they go haywire and corrupt data when the connection is severed, they are badly designed as well.

    The way to address loss of service is through redundancy and automatic failover/retry strategies. Usually for most of the system failing fast is preferable to trying to recover.

    – borud Sep 19 '14 at 16:36
  • 4
    @borud It may not be perfectly written software, but it's software people use all the time. What system administrators have the luxury of always being able to choose software that's perfectly written, down to always recovering gracefully from sudden disruption? Not many. Personally I use shutdown scripts, and start/stop processes via this. If they don't respond to the shutdown script (which does a proper signaling to the process), I kill -9. – Steve Sether Dec 29 '14 at 20:59
  • Usually it's the better kitchen implements that aren't dishwasher safe. – OrangeDog Jul 03 '17 at 13:29
  • @orangedog better by what standard? If you're just trying to cook basic stuff, ease of cleaning is often more relevant than other factors. – k_g Jul 04 '17 at 03:02
  • 2
    There is no difference between cooking basic stuff and more complex dishes with regard to the tools. The difference is the cook.

    (However, if you spend as much time cooking as I do, you do realize that robustness is a minimum requirement in kitchen tools and that most people who sell kitchen supplies to consumers wouldn't know a bad tool from a great tool.)

    – borud Jul 19 '17 at 19:25
    You wrote: People should write software that is tolerant to crashes... You should learn how to design software that assumes that things will break. That is a symptom of a narrow point of view. The bigger picture is that perfect software has a higher cost than the economy (or you or your employer) is willing to pay, so it doesn't exist. You can tell people what they should do, but perhaps you would think differently if you bore the cost. If an improperly killed system is flawed until re-installed, who bears the cost? If it is flawed in a subtle way, you won't discover it until months later. – H2ONaCl Feb 07 '18 at 05:03
  • 2
    So you encourage people to be sloppy because it is hard to do things properly? More and more software is run in operational environments that are ephemeral. If you write software that gets fussy if it isn't shut down correctly, you are going to have a hard time convincing employers to hire you as a developer. – borud May 18 '18 at 22:03
11

Not mentioned in all the other answers is a case where kill -9 doesn't work at all, when a process is <defunct> and cannot be killed:

How can I kill a <defunct> process whose parent is init?

What is defunct for a process and why it doesn't get killed?

So before you attempt to kill -9 a <defunct> process, run ps -ef to see what its parent is, and try -15 (TERM) or -2 (INT), and lastly -9 (KILL), on its parent.

Note: what ps -ef does.

Later edit and caution: proceed with caution when killing processes, their parents, or their children, because they may leave files open or corrupted, leave connections unfinished, corrupt databases, etc. Unless you know what kill -9 does to a process, use it only as a last resort, and if you need to run kill, try the signals listed above before resorting to -9 (KILL).
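A <defunct> process can be reproduced for experimentation (Linux; the `exec` trick below leaves the short-lived child with a parent that never calls wait()):

```shell
#!/bin/sh
# Manufacture a zombie: the inner shell backgrounds a short-lived child,
# then execs into `sleep`, so nothing ever wait()s for that child.
sh -c 'sleep 0.1 & exec sleep 3' &
parent=$!
sleep 1

ps -o pid=,stat=,comm= --ppid "$parent"      # state Z, i.e. <defunct>
zombie=$(pgrep -P "$parent")
kill -9 "$zombie" 2>/dev/null                # no effect: it is already dead
sleep 0.2
ps -o stat= --ppid "$parent"                 # still Z; only reaping removes it
kill "$parent"                               # parent dies, init reaps the zombie
```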

8

Killing processes willy-nilly is not a smooth move: data can be lost, and poorly-designed apps can break themselves in subtle ways that cannot be fixed without a reinstall. But it completely depends on knowing what is and is not safe in a given situation, and what would be at risk. The user should have some idea what a process is, or should be, doing, what its constraints are (disk IOPS, rss/swap), and be able to estimate how much time a long-running process should take (say a file copy, mp3 reencoding, email migration, backup, [your favorite timesink here]).

Furthermore, sending SIGKILL to a pid is no guarantee of killing it. If it's stuck in a syscall that will never return, or is already a zombie (Z in ps), it may stay that way. This is often the case after pressing ^Z on a long-running process and forgetting to bg it before trying to kill -9 it. A simple fg will reconnect stdin/stdout and probably unblock the process, usually followed by the process terminating. If it's stuck elsewhere, or in some other form of kernel deadlock, only a reboot may be able to remove the process. (Zombie processes are already dead by the time SIGKILL is processed by the kernel; no further userland code will run. There is usually a kernel reason, similar to being blocked waiting on a syscall to finish, for a non-zombie process not terminating.)

Also, if you want to kill a process and all of its children, get into the habit of calling kill with the negated PID, not just the PID itself. There's no guarantee of SIGHUP, SIGPIPE or SIGINT or other signals cleaning up after it, and having a bunch of disowned processes to cleanup (remember mongrel?) is annoying.

Bonus evil: kill -9 -1 is slightly more damaging than kill -9 1 (Don't do either as root unless you want to see what happens on a throw-away, non-important VM)

dhchdhd
  • 348
7

Never, never do a kill -9 1. Also avoid doing a kill on certain processes like `mount`. When I have to kill a lot of processes (say, for example, an X session gets hung and I have to kill all the processes of a certain user), I reverse the order of the processes. For example:

ps -ef | <filter to just the processes you want> | awk '{print $2}' | ruby -e 'pids = $stdin.readlines; pids.reverse.each { |pid| puts "kill -9 #{pid.strip}" }' | bash

Keep in mind that kill -9 does not guarantee that a process stops and releases its resources. All it does is send a SIGKILL signal to the process; you could still wind up with a process that stays hung.

Michael Mrozek
  • 93,103
  • 40
  • 240
  • 233
HandyGandy
  • 2,209
5

Why you do not want to kill -9 a process normally

According to man 7 signal:

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

This means that the application that receives either of these signals cannot "catch" them to do any shutdown behavior.
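A quick illustration: SIGTERM can be trapped to run cleanup, while SIGKILL gives no such chance. (The `sleep & wait` idiom keeps the shell responsive to the signal while it waits; the stray background sleep simply times out on its own.)

```shell
#!/bin/sh
# A TERM trap runs cleanup; there is no way to install a KILL trap.
sh -c 'trap "echo cleaning up; exit 0" TERM; sleep 30 >/dev/null 2>&1 & wait' &
pid=$!
sleep 0.5

kill -TERM "$pid"     # the trap runs: prints "cleaning up", exits 0
wait "$pid"
# kill -9 "$pid" would have terminated it with no chance to run the trap.
```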

What you should do before running kill -9 on a process

You should make sure that before sending the signal to the process that you:

  1. Ensure that the process isn't busy (i.e., doing "work"); sending kill -9 to the process at that point will essentially result in the loss of that data.
  2. If the process is a non-responsive database, ensure that it has flushed its caches first. Some databases support sending other signals to the process to force a cache flush.
5

I've created a script that helps automate this.

It is based on my complete answer to a very similar question on Stack Overflow.

You can read all the explanations there. To summarize, I would recommend just SIGTERM and SIGKILL, or even SIGTERM, SIGINT and SIGKILL. However, I give more options in the complete answer.

Please feel free to download (clone) it from the killgracefully GitHub repository.

DrBeco
  • 774