32

I tested cp with the following commands:

$ ls
first.html   second.html  third.html

$ cat first.html
first

$ cat second.html
second

$ cat third.html
third

Then I copied first.html to second.html:

$ cp first.html second.html

$ cat second.html
first

The file second.html is silently overwritten, without any error. However, if I do the same thing in a desktop GUI by dragging and dropping a file into a folder that already contains a file with the same name, the new copy is automatically renamed to first1.html. This avoids accidentally overwriting an existing file.

Why doesn't cp follow this pattern instead of overwriting files silently?

isanae
  • 141
Wizard
  • 2,503
  • 10
    I imagine only the coreutils designers can truly answer the question, but it's just the way it works for now. Usually the apps are built assuming the user really means what they're doing and to minimize the extra prompting.

    If you want to change the behavior, alias 'cp' to 'cp -i' or 'cp -n'.

    – kevlinux Oct 24 '18 at 04:11
  • 8
    @kevlinux The coreutils devs are just implementing the POSIX standard. – Kusalananda Oct 24 '18 at 06:56
  • 3
    @Kusalananda Right, so the question is, why does the POSIX standard mandate this. – Lightness Races in Orbit Oct 24 '18 at 09:33
  • @LightnessRacesinOrbit You would have to talk to someone on the standard's committee about that as it's not revealed in the Rationale. – Kusalananda Oct 24 '18 at 09:45
  • 20
    Because back when it was designed, people wanted to be as terse as possible with what they do (hence cp, not copy) and knew what they did, and when they made mistakes they did not try to blame the tools. It was a totally different kind of people back then that did computers. It's like asking why a scalpel for a heart surgeon can cut into hands too. – PlasmaHH Oct 24 '18 at 10:30
  • This might have also been asked on https://retrocomputing.stackexchange.com/ – Mawg says reinstate Monica Oct 24 '18 at 11:55
  • imagine having cp foo bar && some-command --input-file bar. This would be using an older bar if the cp created instead a bar (copy) file. Of course I could add rm foo at the beginning, but that shouldn't be necessary, and by having cp overwrite the destination file it isn't. – Carlos Campderrós Oct 24 '18 at 15:32
  • 2
    Related: The UNIX-HATERS Handbook See "Accidents Will Happen" – Michael Hampton Oct 24 '18 at 17:10
  • 1
    Early UNIX philosophy tended to assume that only highly trained professionals would use computers at all, ever. Therefore efficiency of operation was prioritized over protecting users from themselves. If you say "copy A to B", then you definitely want to have the content of A available in B; you might have forgotten that B already has valuable content, but that's not worth complicating the more likely case for. – Kilian Foth Oct 24 '18 at 18:11
  • 5
    Unix was designed by and for computer experts, with the assumption that the user knew what he was doing. The OS would do exactly what the user told it to if possible - without holding the user's hand and without asking for endless confirmations. If an operation overwrote something, it was assumed that was what the user wanted. Also remember that this was the early 1970s - pre-MS-DOS, Windows and home computers - guiding and holding the user's hand every step of the way was not yet common. Also, with teletype machines as terminals, always asking for confirmations would have been too cumbersome. – Baard Kopperud Oct 24 '18 at 19:01
  • 12
    Don't alias cp to cp -i or similar because you'll get used to having a safety net, making systems where it's not available (most of them) that much more risky. Better to teach yourself to routinely cp -i etc. if that's what you prefer. – Reid Oct 24 '18 at 19:55
  • 2
    @kevlinux What do you mean by "apps"? cp is not an "app", whatever those things are. It's a Unix tool: Unix never says PLEASE; it also never says SORRY. It does what you tell it to do. If you tell it to do dumb things, then the fault my dear Horatio lies not with your Unix *s but with your own faulty orders. :) We don't need Clippy to popup a window saying "Are you really, really sure?" every time we want to increment a machine register. – tchrist Oct 24 '18 at 21:44
  • 1
    Whatever the historical reason, IMO this behaviour is exactly right especially today. Every slightly important file should be under version control and have at least two backups. If you accidentally delete a file twice a year, costs you two git checkout -p commands. About a minute per year. If I had to confirm every file I wanted to be overwritten by cp, it would take me about five minutes every week. (Not for the confirmation itself, but for figuring out why it happens and how I should pre-remove those files in the future.) – leftaroundabout Oct 24 '18 at 21:48
  • 1
    @tchrist A software application "app" not an "app" like everything else is now. Call it a tool, utility, program, whatever, it's still a software application. – kevlinux Oct 25 '18 at 03:12
  • @kevlinux Do you mean it’s a program? – tchrist Oct 25 '18 at 03:42
  • Given that you know that the default behaviour of cp is to overwrite its target (if it's a file), what is your issue with this? Given that your GUI thingy works differently, why is your issue not with that tool instead? If this question, instead of asking "why?", asked "how?" (i.e. "how may I make cp do what my GUI tool does?"), it would be easier to answer in an opinion-neutral way. – Kusalananda Oct 25 '18 at 05:35
  • 1
    This idea that the authors of cp were from a world of flawless men who never made any error when using commands, or whatever other justification, should address why, in the case of mv, mv won't overwrite a directory that already exists. – barlop May 24 '19 at 14:34
  • Note: cp -al is not designed to silently overwrite, though. – Martin Braun Jun 20 '23 at 03:02

6 Answers

56

The default overwrite behavior of cp is specified in POSIX.

  3. If source_file is of type regular file, the following steps shall be taken:

    3.a. The behavior is unspecified if dest_file exists and was written by a previous step. Otherwise, if dest_file exists, the following steps shall be taken:

    3.a.i. If the -i option is in effect, the cp utility shall write a prompt to the standard error and read a line from the standard input. If the response is not affirmative, cp shall do nothing more with source_file and go on to any remaining files.

    3.a.ii. A file descriptor for dest_file shall be obtained by performing actions equivalent to the open() function defined in the System Interfaces volume of POSIX.1-2017 called using dest_file as the path argument, and the bitwise-inclusive OR of O_WRONLY and O_TRUNC as the oflag argument.

    3.a.iii. If the attempt to obtain a file descriptor fails and the -f option is in effect, cp shall attempt to remove the file by performing actions equivalent to the unlink() function defined in the System Interfaces volume of POSIX.1-2017 called using dest_file as the path argument. If this attempt succeeds, cp shall continue with step 3b.
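
One practical consequence of step 3.a.ii is that cp overwrites the destination by truncating and rewriting it in place rather than unlinking and recreating it, so the destination keeps its inode and its permission bits (unless an option such as -p changes them afterwards). A small sketch of how you might observe this on a GNU/Linux system (stat -c is GNU-specific; the file names are only for illustration):

$ echo old > dest; chmod 600 dest
$ stat -c '%i %a' dest          # note the inode number and the 600 mode
$ echo new > src
$ cp src dest                   # dest is opened with O_WRONLY|O_TRUNC and rewritten
$ stat -c '%i %a' dest          # same inode and mode as before; only the content changed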

When the POSIX specification was written, there already was a large number of scripts in existence, with a built-in assumption for the default overwrite behavior. Many of those scripts were designed to run without direct user presence, e.g. as cron jobs or other background tasks. Changing the behavior would have broken them. Reviewing and modifying them all to add an option to force overwriting wherever needed was probably considered a huge task with minimal benefits.

Also, the Unix command line was always designed to allow an experienced user to work efficiently, even at the expense of a hard learning curve for a beginner. When the user enters a command, the computer is expected to assume that the user really means it, without any second-guessing; it is the user's responsibility to be careful with potentially destructive commands.

When the original Unix was developed, the systems then had so little memory and mass storage compared to modern computers that overwrite warnings and prompts were probably seen as wasteful and unnecessary luxuries.

When the POSIX standard was being written, the precedent was firmly established, and the writers of the standard were well aware of the virtues of not breaking backwards compatibility.

Besides, as others have described, any user can add/enable those features for themselves, by using shell aliases or even by building a replacement cp command and modifying their $PATH to find the replacement before the standard system command, and get the safety net that way if desired.
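
For instance, a rough sketch of both approaches, assuming bash and a personal ~/bin directory that comes before /bin in $PATH (the file names and paths are only illustrative):

# In ~/.bashrc: prompt before overwriting, in interactive shells only
alias cp='cp -i'

# Or as a replacement command, e.g. an executable script saved as ~/bin/cp:
#!/bin/sh
# Hand everything over to the real cp, adding the interactive prompt.
exec /bin/cp -i "$@"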

But if you do so, you'll find that you are creating a hazard for yourself. If the cp command behaves one way when used interactively and another way when called from a script, you may not remember that the difference exists. On another system, you might end up being careless because you've become used to the warnings and prompts on your own system.

If the behavior in scripts still matches the POSIX standard, you're likely to get used to the prompts in interactive use, then write a script that does some mass copying - and then find you've inadvertently overwritten something again.

If you enforce prompting in scripts too, what will the command do when run in a context that has no user around, e.g. background processes or cron jobs? Will the script hang, abort, or overwrite?

Hanging or aborting means that a task that was supposed to get done automatically will not be done. Not overwriting may sometimes also cause a problem by itself: for example, it might cause old data to be processed twice by another system instead of being replaced with up-to-date data.

A large part of the power of the command line comes from the fact that once you know how to do something on the command line, you'll implicitly also know how to make it happen automatically by scripting. But that is only true if the commands you use interactively also work exactly the same when invoked in a script context. Any significant differences in behavior between interactive use and scripted use will create a sort of cognitive dissonance which is annoying to a power user.

telcoM
  • 96,466
  • 59
    "Why does it work like this?" "Because the standard says so." "Why does the standard say so?" "Because it already worked liked this." – Baptiste Candellier Oct 24 '18 at 11:01
  • 17
    The last paragraph is the real reason. Confirmation dialogs and "Do you really want to do this?" prompts are for wimps :-) – TripeHound Oct 24 '18 at 14:36
  • @BaptisteCandellier - Agreed. It's like the ultimate reason is out there, but tantalizingly just out of this answer's reach. – T.E.D. Oct 24 '18 at 16:28
  • Tip: if you are still learning, start every command with echo so that if you accidentally press enter, all it does is print the command line and thus you can review it. When you are sure you want to run the command that echo printed, simply do ^echo ^^ and the last cmdline will execute without the echo part. Alternatively: write to a file to your heart's content and copy&paste when you are sure. – Bakuriu Oct 24 '18 at 17:16
  • 3
    That last paragraph is why rm -rf is so effective, even if you didn't actually mean to run it in your home directory... – Hannah Vernon Oct 24 '18 at 20:05
  • 3
    @T.E.D. Funny how nobody ever mentions how the unlink(2) syscall also ‘fails’ to ask “Mother, may I?” for confirmation whenever these sempiternal discussions again rear their dainty heads. :) – tchrist Oct 24 '18 at 22:03
  • I expanded my answer. – telcoM Oct 25 '18 at 11:06
  • Interesting point about hanging. I.e. if you run cp as a background job (&), then it can't read from the terminal. On modern UNIX, touch a b; cp -i a b & will say cp: overwrite 'b'?, but cp will be stopped (and you get a message about that as well). You can put cp back in the foreground (fg), but it does not re-print cp: overwrite 'b'?. Problem: in more complex scenarios, you can lose track of what it was asking. I'm not sure how useful prompting is for cp though. Maybe it's more useful to just have cp that treats overwrite as a mistake, and cp -f that does overwrite. – sourcejedi Oct 25 '18 at 12:25
24

cp comes from the beginning of Unix. It was there well before the POSIX standard was written. Indeed, POSIX just formalized the existing behavior of cp in this regard.

We're talking around the Epoch (1970-01-01), when men were real men, women were real women and furry little creatures ... (I digress). In those days, adding extra code made a program bigger. That was an issue then, because the first computer that ran Unix was a PDP-7 (upgradable to 144 KB of RAM!). So things were small and efficient, without safety features.

So, in those days, you had to know what you were doing, because the computer just did not have the power to prevent you from doing anything you regretted later.

(There is a nice cartoon by Zevar; search for "zevar cerveaux assiste par ordinateur" to find the evolution of the computer. Or try http://a54.idata.over-blog.com/2/07/74/62/dessins-et-bd/le-CAO-de-Zevar---reduc.jpg for as long as it exists)

For those really interested (I saw some speculation in the comments): The original cp on the first Unix was about two pages of assembler code (C came later). The relevant part was:

sys open; name1: 0; 0   " Open the input file
spa
  jmp error         " File open error
lac o17         " Why load 15 (017) into AC?
sys creat; name2: 0     " Create the output file
spa
  jmp error         " File create error

(So, a hard sys creat)

And, while we're at it: Version 2 of Unix used (code snippet)

mode = buf[2] & 037;
if((fnew = creat(argv[2],mode)) < 0){
    stat(argv[2], buf);

which is also a hard creat, without tests or safeguards. Note that the C code of cp in V2 Unix is less than 55 lines!

Ljm Dullaart
  • 4,643
  • 5
    Almost correct, except it's "small furry" (creatures from Alpha Centauri) not "furry little"! – TripeHound Oct 24 '18 at 16:38
  • 2
    @T.E.D.: It's entirely possible early versions of cp just opened the destination with O_CREAT | O_TRUNC and performed a read/write loop; sure, with modern cp there are so many knobs that it basically has to try to stat the destination beforehand, and could easily check for existence first (and does with cp -i/cp -n), but if the expectations were established from original, bare bones cp tools, changing that behavior would break existing scripts needlessly. It's not like modern shells with alias can't just make cp -i the default for interactive use after all. – ShadowRanger Oct 24 '18 at 19:59
  • @ShadowRanger - Hmmm. You're quite right that I really have no idea if it was easy or hard to do. Comment deleted. – T.E.D. Oct 24 '18 at 20:04
  • Huh what. The only match for your search is this answer. – Joshua Oct 24 '18 at 20:37
  • 1
    @ShadowRanger Yeah, but then that's just pushing the hard lesson down the road until it's on a production system... – chrylis -cautiouslyoptimistic- Oct 25 '18 at 00:38
  • @ShadowRanger @T.E.D. I found that cp pre-dates O_CREAT and O_TRUNC (and O_EXCL, which can be used to easily fail on existing files :-). See https://unix.stackexchange.com/questions/477401/why-was-cp-designed-to-silently-overwrite-existing-files?noredirect=1#comment873195_477600 The man pages for the system calls in UNIX V6 are available e.g. here: https://minnie.tuhs.org//cgi-bin/utree.pl?file=V6/usr/man/man2/ – sourcejedi Oct 25 '18 at 12:31
  • 4
    @sourcejedi: Fun! Doesn't change my basic theory (that it was easier to just unconditionally open with truncation, and creat happens to be equivalent to open+O_CREAT | O_TRUNC), but the lack of O_EXCL does explain why it wouldn't have been so easy to handle existing files; trying to do so would be inherently racy (you'd basically have to open/stat to check existence, then use creat, but on large shared systems, it's always possible by the time you got to creat, someone else made the file and now you've blown it away anyway). May as well just overwrite unconditionally. – ShadowRanger Oct 25 '18 at 12:58
19

Because these commands are also meant to be used in scripts, possibly running without any kind of human supervision, and also because there are plenty of cases where you indeed want to overwrite the target (the philosophy of the Linux shells is that the human knows what s/he is doing).

There are still a few safeguards:

  • GNU cp has a -n|--no-clobber option (see the examples below)
  • if you copy several files to a single destination, cp will complain that the last argument is not a directory.
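
For example, with GNU coreutils cp (these are GNU extensions, not required by POSIX):

$ cp -n first.html second.html                  # --no-clobber: leave the existing second.html untouched
$ cp -i first.html second.html                  # ask before overwriting
$ cp --backup=numbered first.html second.html   # overwrite, but keep the old content as second.html.~1~

The last form comes closest to the GUI behaviour described in the question, except that it is the old file, not the new copy, that gets the extra name.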
xenoid
  • 8,888
  • This only applies to a vendor specific implementation and the question was not about that vendor specific implementation. – schily Oct 24 '18 at 10:39
11

Is it "do one thing at one time"?

This comment sounds like a question about a general design principle. Often, questions about these are very subjective, and we are not able to write a proper answer. Be warned that we may close questions in this case.

Sometimes we have an explanation for the original design choice, because the developer(s) have written about them. But I don't have such a nice answer for this question.

Why cp is designed this way?

The problem is that Unix is over 40 years old.

If you were creating a new system now, you might make different design choices. But changing Unix would break existing scripts, as mentioned in other answers.

Why was cp designed to silently overwrite existing files?

The short answer is "I don't know" :-).

Understand that cp is only one problem. I think none of the original command programs protected against overwriting or deleting files. The shell has a similar problem when redirecting output:

$ cat first.html > second.html

This command also silently overwrites second.html.

It is interesting to think about how all these programs could be redesigned. It might require some extra complexity.

I think this is part of the explanation: early Unix emphasized simple implementations. For a more detailed explanation of this, see "worse is better", linked at the end of this answer.

You could change > second.html so that it stops with an error if second.html already exists. However, as we mentioned, sometimes the user does want to replace an existing file. For example, she may be building up a complex command, trying several times until it does what she wants.
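
In fact, most shells already offer this as an option: with set -o noclobber (set -C in plain sh), > refuses to replace an existing file, and >| is used in the cases where the user really does want the overwrite. A quick sketch in bash (the exact error message varies between shells):

$ set -o noclobber
$ cat first.html > second.html
bash: second.html: cannot overwrite existing file
$ cat first.html >| second.html     # explicitly ask for the overwrite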

The user could run rm second.html first if she needs to. This might be a good compromise! It has some possible disadvantages of its own.

  1. The user must type the filename twice.
  2. People also get into a lot of trouble using rm. So I would like to make rm safer as well. But how? If we make rm show each filename and ask the user to confirm, she now has to write three lines of commands instead of one. Also, if she has to do this too often, she will get into a habit and type "y" to confirm without thinking. So it could be very annoying, and it could still be dangerous.

On a modern system, I recommend installing the trash command, and using it instead of rm where possible. The introduction of Trash storage was a great idea e.g. for a single-user graphical PC.
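
For example, with the trash-cli package (one common implementation; the exact package and command names differ between systems), the workflow looks roughly like this:

$ trash-put second.html      # move the file to the desktop Trash instead of deleting it
$ trash-list                 # see what is currently in the Trash
$ trash-restore              # interactively put a file back where it came from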

I think it is also important to understand the limitations of the original Unix hardware - limited RAM and disk space, with output displayed on slow printing terminals - as well as the limitations of the system and development software.

Notice that the original Unix did not have tab completion to quickly fill in a filename for an rm command. (Also, the original Bourne shell did not have command history, like when you use the Up arrow key in bash.)

With printer output, you would use a line-based editor, ed. This is harder to learn than a visual text editor. You have to print some current lines, decide how you want to change them, and type an edit command.

Using > second.html is a bit like using a command in a line-editor. The effect it has depends on the current state. (If second.html already exists, its content will be discarded). If the user is not sure about the current state, she is expected to run ls or ls second.html first.

"Simple implementation" as a design principle

There is a popular interpretation of Unix design, which begins:

The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.

...

Gabriel argued that "Worse is better" produced more successful software than the MIT approach: As long as the initial program is basically good, it will take much less time and effort to implement initially and it will be easier to adapt to new situations. Porting software to new machines, for example, becomes far easier this way. Thus its use will spread rapidly, long before a [better] program has a chance to be developed and deployed (first-mover advantage).

https://en.wikipedia.org/wiki/Worse_is_better

sourcejedi
  • 50,249
  • Why is overwriting the target with cp a "problem"? Having it interactively ask for permission, or fail may be as big a "problem" as that. – Kusalananda Oct 24 '18 at 11:17
  • wow, thank you. complement the guideline: 1) Write programs that do one thing and do it well. 2) Trust the programmer. – Wizard Oct 24 '18 at 11:17
  • 2
    @Kusalananda data loss is a problem. I personally am interested in reducing the risk that I lose data. There are various approaches to this. Saying that it is a problem does not mean the alternatives do not also have problems. – sourcejedi Oct 24 '18 at 11:27
  • @Kusalananda I am more interested in trying to keep my answer simple to read - which is something I struggle with! - than not offending people who love Unix :-). I particularly do not want to add too many qualifiers, or use too many different words at once, when the English in the question is not perfect :-). – sourcejedi Oct 24 '18 at 11:29
  • @sourcejedi No worries. The word "issue" may be less contentious than "problem". – Kusalananda Oct 24 '18 at 11:35
  • @riderdragon I edited my answer to put my opinion in bold text. "Trust the programmer" is a good description. It was probably an important reason for this design. I did not like to write it: I would also want to write that "trust the programmer" can be bad. – sourcejedi Oct 24 '18 at 16:23
  • 2
    @riderdragon Programs written in the C language can often fail in very surprising ways, because C trusts the programmer. But programmers are just not that reliable. We have to write very advanced tools, like valgrind, which are needed to try and find the mistakes that programmers make. I think it is important to have programming languages like Rust or Python or C# that try to enforce "memory safety" without trusting the programmer. (The C language was created by one of the authors of UNIX, in order to write UNIX in a portable language). – sourcejedi Oct 24 '18 at 16:23
  • 2
    Even better: cat first.html second.html > first.html will result in first.html being overwritten with the contents of second.html only. The original contents are lost for all time. – doneal24 Oct 24 '18 at 17:38
  • What do you mean we can't get a proper answer? Just ask Ken, he knows everything. – tchrist Oct 24 '18 at 21:55
11

The design of "cp" goes back to the original design of Unix. There in fact was a coherent philosophy behind the Unix design, which has been slightly less that half-jokingly been referred to as Worse-is-Better*.

The basic idea is that keeping the code simple is actually a more important design consideration than having a perfect interface or "doing The Right Thing".

  • Simplicity -- the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.

  • Correctness -- the design must be correct in all observable aspects. It is slightly better to be simple than correct.

  • Consistency -- the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.

  • Completeness -- the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

(emphasis mine)

Remembering that this was 1970, "I want to copy this file only if it doesn't already exist" would have been a fairly rare use case for someone performing a copy. If that's what you wanted, you'd be quite capable of checking before the copy, and that can even be scripted.
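
For instance, in any modern shell the check is a one-liner (a sketch; note that check-then-copy is inherently racy on a shared system, as one of the comments above points out):

$ [ -e second.html ] || cp first.html second.html    # copy only if the destination does not exist yet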

As to why an OS with that design approach happened to be the one that won out over all other OS's being built at the time, the author of the essay had a theory for that as well.

A further benefit of the worse-is-better philosophy is that the programmer is conditioned to sacrifice some safety, convenience, and hassle to get good performance and modest resource use. Programs written using the New Jersey approach will work well both in small machines and large ones, and the code will be portable because it is written on top of a virus.

It is important to remember that the initial virus has to be basically good. If so, the viral spread is assured as long as it is portable. Once the virus has spread, there will be pressure to improve it, possibly by increasing its functionality closer to 90%, but users have already been conditioned to accept worse than the right thing. Therefore, the worse-is-better software first will gain acceptance, second will condition its users to expect less, and third will be improved to a point that is almost the right thing.

* - or what the author, but nobody else, called "The New Jersey approach".

sourcejedi
  • 50,249
T.E.D.
  • 291
  • 1
  • 6
  • 1
    This is the right answer. – tchrist Oct 24 '18 at 21:57
  • 1
    +1, but I think it would help to have a concrete example. When you install a new version of a program that you've edited and re-compiled (and maybe tested :-), you deliberately want to overwrite the old version of the program. (And you probably want similar behaviour from your compiler. So early UNIX only has creat() v.s. open(). open() could not create a file if it did not exist. It only takes 0/1/2 for read/write/both. It does not yet take O_CREAT, and there is no O_EXCL). – sourcejedi Oct 25 '18 at 11:01
  • @sourcejedi - Sorry, but as a software developer myself, I honestly can't think of another scenario than that one where I'd be doing a copy. :-) – T.E.D. Oct 25 '18 at 12:38
  • @T.E.D. sorry, I mean I'm suggesting this example, as one of the non-rare cases where you definitely want an overwrite, v.s. the comparison in the question where maybe you didn't. – sourcejedi Oct 25 '18 at 14:33
0

The main reason is that a GUI is by definition interactive, while a binary like /bin/cp is just a program which can be called from all kinds of places, for example from your GUI ;-). I'd bet that even today the vast majority of calls to /bin/cp are not from a real terminal with a user typing a shell command, but rather from an HTTP server or a mail system or a NAS. A built-in protection against user errors makes complete sense in an interactive environment; less so in a simple binary. For example, your GUI will most likely call /bin/cp in the background to perform the actual operations and would then have to deal with the safety questions on its standard output even though it has just asked the user itself!

Note that, from day one, it was close to trivial to write a safe wrapper around /bin/cp if so desired. The *nix philosophy is to provide simple building blocks for users; /bin/cp is one of them.
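
As a rough illustration, a wrapper that mimics the renaming behaviour of the GUI from the question might look something like the script below (the name safecp and the numbering scheme are invented for this example, GNU cp's --backup=numbered achieves much the same effect, and the check-then-copy is not race-free):

#!/bin/sh
# safecp SOURCE DEST - copy, but never overwrite: fall back to DEST.1, DEST.2, ...
src=$1
dest=$2
n=0
while [ -e "$dest" ]; do
    n=$((n + 1))
    dest=$2.$n
done
exec cp "$src" "$dest"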