Syntactic differences in cp -r and how to overcome them

Question

Let's say we are in a blank directory. Then, the following commands:

mkdir dir1
cp -r dir1 dir2

Yield two (blank) directories, dir1 and dir2, where dir2 has been created as a copy of dir1. However, if we do this:

mkdir dir1
mkdir dir2
cp -r dir1 dir2

Then we instead find that dir1 has now been put inside dir2. This means that the exact same cp command behaves differently depending on whether the destination directory exists. If it does, then the cp command is doing the same as this:

mkdir dir1
mkdir dir2
cp -r dir1 dir2/.

This seems extremely counter-intuitive to me. I would have expected that cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents) and replace it with dir1, since this is the behavior when cp is used for two files. I understand that recursive copies are themselves a bit different because of how directories exist in Linux (and more broadly in Unix-like systems), but I'm looking for some more explanation on why this behavior was chosen. Bonus points if you can point me to a way to ensure cp behaves as I had expected (without having to, say, test for and remove the destination directory beforehand). I tried a few cp options without any luck. And I suppose I'll accept rsync solutions for the sake of others that happen upon this question who don't know that command.

In case this behavior is not universal, I'm on CentOS, using bash.

What is counter-intuitive about that? If I shoot an arrow at a tree, something happens; if I shoot the same arrow in the same direction but someone is standing in front of the tree, something else happens. Same program, different data, different outcome. — Anthon, Dec 08 '14 at 18:50
"I would have expected that cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents).." What? Why? I can understand overwriting files, but removing any pre-existing files as well? O.o — muru, Dec 08 '14 at 18:52
@anthon But I haven't provided different inputs. In your example, you're shooting an arrow in a constant direction, not at a tree. @muru I wouldn't expect cp file1 file2 to append if file2 exists, I expect it to overwrite. My basis for anticipated behavior is on a literal interpretation of the syntax and on what is done with files, though other users may expect differently. — TTT, Dec 08 '14 at 19:29
Could the down-voter please provide some feedback on how I could improve the question? — TTT, Dec 08 '14 at 19:49
@TTT you can only notify one user in a comment. By that logic, since > file truncates a file, shouldn't >directory be equivalent to rm -r directory; mkdir directory? — muru, Dec 08 '14 at 20:04
@muru thanks. That's a tricky one, since > redirection is placing contents and not files/directories themselves, I wouldn't expect that behavior to be possible (it isn't) since you'd be writing data directly to a directory, rather than a file in a directory. My idea of cp -r overwriting isn't that I think that behavior is better, but just that it's consistent. I'd be equally content if cp file1 file2 and cp -r dir1 dir2 both appended. But, neither do. Instead, one (over)writes while the other's behavior depends on the situation. — TTT, Dec 08 '14 at 20:28
@TTT The problem is directories and files are treated differently in enough commands (e.g., ls by default, rm, touch when given a non-existent directory as argument, etc.) that that argument doesn't hold water. — muru, Dec 08 '14 at 20:33
Similar question I asked a bit later but got more visibility: https://unix.stackexchange.com/q/228597 — jakub.g, Jan 16 '23 at 19:54

muru · Accepted Answer · 2016-04-06T22:45:32.790

The behaviour you're looking for is a special case. POSIX specifies the recursive form of cp as follows:

cp -R [-H|-L|-P] [-fip] source_file... target
[This] form is denoted by two or more operands where the -R option is specified. The cp utility shall copy each file in the file hierarchy rooted in each source_file to a destination path named as follows:

If target exists and names an existing directory, the name of the corresponding destination path for each file in the file hierarchy shall be the concatenation of target, a single <slash> character if target did not end in a <slash>, and the pathname of the file relative to the directory containing source_file.

If target does not exist and two operands are specified, the name of the corresponding destination path for source_file shall be target; the name of the corresponding destination path for all other files in the file hierarchy shall be the concatenation of target, a <slash> character, and the pathname of the file relative to source_file.

It shall be an error if target does not exist and more than two operands are specified ...

Therefore I'd say it's not possible to make cp do what you want.
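The two cases in the quoted text can be reproduced in a scratch directory. This is only a sketch: it assumes mktemp -d is available, and the variable name work is just for illustration.

```shell
# Work in a throwaway directory so nothing else is touched.
work=$(mktemp -d)
cd "$work"

mkdir dir1
touch dir1/test.txt

# Case 1: target does not exist, so dir2 itself becomes the copy of dir1.
cp -R dir1 dir2
ls dir2    # lists test.txt

# Case 2: target now exists and is a directory, so dir1 is copied into it.
cp -R dir1 dir2
ls dir2    # lists dir1 and test.txt
```

The same command line is run twice; only the state of dir2 differs between the two runs.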

Since your expected behaviour is "cp -r dir1 dir2 (when dir2 already exists) would remove the existing dir2 (and any contents) and replace it with dir1":

rm -rf dir2 && cp -r dir1 dir2

You don't even need to check if dir2 exists.
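One caveat with the one-liner is that dir2 is deleted even when dir1 turns out not to exist. If that matters, the two steps could be wrapped in a small function that checks the source first. This is a minimal sketch; replace_dir is a hypothetical helper name, not a standard command.

```shell
# replace_dir SRC DST: make DST an exact copy of SRC, discarding the old DST.
# (Hypothetical helper, named for illustration only.)
replace_dir() {
    rd_src=$1
    rd_dst=$2
    # Check the source first so DST is not deleted for nothing.
    if [ ! -d "$rd_src" ]; then
        echo "replace_dir: $rd_src is not a directory" >&2
        return 1
    fi
    rm -rf -- "$rd_dst" && cp -R -- "$rd_src" "$rd_dst"
}

# Demonstration in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir dir1 dir2
touch dir1/test.txt dir2/test2.txt

replace_dir dir1 dir2
ls dir2    # lists test.txt only; test2.txt is gone
```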

The rsync solution is to add a trailing / to the source, so that rsync copies the contents of dir1 into dir2 rather than dir1 itself (existing files in dir2 are still kept):

$ tree dir*
dir1
└── test.txt
dir2
└── test2.txt

0 directories, 2 files
$ rsync -a dir1/ dir2
$ tree dir*
dir1
└── test.txt
dir2
├── test.txt
└── test2.txt

0 directories, 3 files
$ rm -r dir2
$ rsync -a dir1/ dir2
$ tree dir*
dir1
└── test.txt
dir2
└── test.txt

0 directories, 2 files
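If the goal is the behaviour from the question (dir2 ends up containing exactly what dir1 contains), rsync's --delete option can be added so that files present only in dir2 are removed. A sketch, assuming rsync is installed and using a scratch directory:

```shell
work=$(mktemp -d)
cd "$work"
mkdir dir1 dir2
touch dir1/test.txt dir2/test2.txt

# -a copies recursively and preserves attributes; --delete removes anything
# in dir2 that is not present in dir1. The trailing slash on dir1/ means
# "the contents of dir1", as in the example above.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete dir1/ dir2/
else
    echo "rsync is not installed" >&2
fi
ls dir2    # with rsync available: test.txt only
```

Since --delete removes files, it may be worth trying the command with -n (dry run) first.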

This answer, combined with your comments on inconsistent behaviors between directories and files for other commands, answers my question in full. — TTT, Dec 08 '14 at 20:51

jota · Answer 2 · 2024-02-09T13:03:58.093

cp has several forms:

   cp [OPTION]... [-T] SOURCE DEST
   cp [OPTION]... SOURCE... DIRECTORY
   cp [OPTION]... -t DIRECTORY SOURCE...

When you run cp -r dir1 dir2, you are relying on cp to decide implicitly how to interpret the second argument.

If the second argument is a name that does not exist in the file system, cp interprets it as DEST and creates it, copying dir1 as dir2.
If it already exists as a directory, cp interprets it as DIRECTORY and copies dir1 into it.

If you are not sure of your initial state and want to avoid this ambiguity, you can make the interpretation explicit with the -T option:

$ tree dir*
dir1
└── test.txt
dir2
└── test2.txt

0 directories, 2 files
$ cp -r dir1 -T dir2
$ tree dir*
dir1
└── test.txt
dir2
├── test.txt
└── test2.txt

0 directories, 3 files
$ rm -r dir2
$ cp -r dir1 -T dir2
$ tree dir*
dir1
└── test.txt
dir2
└── test.txt

0 directories, 2 files
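Note that -T is not part of POSIX cp, so it may be missing from non-GNU implementations. A commonly used alternative is to name the source as dir1/. so that the contents of dir1 are copied to dir2 whether or not dir2 exists. As with -T, files already in dir2 are kept. A sketch in a scratch directory:

```shell
work=$(mktemp -d)
cd "$work"
mkdir dir1 dir2
touch dir1/test.txt dir2/test2.txt

# dir2 exists: the contents of dir1 are merged into it.
cp -R dir1/. dir2
ls dir2    # lists test.txt and test2.txt

# dir2 does not exist: it is created as a copy of dir1.
rm -r dir2
cp -R dir1/. dir2
ls dir2    # lists test.txt only
```

In neither case does a nested dir2/dir1 appear.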
