1

There are a few ways to interpret the operation of "replacing one file with another", but the one I want to focus on here is the one that can be achieved with the command

mv /that/there/someotherfile /this/here/somefile

In this example /that/there/someotherfile and /this/here/somefile are supposed to be regular files currently existing in the filesystem.

If all goes well, after running the command above, the file "formerly known as" /that/there/someotherfile will have disappeared, but its content will now be the new content of the file /this/here/somefile. The latter's former content will have been "overwritten."

Now consider the analogous operation of "replacing one directory with another." E.g. overwriting some directory /path/to/targetdir with the directory /some/other/path/to/sourcedir. I can do this with

rm -rf /path/to/targetdir && mv /some/other/path/to/sourcedir /path/to/targetdir

Can I do this with a single "more-or-less standard"1 command that works irrespective of the contents of the two directories in question?

I know that, if /path/to/targetdir happens to be an empty directory, then

mv -T /some/other/path/to/sourcedir /path/to/targetdir

...will do the job.
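To make the difference concrete, the sketch below runs mv -T against an empty and a non-empty target inside a throwaway directory. It assumes GNU mv (the -T option is not POSIX) and an available mktemp; the directory names are made up for the illustration.

```shell
# Sketch: mv -T (GNU coreutils) over an empty versus a non-empty directory.
# Everything happens inside a scratch directory created by mktemp.
demo=$(mktemp -d)
mkdir "$demo/src1" "$demo/src2" "$demo/empty" "$demo/full"
touch "$demo/src1/new" "$demo/src2/new" "$demo/full/old"

# Empty target: the underlying rename() is allowed to replace it.
mv -T "$demo/src1" "$demo/empty" && echo "empty target: replaced"

# Non-empty target: mv fails and leaves both directories as they were.
mv -T "$demo/src2" "$demo/full" 2>/dev/null || echo "non-empty target: refused"
```

The second call is the case the question is about: a plain rename cannot replace a directory that still has entries in it.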

Also, I know that if /path/to/targetdir does not contain any relative path that is not also present under /some/other/path/to/sourcedir, and all relative paths present under both directories point to file system items of the same type (i.e. they are both directories, or both regular files, etc.), then the following gets close to the operation described above

rsync -a --remove-source-files /some/other/path/to/sourcedir/ /path/to/targetdir

Of course, it would not be difficult to implement2 a script or a function to encapsulate the rm -rf + mv sequence given above, but I would like to avoid implementing something that is already available through more-or-less standard Unix commands.
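For reference, a minimal sketch of such a wrapper could look like the following. The name replace_dir and the sanity checks are assumptions made for this illustration, not an existing utility, and the operation is still not atomic: the target is missing between the rm and the mv.

```shell
# replace_dir SOURCE TARGET   (hypothetical helper name)
# Delete TARGET if it exists, then rename SOURCE to TARGET.
replace_dir() {
    if [ "$#" -ne 2 ]; then
        echo "usage: replace_dir SOURCE TARGET" >&2
        return 2
    fi
    # Delete nothing unless SOURCE really is a directory.
    if [ ! -d "$1" ]; then
        echo "replace_dir: '$1' is not a directory" >&2
        return 1
    fi
    # Refuse a TARGET that is a symlink or some other non-directory.
    if [ -L "$2" ] || { [ -e "$2" ] && [ ! -d "$2" ]; }; then
        echo "replace_dir: '$2' exists and is not a plain directory" >&2
        return 1
    fi
    rm -rf -- "$2" && mv -- "$1" "$2"
}
```

The sketch does not guard against SOURCE and TARGET being the same directory, or one being nested inside the other; a real implementation would need to.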


1 I realize that the answer to this question depends critically on what one considers the set of permissible commands, and, unfortunately, here I can offer nothing better than vigorous hand-waving... For example, I regard cp and mv as "more-or-less standard", but even in this case, some of the options these commands take may not be. In fact, if one makes this condition sufficiently precise (e.g. limiting the permissible commands to the "mandatory POSIX utilities"), there may be no general way to "replace one directory with another", in the sense described above, using a single command. If so, feel free to define the set of permissible commands in a way that you find would render your answer sufficiently useful and/or interesting. In other words, on the choice of the set of permissible commands, I am ready to defer to your good taste.

2 Famous last words.

kjo
  • I don't think it's implemented, so either you could code such a utility, or write a patch for mv, or create a bash function. Why hasn't it been implemented yet? It's too scary and easy to fail spectacularly in many unpredictable ways. – Artem S. Tashkinov Jan 21 '22 at 13:20
  • Technically, if moving within a filesystem, mv somefile otherfile doesn't just make the content of somefile appear in otherfile; it's actually the same file with a new name. The rename will keep the inode number and permissions etc. intact. cp somefile otherfile && rm -f somefile would be different in that regard. – ilkkachu Jan 21 '22 at 13:58
  • But really, the question I have is what's wrong with rm -rf target && mv source target? I.e. why does it matter if it's two commands or one? If you're concerned about the move being atomic, you'd need a system call that can replace the directory atomically, and you can't do that to a non-empty directory with plain rename() as you noted. Linux's renameat2() looks to have the RENAME_EXCHANGE flag which looks like it could work, though. – ilkkachu Jan 21 '22 at 14:02
  • The only issue I see with rm target && mv source target is that for a very brief interval, target does not exist. I think that is what OP is trying to avoid/mitigate. – DopeGhoti Jan 21 '22 at 14:24

3 Answers

1

This exceedingly dangerous command could work, provided you forget about rsync's usual restartability

rsync -av --remove-source-files --delete-before /path/to/source/ /path/to/target

A number of caveats, though,

  • It doesn't move files - it copies them and deletes the original. For large files on the same filesystem this could be a problem
  • It won't remove source directories. If that's a problem you're back to two commands, and at that point you might as well revert to your original rm && mv construct
  • It's not POSIX

On balance I think I'd prefer the rm && mv approach. This version requires bash (or possibly some other shell that has arrays). I believe its uses of mv, rm and find are all POSIX compliant.

rmmv() {
    local args=("$@") target

    # Use return, not exit: exit inside a function would end the calling shell
    if [ $# -eq 0 ]
    then
        echo "${FUNCNAME[0]}: missing file operand" >&2
        return 1
    elif [ $# -eq 1 ]
    then
        echo "${FUNCNAME[0]}: missing file operand after '${args[0]}'" >&2
        return 1
    fi

    # The last argument is the target; everything before it is a source
    target="${args[@]: -1}"
    unset "args[${#args[@]}-1]"

    if [ -d "$target" ]
    then
        # Directory target; remove its contents
        ( cd -P -- "$target" && find . -depth -path './*' -exec rm -rf {} + )

    elif [ "${#args[@]}" -gt 1 ]
    then
        # Multiple sources but not a directory
        echo "${FUNCNAME[0]}: target '$target' is not a directory" >&2
        return 2
    fi

    # Do it
    mv -- "${args[@]}" "$target"
}

Chris Davies
1

TL;DR: use symbolic links.

POSIX directories

With only POSIX system calls, it's impossible to atomically replace a non-empty directory. The rename system call only allows the target to be a directory if it's empty. The link system call forbids the target from being a directory.

To replace /path/to/targetdir by /some/other/path/to/sourcedir, there are a few non-atomic approaches. Throughout this answer, I assume that the target directory already exists; if it doesn't, you may need to handle this case separately, which can be a bit tricky in the presence of concurrency. I also assume that the source and destination are on the same filesystem.

  • Move the existing target out of the way, then move the new version into place. This leaves a small window of time during which the target does not exist. This does not work reliably if two replacement processes are working concurrently.

    mv -f /path/to/targetdir /path/to/targetdir.old
    mv /some/other/path/to/sourcedir /path/to/targetdir
    rm -rf /path/to/targetdir.old
    
  • Remove the existing target, then move the new version into place. This leaves a potentially large window of time during which the target is only partially populated, and a small window of time during which the target does not exist. This does not work reliably if two replacement processes are working concurrently.

    rm -rf /path/to/targetdir
    mv /some/other/path/to/sourcedir /path/to/targetdir
    
  • Empty the existing target, then move the new version into place. This leaves a potentially large window of time during which the target is only partially populated. The target directory keeps existing, and it is populated atomically. This does not work reliably if two replacement processes are working concurrently.

    (cd /path/to/targetdir && rm -rf ..?* .[!.]* *)
    mv /some/other/path/to/sourcedir /path/to/tmp/targetdir
    mv /path/to/tmp/targetdir /path/to/
    

    Note the intermediate step where the new version has the same base name as the target. This is necessary for mv: if the destination passed to mv is an existing directory, mv moves the source(s) into it. So we have to make the destination passed to mv be the parent of the existing directory we want to overwrite.
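As an illustration of the first approach, here is a minimal sketch that moves the old version aside under a unique name instead of a fixed targetdir.old, so a stale leftover from an earlier run cannot get in the way. The helper name swap_in is hypothetical, mktemp is assumed to be available (it is common but not POSIX), and local is assumed to be supported by the shell. The window during which the target does not exist is still there.

```shell
# swap_in SOURCE TARGET   (hypothetical helper name)
# Move TARGET aside, move SOURCE into its place, then delete the old version.
swap_in() {
    local aside
    if [ "$#" -ne 2 ]; then
        echo "usage: swap_in SOURCE TARGET" >&2
        return 2
    fi
    if [ ! -d "$1" ] || [ ! -d "$2" ]; then
        echo "swap_in: both arguments must be existing directories" >&2
        return 1
    fi
    # Scratch directory next to TARGET, so the renames stay on one filesystem.
    aside=$(mktemp -d "${2%/}.old.XXXXXX") || return 1
    mv -- "$2" "$aside/old" || { rmdir -- "$aside"; return 1; }
    if mv -- "$1" "$2"; then
        rm -rf -- "$aside"
    else
        # The second rename failed: put the old version back.
        mv -- "$aside/old" "$2"
        rmdir -- "$aside"
        return 1
    fi
}
```

Like the commands above, this does nothing to coordinate two concurrent replacement processes.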

If you're willing to use kernel-specific system calls, there may be ways to do the replacement atomically.

Linux directories

The system call renameat2 of Linux ≥3.15 can atomically exchange two directory entries of arbitrary type. I don't know of any utility that provides an interface for it.

perl -we '
    require "syscall.ph";
    sub AT_FDCWD () {return -100}
    sub RENAME_EXCHANGE () {return 2}
    my ($source, $target) = @ARGV;
    $! = 0;
    # Swap the two directory entries in a single system call
    syscall(SYS_renameat2(), AT_FDCWD, $source, AT_FDCWD, $target, RENAME_EXCHANGE) != -1 or die $!
' /some/other/path/to/sourcedir /path/to/targetdir &&
rm -rf /some/other/path/to/sourcedir # This now contains the former target

This is a fully atomic replacement. If multiple replacement processes do this concurrently, the end result is that one of their new versions will be put in place and the others will be erased.

Another way to effectively replace a directory atomically is a bind mount. This feature exists under many Unix variants, but the way to use it can differ. Here I'll use Linux's mount --bind, which requires root privileges. Another noteworthy method is the FUSE filesystem bindfs, which does not require unusual privileges.

mkdir /path/to/targetdir.old
mount --bind /path/to/targetdir /path/to/targetdir.old
mount --bind /some/other/path/to/sourcedir /path/to/targetdir
rm -rf /path/to/targetdir.old

Note that this doesn't move the files, it just makes them visible under the target path. With this approach, the source still exists.

POSIX symbolic links

The way to do an atomic replacement is to go through a symbolic link. A symbolic link can be replaced atomically. It's just a bit tricky because, counterintuitively, symlink won't do it. You have to use rename. Furthermore, if mv is given a target which is a symbolic link to a directory, it will move the source under that directory, rather than overwrite the symbolic link. You can make mv overwrite a target that is a symbolic link to a directory by passing the directory containing the symbolic link as a target, which requires the source to have the same base name.

# Precondition: /path/to/targetdir doesn't exist or is a symbolic link.
mkdir -p staging
ln -s /some/other/path/to/sourcedir staging/targetdir
old_target=$(cd /path/to/targetdir && pwd -P)
mv staging/targetdir /path/to/
# Postcondition: /path/to/targetdir is a symbolic link to the new version.
rm -rf "$old_target"

This does not work reliably if there are multiple concurrent instances, because the renaming is atomic, but figuring out what to delete is not.
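If GNU mv is available, a common variant avoids the staging directory by using -T, which makes mv treat the destination as a plain name even when it is a symbolic link to a directory. The sketch below is an illustration under that assumption, not the POSIX method described above; the helper name switch_link and the temporary link name are hypothetical, and local is assumed to be supported by the shell.

```shell
# switch_link NEWDIR LINK   (hypothetical helper name)
# Point the symbolic link LINK at NEWDIR with a single rename().
# Assumes GNU mv for -T. LINK must be absent or already a symbolic link.
switch_link() {
    local tmp
    if [ "$#" -ne 2 ]; then
        echo "usage: switch_link NEWDIR LINK" >&2
        return 2
    fi
    if [ -e "$2" ] && [ ! -L "$2" ]; then
        echo "switch_link: '$2' exists and is not a symbolic link" >&2
        return 1
    fi
    # Build the new link beside the old one, then rename it over the old one.
    tmp="${2%/}.new.$$"
    ln -s -- "$1" "$tmp" || return 1
    mv -T -- "$tmp" "$2" || { rm -f -- "$tmp"; return 1; }
}
```

NEWDIR is best given as an absolute path, since a relative link target is resolved from the directory that contains the link. Deleting the previous version is left out on purpose, for the concurrency reason given above.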

0

In this case, instead of removing a command, I would suggest adding one:

mv target somewhere_else
mv source target

If source, target and somewhere_else are all on the same filesystem, those two commands should return very quickly. Then, not necessarily right away:

rm -rf somewhere_else

If disk space is an issue, you can divide the above task into subtrees.

Better solutions can be found depending on the type and number of files/folders that usually change between source and target: rsync, recursive diff/patch, version control,...