105

I have been using rsync to copy files for some time. My understanding is that rsync is faster than cp when some of the files to transfer are already in the destination directory, transferring only the incremental difference (i.e. the "deltas").

If this is correct, would there be any advantage to using rsync to move the contents of a folder A to, say, a folder B, with B being empty?

The folder A has close to 1TB of data (and millions of files in it). The transfer would be done over a local network (A and B being on different filesystems, both mounted on a supercomputer, e.g. A is NFS and B is lustre).

Aside from that, what flags should I use to ask rsync to move (not copy) files from A to B (i.e. to delete A when the transfer has successfully finished)?

Braiam
  • 10
    I don't think rsync can replace mv. I would expect mv to be faster on most file system types when the source and destination are within the same file system, because rsync would have to make a copy no matter what, and mv could probably get away with changing a few directory entries. The closest thing I can find to an rsync mv is the --remove-source-files option, but that does not remove directories. – jw013 Jul 25 '12 at 17:19
  • 2
    Thanks @jw013! Just to clarify, the files are on different filesystems, and the transfer would be done on a network. Do you know if that would still make mv faster? – Amelio Vazquez-Reina Jul 25 '12 at 17:22
  • 1
    Well, mv can't operate across a network - it would have to rely on a local mount (e.g. NFS). If the bottleneck is the network, rsync would probably be faster than mv because rsync can do compression. – jw013 Jul 25 '12 at 17:25
  • 2
    By the way cp has -u option to copy source file if it is newer than the destination file or when the destination file is missing – rush Jul 25 '12 at 17:38
  • What about this: Create a timestamp, then rsync all the files to remote. When done, delete all local files older than timestamp. – U. Windl Feb 08 '22 at 14:01

8 Answers

113

You can pass --remove-source-files to rsync to move files instead of copying them.

But in your case, there's no point in using rsync, since the destination is empty. A plain mv will do the job as fast as possible.

In your case, what could make a difference to performance is the choice of network protocol, if you have a choice among NFS, Samba, sshfs, sftp, rsync over ssh, tar piped into ssh, etc. The relative speed of these methods depends on the file sizes, the network and disk bandwidth, and other factors, so there's no way to give general advice; you'll need to run your own benchmarks.

  • 13
    Just to reiterate what Caleb says, if you are worried about corruption due to e.g. a flaky network, rsync can make sense, as it verifies every file it writes by checksumming the blocks as it writes them. – Daniel S. Sterling Dec 01 '15 at 21:31
  • 9
    The --remove-source-files option only deletes the files in the source. If you want to clear out the source, wouldn't you have to do an rm -rf (or find all directories and pass -delete) on the source after rsync runs successfully? – Trevor Boyd Smith Nov 29 '16 at 13:51
  • 4
    @DanielS.Sterling rsync doesn't checksum blocks after writing them (it uses checksums to find which parts of existing files were updated and need to be synchronized). You can do a second sync with --checksum to tell it to verify the results of the first synchronization. – Clément Apr 14 '19 at 00:28
  • I am migrating data from one data pool to another. For me this flag makes sense because I want to run the data on the other pool for a week or so (they are new disks) before removing them from the original pool. This flag lets me do that, ensuring everything is in sync just before deleting them from the old pool. – deed02392 Dec 16 '19 at 17:24
  • @Clement not according to the rsync man page discussing the -c flag: -c "Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred" So Rsync may do a better job than mv to a different volume / host if you are concerned with data loss. – openCivilisation Feb 22 '20 at 12:48
  • @openCivilisation, yes, it verifies data in transfer; but it doesn't check that the writes were properly committed to storage (you need an extra pass with --checksum after the fact if you want that). In which cases would that work better than mv? – Clément Feb 24 '20 at 17:14
  • Also, rsync can not use multi-dirs as sender, like mv A B C D – Qinsi Jul 01 '20 at 07:26
  • Doesn't rsync flush file system buffers whereas mv can cause data loss? – darw Dec 05 '21 at 19:38
  • 1
    @darw You mean if there's a system crash during a copy+delete move (move between different filesystems), which results in the source file being deleted but the target not being written? That can happen both with mv and with rsync. – Gilles 'SO- stop being evil' Dec 05 '21 at 19:58
  • @Qinsi rsync can use multiple source directories just like mv. – Gilles 'SO- stop being evil' Dec 05 '21 at 19:59
  • @Clément --checksum is useful if there may be files in the destination that have the same timestamp and size but different content. It doesn't verify that the writes are committed to storage: if they're still in cache, it's just reading back from the cache. – Gilles 'SO- stop being evil' Dec 05 '21 at 20:02
  • It is also superior to mv when it comes to handling of duplicate files/ files that already exist in the destination path. – kerner1000 Feb 05 '22 at 11:28
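Following up on the verification discussion in the comments above, one way to check a copy before deleting the source is to compare content hashes on both sides. This is only a sketch: `verify_copy` is a hypothetical helper, it assumes GNU coreutils `sha256sum`, and, as noted above, reading the files back may be served from the page cache rather than from the storage device.

```shell
# verify_copy SRC DEST: hypothetical helper. Succeeds only if every regular
# file under SRC also exists under DEST with identical SHA-256 content.
# Files that exist only under DEST are not reported.
verify_copy() {
    local src="$1" dest="$2"
    # Hash every file relative to SRC, then re-check those hashes from DEST.
    (cd "$src" && find . -type f -exec sha256sum {} +) |
        (cd "$dest" && sha256sum --check --quiet)
}
```

For example, `verify_copy /A /B && echo "contents match"`. An empty source tree is reported as a failure, because `sha256sum --check` then receives no checksum lines at all.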
51

Since --remove-source-files does not remove directories, I issue the following commands to move files over ssh:

rsync -avzh --remove-source-files --progress /source/ user@server:/target \
&&  find /source -type d -empty -delete

I personally like the --progress feature, as I do this transfer manually. Remove it if you're using a script. I expect that it slows down transfers marginally. The find command's delete option only deletes empty directories – do not use rm -rf, as it may delete non-empty directories in case a file was not transferred. The -delete option turns on the -depth option so that empty directory trees are deleted from the "bottom" up.
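To see the difference from rm -rf in practice, the following sketch builds a throwaway tree (all paths are made up for illustration), leaves one file behind as if rsync had failed to transfer it, and then runs the same find cleanup.

```shell
# Build a scratch tree: one branch that is already empty and one branch
# that still holds a file (standing in for a file rsync did not move).
scratch=$(mktemp -d)
mkdir -p "$scratch/source/done/deep/deeper" "$scratch/source/pending"
touch "$scratch/source/pending/leftover.dat"

# Same cleanup as in the command above: empty directories are removed
# bottom-up, so done/deep/deeper, done/deep and done all go in one pass.
find "$scratch/source" -type d -empty -delete

# List what survived: source, source/pending and the leftover file.
find "$scratch/source"
```

The directory that still contains a file, and every parent above it, is left alone, so a failed transfer stays visible in the source tree.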

Kristian
  • 6
    -delete is much nicer than -exec rmdir {} + etc – lkraav Jul 14 '16 at 06:11
  • 3
    I would skip the asterisk and just use trailing slashes (/) on the paths if doing this locally. If you use an asterisk, rsync will skip hidden files such as .htaccess or .htpasswd (if any) – Svetoslav Marinov Apr 15 '19 at 11:52
  • 2
    You should execute the find command only if rsync exits successfully. Otherwise, you'll remove empty directories in the source that you might actually want to preserve. Use the && operator, so: rsync ... && find ... – deed02392 Dec 16 '19 at 17:28
  • You probably want to add -depth to the find statement. This causes a depth-first traversal, allowing the leaf directories to be deleted, and then, if the parent directories now become empty, they can be deleted too. Without this you need to run the find...delete several times. – dsz Aug 22 '20 at 06:42
  • 2
    NB: if your /source is actually a symlink, find will not delete any empty folders after rsync is finished. You have to cd into /source and then run find . -type d -empty -delete – GDP2 Sep 03 '20 at 17:02
  • 2
    @dsz when you use -delete the -depth action is automatically enabled – Chris Davies Feb 03 '22 at 23:52
  • @GDP2 You can also just add an extra slash after the symlink name (e.g. /source/), then find will traverse past the symlink. – SpinUp __ A Davis Oct 13 '23 at 04:50
  • Maybe I am a little bit picky, but can it actually not delete the source directory inside destination like mv does? For instance, moving empty src dir to dest dir would lead to still empty dest dir. – satk0 Feb 20 '24 at 12:19
  • EDIT: You can achieve it by specifying the source as source, not as source/; rsync is picky – satk0 Feb 20 '24 at 12:35
23

In general, as Gilles said, there is no advantage to using rsync to move files when mv will get the same job done more simply, and there is no potential speed gain between ordinary file systems.

There are, however, times when there is an advantage. In particular, if you have any doubts about the stability of either the source, the destination, or the machine doing the work, using rsync gives you resume ability. This can be a notable advantage if your transfer is very large and, say, your power grid is unreliable. Using rsync will be a more stable way to avoid data corruption in the event of a failure and pick up where you left off.
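The resume ability can be scripted as a simple retry loop. This is a sketch only: `retry` is a hypothetical helper, and the rsync options in the usage line (`--partial` to keep partially transferred files, `--remove-source-files` to delete each source file once it has been transferred) are one reasonable choice rather than the only one.

```shell
# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX times.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    local max="$1" attempt=1
    shift
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

Usage might look like `retry 5 rsync -a --partial --remove-source-files /A/ /B/`. Each rerun only has to deal with what is still left in the source, because files that were already transferred have been removed from it.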

Caleb
  • 11
    I'd say this is a huge advantage. In fact, I'd say mv is only better if the target and source are in the same partition, so that mv only edits the file's metadata instead of doing a copy. – nomen May 30 '18 at 23:30
  • 3
    One time I need rsync rather than mv is when I want to preserve the folder structure (if you use --relative). – Sridhar Sarnobat Oct 07 '18 at 05:30
  • 2
    rsync can, though, preserve hard links within the moved contents if you ask it to, while mv cannot. – Mauro Molinari Jan 17 '22 at 09:19
19

would there be any advantage to using rsync to move the contents of a folder A to, say, a folder B, with B being empty?

I've found myself in a situation where rsync IS faster than mv simply because mv cannot handle the number of files in the directory. I have 1.8 million photos from a security camera that ran for 20 days and the mv command exits with a failure because it cannot allocate resources.

rsync however, seems to handle all the files without a problem.
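If the failure came from the shell expanding a glob such as A/* into more arguments than the system accepts (an assumption; the exact error message is not quoted here), another workaround is to let find hand the files to mv in batches. In this sketch `move_in_batches` is a hypothetical name, and `-t` and `-maxdepth` are GNU extensions to mv and find.

```shell
# move_in_batches SRC DEST: move every regular file directly inside SRC
# into DEST. find groups the names into as many mv invocations as needed,
# so no single command line exceeds the system's argument-size limit.
move_in_batches() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    find "$src" -maxdepth 1 -type f -exec mv -t "$dest" -- {} +
}
```

Called as `move_in_batches /A /B`, this also picks up hidden files, which a plain `*` glob would skip. Across different filesystems each mv is still a copy followed by a delete, so this avoids the argument limit but does not make the transfer itself faster.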

Michael Mrozek
shadowv
7

If you want to recursively merge directories... move one directory into another directory with potentially duplicate directory names, then please see my answer here on serverfault.com. mv does a poor job when directories exist with the same name, and rsync copies (read + write full data) every file instead of just moving them (read and write only metadata).

Peter
  • Please add the sources here. I think it's a really good script, so it might give your answer the attention it deserves. – kub1x Feb 10 '20 at 10:15
2

I wrote a Bash script that implements an rsync-based mv:

#!/usr/bin/bash

# Ask for the source path, either relative to the current directory or absolute.
echo -e "Would you like to use a relative path to your source?[y/n?]\n"
read -r ans
if [[ $ans == y ]]; then
    echo -e "Relative source?\n"
    read -r source
    source="$(pwd)/$source"
elif [[ $ans == n ]]; then
    echo -e "Absolute path to your source?\n"
    read -r source
    # Expand a literal "~/" to the home directory (hard-coded for user jim).
    source=${source/"~/"/"/home/jim/"}
else
    echo -e "Use small cap 'y' or 'n' only."
    exit 1
fi

# Ask for the destination path in the same way.
echo -e "Would you like to use a relative path to your destination?[y/n?]\n"
read -r ans2
if [[ $ans2 == y ]]; then
    echo -e "Relative destination?\n"
    read -r dest
    dest="$(pwd)/$dest"
    dest=${dest/"~/"/"/home/jim/"}
elif [[ $ans2 == n ]]; then
    echo -e "Absolute path to your destination?\n"
    read -r dest
    dest=${dest/"~/"/"/home/jim/"}
else
    echo -e "Use small cap 'y' or 'n' only."
    exit 1
fi

# Transfer, removing each source file once it has been copied; then prune
# the directories left empty in the source, but only if rsync succeeded.
rsync -avh --remove-source-files --info=progress2 "${source}" "${dest}" && find "${source}" -type d -empty -delete

rsync's --info=progress2 option shows statistics based on the whole transfer, rather than individual files. The script's find command only deletes empty directories. As others have already mentioned, rsync's resume ability makes it a more stable way of moving files.

To alias the script as mv add the following lines to your .bashrc:

alias mv='~/ComputerScience/SoftwareDevelopment/MySoftware/Bash/mvRsync.sh'
alias sudo="sudo "

Replace my specific path in the first of the lines above with the one of your choosing. Aliasing sudo is necessary if you plan on using the alias with sudo.

Disadvantages of this script:

  • No tab key completion of paths as with the mv command. This can be solved by rewriting this script so that it accepts CLI arguments instead of taking user's input with read.
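A minimal sketch of that rewrite, assuming the same rsync options as the script above: `rsync_mv` is a hypothetical function name, and it takes the source and destination as arguments so that the shell's normal path completion applies.

```shell
# rsync_mv SOURCE DEST: hypothetical argument-driven variant of the script.
# Copies SOURCE to DEST with rsync, deleting each source file once it has
# been transferred, then prunes directories left empty under SOURCE.
rsync_mv() {
    if [ "$#" -ne 2 ]; then
        echo "usage: rsync_mv SOURCE DEST" >&2
        return 2
    fi
    local source="$1" dest="$2"
    # Stop here, passing on rsync's exit status, if the transfer failed.
    rsync -avh --remove-source-files --info=progress2 "$source" "$dest" || return
    # Only a directory source can have empty directories left to prune.
    if [ -d "$source" ]; then
        find "$source" -type d -empty -delete
    fi
}
```

For example, `rsync_mv ~/photos/ /mnt/backup/photos/` (illustrative paths). The shell expands `~` before the function sees it, so the hard-coded home directory substitution is no longer needed.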
  • Whoa... your username had me doing a doubletake... kudos to you, good robot usses! But seriously, this kind of contribution deserves recognition, sorry I only have one upvote to give. Thanks for this. – John Smith Jan 02 '24 at 22:25
0

--remove-source-files is perfectly fine if source and destination are on different volumes, but if they are on the same volume, you can use the following method to really move files.

Before:

find /test
/test
/test/empty_dir
/test/src
/test/src/50GB.bin

Execute rsync:

# SECONDS=0
# rsync --delete --recursive --backup --backup-dir=/test/dst /test/empty_dir/ /test/src
# echo "rsync needed $SECONDS seconds"
rsync needed 0 seconds

After:

find /test
/test
/test/dst
/test/dst/50GB.bin
/test/empty_dir
/test/src

This works because the --backup-dir feature moves files rather than copying them, and since we synced an empty dir onto /src, rsync "deletes" everything in /src into /dst. But note: it's still slower than mv, as rsync executes a move on every file and not only on the root dir as mv would.

If you want to delete the (now empty) /src dir, use another rsync hack:

rsync --recursive --delete --include="/src**" --exclude="*" /test/empty_dir/ "/test"
mgutt
  • This looks useful but I'm confused by what the role of /test/empty_dir/ as the rsync source dir is. – Sridhar Sarnobat Aug 31 '22 at 21:49
  • 1
    Because you are syncing the empty dir to the dir that contains the files, rsync would delete them. But because the --backup-dir option is used, it instead moves them to this "backup dir". – mgutt Sep 01 '22 at 22:09
-1

I guess this small script works:

copy_from=$(realpath "/tmp/a")
copy_to=$(realpath "/tmp/b")

while read from_file; do
    from_subdir=$(dirname $from_file | sed "s|$copy_from/||")
    to_dir=$copy_to/$from_subdir
    echo "Move: [$from_file] to [$to_dir]"
    test ! -d $to_dir && mkdir -p $to_dir
    mv $from_file $to_dir
done <<< $(find "$copy_from" -type f)

  • 2
    You fail to quote expansions, reading values that may contain backslashes wrongly, trimming whitespace from read values, using sed on data that is not text, using sed with unsanitized shell variables, needless testing for directory before mkdir -p, and reading the output from find with newline as the delimiter even though pathnames may contain newlines. There is also an issue if the source path is a substring of the destination path. You also do not consider the actual questions posed by the asker. – Kusalananda Feb 03 '22 at 19:15