
I have a freshly installed Ubuntu server that is meant to be the new backup server for our VM storage. The server has four NICs, two of them 10 Gbit (an Intel X540-T2 with the newest driver available), which are used to connect to the SAN. I have the NFS share mounted locally and compared speed differences while copying a directory with ~30 files: around 15 VM images and their corresponding log files. The images are between 8 GB and 600 GB in size.

Using:

cp -rf /mnt/nfs-share /backup-storage/

bmon consistently shows around 600 MiB/s.

Using:

rsync -av /mnt/nfs-share /backup-storage/

bmon shows some packets in the first seconds, halts for about 30 seconds and then builds up to about 60-75 MiB/s. CPU usage is around 60%.

What should/could I change to make rsync perform as well as cp here?

– soulpath

3 Answers


I think these differences are fairly well established between cp and rsync. See this article as a reference, titled: A look at rsync performance.

excerpt:

The four commands tested were:

    rsync $SRC $DEST
    echo $SRC | cpio -p $DEST
    cp  $SRC $DEST
    cat $SRC > $DEST/$SRC

The results for rsync, cpio, cp, and cat were:

user    sys     elapsed hog MiB/s   test
5.24    77.92   101.86  81% 100.53  cpio
0.85    53.77   101.12  54% 101.27  cp
1.73    59.47   100.84  60% 101.55  cat
139.69  93.50   280.40  83% 36.52   rsync

I use rsync on a daily basis. There are things you can do to improve the situation.

For example, you can try using the -W switch:

-W, --whole-file            copy files whole (w/o delta-xfer algorithm)

Also, I would suggest making sure you have a 3.x version of rsync. There were noticeable improvements when we moved up to the newer versions.
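
A minimal sketch combining both suggestions, with the paths from the question; the --inplace flag is an extra assumption worth benchmarking, since it avoids rsync's write-to-temporary-file-then-rename step:

    # check that this is a 3.x release
    rsync --version | head -n 1

    # -W skips the delta-transfer algorithm; --inplace additionally
    # writes directly into the destination file
    rsync -avW --inplace /mnt/nfs-share /backup-storage/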

– slm

The way to make rsync have the same performance as cp is to spell it "cp".

The difference between the two commands is significant even though the net effect may be the same. In particular, rsync does a bunch of reading to see whether or not some file or part of a file should be copied.

Is there some reason that you want to use rsync? Because cp copies "blindly", you will see higher raw performance. If, for a set of triggering conditions, the "delta-transfer" mechanism of rsync is used, you'll see transfer rates drop and CPU use rise pretty much in the manner you report.
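
If you want to measure that effect yourself, a rough timing comparison along these lines (destination cleared between runs so both commands do a full copy; paths taken from the question) should reproduce the gap:

    # full copy with cp
    rm -rf /backup-storage/nfs-share
    time cp -rf /mnt/nfs-share /backup-storage/

    # full copy with rsync; add -W to rule the delta algorithm out
    rm -rf /backup-storage/nfs-share
    time rsync -a /mnt/nfs-share /backup-storage/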

– msw
  • I'm aware of the behaviour, but didn't expect such an effect. I thought that, given the available CPU power and IOPS, rsync should perform at least at 300 MiB/s, especially if the file to copy doesn't exist yet.

    I've not finished testing yet. A backup with rsync would be more convenient, but I can also write a script using cp, dd or whatever comes to mind. Now I want to test various possibilities on different filesystems to evaluate what suits best.

    – soulpath Sep 19 '13 at 11:11
  • You may call me an empiricist, but when your expectations and reality disagree, it is usually not reality that is mistaken. There are a dozen reasons why you could be incurring this penalty; even interleaving reads and writes on a SAN can have dramatic performance hits depending on fine details of the software. – msw Sep 19 '13 at 11:28
  • I wasn't in doubt about reality, just about rsync; but due to these differences I'll go with writing a script using cp and some checksums.

    Thanks for your advice!

    – soulpath Sep 19 '13 at 11:45
  • No, just don't use rsync on a networked file system. Your computer needs to download the entire file, so you lose all the advantages of rsync. – Giacomo Catenazzi Feb 27 '16 at 11:59
  • Sadly this answer is wrong in its detail. When copying between "local" filesystems (and yes, an NFS mount is a local filesystem in this context), rsync does not read the target file when copying unless you explicitly enable this counterproductive operation with --no-whole-file. In this situation it's just like a very slow cp. – Chris Davies Mar 07 '18 at 17:22

For this use case, rsync is an unnecessarily complex machine. If you are OK with synchronization based on comparing file modification times and file sizes, then only filesystem metadata needs to be collected on both ends and compared, and the changed (or new) files can be copied by the (local) cp command.
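
As a bare-bones illustration of that idea (a hypothetical loop comparing modification times only; a real tool also compares sizes and copes with unusual file names):

    # copy a file only if it is missing from the backup or newer
    # than the copy there; metadata checks only, no content reads
    cd /mnt/nfs-share
    find . -type f | while read -r f; do
        dst="/backup-storage/$f"
        if [ ! -e "$dst" ] || [ "$f" -nt "$dst" ]; then
            mkdir -p "$(dirname "$dst")"
            cp -p "$f" "$dst"
        fi
    done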

You might be interested in this small and simple synchronizer that does this: Fitus/Zaloha.sh

It is used as follows:

$ Zaloha.sh --sourceDir="test_source" --backupDir="test_backup"

For maximum speed of the analysis phase, you might want to skip generation of the restore scripts: use the option --noRestore. Additionally, if you have the fast mawk installed, pass the option --mawk to use it.
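
Putting those options together would look like this (same test directories as above):

$ Zaloha.sh --sourceDir="test_source" --backupDir="test_backup" --noRestore --mawk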

Zaloha.sh collects filesystem metadata via find commands. One remaining question is the performance of find on your NFS share.
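
A quick way to gauge that up front is to time a metadata-only scan of the share (assuming GNU find and the mount point from the question):

    # walk the tree and print size/mtime without reading any contents
    time find /mnt/nfs-share -type f -printf '%s %T@ %p\n' > /dev/null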

– Petas
  • Welcome to the site, and thank you for your contribution. As for your comments on rsync, please note however that unless the -c option is used, rsync also will include/skip files purely based on modification time and size, so this should not be the reason for the performance overhead. It does, however, verify the written files against the source based on checksums to detect corruption on transfer; this imposes only a small performance penalty as the checksum calculation is performed en passant while reading the files for data transfer. – AdminBee Mar 09 '20 at 15:34