Summary: If rsync gets data to a disk, it will do so losslessly. However, to be totally sure it actually got the data to the disk, you'll need to apply the fsync.diff patch, or run sync <files> afterwards.
SSH provides data integrity—you're receiving the same data as you're sending. That accounts for the network.
Then rsync uses the write() system call, asking the kernel to write the data to disk. Unless your hard disk is failing (a different question), this also preserves data integrity.
However, ensuring that the data is actually on disk right now is annoyingly not quite that simple. The write(2) manual page makes the following note:
A successful return from write() does not make any guarantee that
data has been committed to disk. In fact, on some buggy
implementations, it does not even guarantee that space has
successfully been reserved for the data. The only way to be sure is
to call fsync(2) after you are done writing all your data.
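To make the man page's point concrete, here is a minimal Python sketch of "write, then fsync". It is not rsync's code; the helper name write_durably is hypothetical, and it assumes a POSIX system where os.fsync maps to fsync(2).

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync() has succeeded.

    Hypothetical helper for illustration; not part of rsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Note that this only flushes the file itself. For a newly created file, the directory entry may also need an fsync on the parent directory before the file is guaranteed to survive a crash.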
I downloaded the source code of the most recent rsync (3.1.2pre1), grepped for fsync, and got nothing: by default, rsync does not call fsync. (I also grepped for fdatasync, the variant that skips flushing metadata not needed to read the data back: also nothing.) This means that whether those write() calls have reached the disk yet is filesystem-dependent.
As a solution, you can either:
- Run sync <files>, which calls fsync on the given files (GNU coreutils 8.24 or later; older versions ignore the arguments and sync everything). When that returns, they're definitely on disk.
- Download the rsync source patches directory (provided as a separate download) and apply the fsync.diff patch by Sami Farin. It “lets you specify --fsync if you want fsync() to be called on every file we write”. (This will hopefully become default in the future.)
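If you are driving rsync from a script, the first option above can be approximated without shelling out to sync. The following is a rough sketch, assuming Linux-like behaviour where fsync works on a read-only descriptor and on directories; the function name fsync_tree is hypothetical.

```python
import os


def fsync_tree(root: str) -> int:
    """fsync every regular file and directory under root.

    Returns the number of regular files synced. Hypothetical helper;
    assumes fsync() is permitted on read-only file and directory fds.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced += 1
        # Flush the directory too, so that new entries are persisted.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return synced
```

You would call this on the destination directory once rsync has exited successfully, for example fsync_tree("/path/to/destination"), where the path is a placeholder for your own target.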
Usually though, modern filesystems do your writes Pretty Soon™, only briefly taking advantage of their freedom to cache when IO load is high. If you know your system, you might be OK with skipping this step. But do keep in mind when writing code for wider use that results may differ based on your filesystem, how it's tuned, and whether the on-drive firmware god is feeling benevolent.