I need to transfer files from several servers to a single remote host using rsync over SSH. However, before a file is deleted on the source by the --remove-source-files option, I need to be sure that the transferred file is actually there.

From what I've been reading, there are no post-transfer checksums and rsync trusts the kernel's response, but those articles date from 2005-2009. Has this changed in recent versions of rsync? If not, what methods are there to verify the transfer and delete the source file only once it is verified?

Edit: I don't get how this is a duplicate. My question does not pertain to local drives on the same system...

  • I doubt rsync has changed substantially since then but if the packet was mangled en route I doubt SSH would be able to get an intelligible result after trying to decrypt it, not to mention the CRC in the TCP packet. – Bratchley Aug 11 '15 at 16:08
  • rsync over ssh should be immune to network data corruption. If it's not, that's a security hole in SSH. – Peter Cordes Aug 11 '15 at 17:10
  • Aaaaa why does stackexchange only tell me this got closed after I spend an hour on a comprehensive answer!? – Anko Aug 11 '15 at 18:10
  • Very frustrating.. – linguru772x Aug 11 '15 at 18:18
  • I agree that sometimes we close too much and too fast. I nominated this question to reopen (although to be perfectly honest I don't believe it will happen). – jimmij Aug 11 '15 at 18:50
  • @linguru772x: It’s a duplicate because, even though your question focused on remote copy, and the other question is explicitly about local copy, the answers to the other question do not focus on local copy. There seems to be a philosophy at Stack Exchange: “If two questions have the same answer, then they must be the same question.” (So it follows that the Ultimate Question of Life, the Universe, and Everything is the same as “What is 3²+4²+17?”, because they have the same answer. :-)  )  … (Cont’d) – G-Man Says 'Reinstate Monica' Aug 11 '15 at 20:25
  • (Cont’d) …  However, that other question is from three years ago, and your question explicitly acknowledges the old information and asks whether it has changed.  (It’s ironic that Stack Exchange is sensitive to the possibility that answers can become outdated, but then frowns on questions that ask whether old answers *have* become outdated.)  So I’m voting to reopen this.  … … … … … … … … … …  P.S. Did you mean to ask “Does rsync *offer* any type of checksum?” – G-Man Says 'Reinstate Monica' Aug 11 '15 at 20:26

1 Answer

Summary: If rsync gets data to a disk, it will do so losslessly. However, to be totally sure it actually got data to the disk, you'll need to apply the fsync.diff patch, or call sync <files> afterwards.


SSH provides data integrity: the receiver gets exactly the data the sender sent, or the connection fails. That accounts for the network.

Then, rsync uses the write system call, asking the kernel to write the data to disk. Unless your hard disk is failing (a different question), this also preserves data integrity.

However, ensuring that data is actually on disk now is annoyingly not quite that simple. The write manual page makes the following note:

A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.
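To make the difference concrete, here is a minimal Python sketch of a write followed by an explicit fsync. It is not rsync's code; `write_durably` is a hypothetical helper name, and the sketch assumes a POSIX-like system where `os.fsync` maps to fsync(2).

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then block until the kernel reports it flushed.

    Hypothetical helper for illustration only; this is not how rsync
    is implemented.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If `os.fsync` raises `OSError`, the data should not be assumed to be on disk, so any deletion of the source copy belongs after a successful return.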

I downloaded the most recent (3.1.2pre1) rsync's source code, grepped for fsync and got nothing. By default, rsync does not call fsync (I also grepped for the metadata-less version fdatasync: also nothing). This means whether those writes have done anything yet is filesystem-dependent.

As a solution, you can either:

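  • Run sync <files> on the receiving host (GNU coreutils 8.24 or later accepts file arguments), which calls fsync on the given files. When that returns successfully, they're on disk as far as the kernel can tell.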

  • Download the rsync source patches directory (provided as a separate download). Apply the fsync.diff patch by Sami Farin. It “lets you specify --fsync if you want fsync() to be called on every file we write”. (This will hopefully become default in the future.)
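If neither option is available on the receiving host, the first one can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: `fsync_paths` is a hypothetical name, the code assumes Linux or another POSIX system where a directory can be opened read-only and fsynced, and the extra directory flush is my addition (it goes beyond what `sync <files>` is described as doing above).

```python
import os


def fsync_paths(paths):
    """Flush already-written files, then their parent directories.

    Hypothetical helper; must run on the host that received the files.
    Returns the number of files flushed.
    """
    parents = set()
    count = 0
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)  # flush file data and metadata
        finally:
            os.close(fd)
        parents.add(os.path.dirname(os.path.abspath(path)))
        count += 1
    for parent in parents:
        # Assumption: flushing the directory also persists the file's
        # directory entry (relevant after a rename into place).
        fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return count
```

Only after this returns without raising would it be reasonable to remove the source copies, which means running the transfer without --remove-source-files and deleting in a separate step.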

Usually though, modern filesystems do your writes Pretty Soon™, only briefly taking advantage of their freedom to cache when IO load is high. If you know your system, you might be OK with skipping this step. But do keep in mind when writing code for wider use that results may differ based on your filesystem, how it's tuned, and whether the on-drive firmware god is feeling benevolent.

Anko
  • As of rsync 3.2.4, --fsync is a built-in flag: https://github.com/WayneD/rsync/blob/d1e42ffa1680b65bc878ab5a6cbfd12bf6345b9b/NEWS.md?plain=1#L137 – user9538 Jul 07 '22 at 05:57