
Is there any difference between running rsync with a remote host (ssh://) as the source/destination and using a local path to a directory mounted via sshfs?

Could there be differences with regard to security or copying speed when no extra switches are used: just plain rsync in archive mode with a remote host path (over ssh), versus the same rsync with an sshfs mount as the source or destination (no cipher changes, just the defaults)?

Anthon
galq

2 Answers


SSHFS is convenient, but it doesn't mesh well with rsync or more generally with synchronization tools.

The biggest problem is that SSHFS largely kills rsync's performance optimizations. In particular, for medium to large files, when rsync sees that a file has been modified, it calculates checksums on parts of the file on each side in order to transfer only the parts that have been modified. This is an optimization only if the network bandwidth is significantly smaller than the disk bandwidth, which is usually the case. But with SSHFS, the “disk” bandwidth is in fact the network bandwidth, so rsync would have to read the whole file in order to determine what to copy. In fact, with a local copy (which it is, as far as rsync is concerned, even if one of the sides is on SSHFS), rsync just copies the whole file.
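
To make the checksum argument concrete, here is a minimal Python sketch of the idea. It is not rsync's actual algorithm (rsync uses rolling checksums and can match blocks at any offset); the fixed-size blocks, the tiny `BLOCK_SIZE`, and the function names are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny for the demo; real tools use far larger blocks


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` whose digest differs from the same block in `old`."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_d)
        if i >= len(old_d) or digest != old_d[i]
    ]


old = b"aaaabbbbccccdddd"  # stands in for the copy on one side
new = b"aaaabbbbXXXXdddd"  # stands in for the modified copy on the other side

blocks = changed_blocks(old, new)
payload = len(blocks) * BLOCK_SIZE  # bytes that actually need to be sent
print(blocks, payload, len(new))
```

With rsync over SSH, each side can compute its own digests locally, so only the digests and the changed block cross the network. Over SSHFS, computing the digests of the remote copy means reading every byte of it through the mount, which is the transfer the optimization was supposed to avoid.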

SSHFS is also detrimental to performance if there are many small files. Rsync needs to check at least the metadata of every file to determine whether it's been modified. With SSHFS, this requires a network round trip for each file. With rsync over SSH, the two sides can work in parallel and transfer information in bulk, which is a lot faster.
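
As a rough back-of-the-envelope illustration of the round-trip cost, the sketch below multiplies file count by latency. The function name, the figure of one round trip per file, and the sample numbers are assumptions; real SSHFS behavior depends on caching and on how requests are batched.

```python
def metadata_scan_seconds(n_files: int, rtt_ms: float,
                          round_trips_per_file: int = 1) -> float:
    """Estimate the time spent waiting on the network for a serial metadata scan."""
    return n_files * round_trips_per_file * rtt_ms / 1000.0


# Hypothetical workload: 100,000 small files over a link with 20 ms round-trip time.
estimate = metadata_scan_seconds(100_000, 20.0)
print(f"{estimate:.0f} s spent on latency alone")
```

Under these assumed numbers the scan waits over half an hour on latency before any data is copied, whereas rsync over SSH can send the file list in bulk.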

In terms of access restrictions, SSHFS requires SFTP access, whereas rsync requires the ability to run code (specifically, the rsync program) via a shell. If the user doesn't have a shell account, it's possible and common to provide an account with a special shell that only allows running a few programs, including sftp-server and rsync. See "Do you need a shell for SCP?"

If you're only copying new files and there isn't a very large number of files, there is no meaningful performance difference.

SSHFS establishes an SSH connection when the filesystem is mounted and retains that connection until it's unmounted. Rsync makes a new connection each time you run it, but you can use the multiplexing feature and piggyback on a single main connection to avoid authenticating each time.
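
One way to set up the multiplexing mentioned above is to pass OpenSSH's `ControlMaster`, `ControlPath` and `ControlPersist` options to the ssh command that rsync invokes via `-e`. The Python sketch below only builds such a command line; the helper name, the socket path, the 10-minute persistence, and the host and directories are placeholders chosen for illustration.

```python
import shlex


def rsync_over_shared_ssh(src: str, dest: str,
                          control_path: str = "~/.ssh/cm-%r@%h:%p",
                          persist: str = "10m") -> list[str]:
    """Build an rsync argv whose ssh transport reuses one master connection."""
    ssh_cmd = [
        "ssh",
        "-o", "ControlMaster=auto",           # create the master connection if none exists
        "-o", f"ControlPath={control_path}",  # socket shared by later connections
        "-o", f"ControlPersist={persist}",    # keep the master alive after the first exit
    ]
    # rsync expects the remote-shell command as a single argument to -e
    return ["rsync", "-a", "-e", shlex.join(ssh_cmd), src, dest]


# Placeholder paths and host; nothing is executed here, the command is only printed.
cmd = rsync_over_shared_ssh("/home/local/data/", "user@example.com:/srv/data/")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; the same three options can instead be set once in `~/.ssh/config` so that every rsync run picks them up.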

SSHFS is a FUSE filesystem and thus supports only traditional Unix metadata and ACLs. Rsync can also transfer extended attributes, but you need to use rsync -aAX (-A for ACLs, -X for extended attributes); a plain -a preserves only traditional Unix metadata.

  • For example, I have a single compressed file in the mounted SSHFS directory, and that file has been copied to my local directory. Later the file is updated in the mounted directory, and I want to copy only its updated section to my local directory using rsync --no-whole-file /mounted/file.tar.gz /home/local/file.tar.gz. When rsync calculates checksums in order to transfer only the modified parts, does it have to read the whole file, so that the complete data is downloaded instead of only the updated section? @Gilles – alper Sep 29 '19 at 16:54
    @alper When you use rsync over sshfs, rsync has to read the whole file. It can't know what needs to be updated otherwise. There's no way to optimize by transferring only checksums because there's no way to calculate checksums on the server. – Gilles 'SO- stop being evil' Sep 29 '19 at 19:45
  • Would it be the same if the target location is a mounted folder rather than an sshfs connection? Please see: https://unix.stackexchange.com/q/544404/198423 @Gilles – alper Sep 30 '19 at 07:44

To answer your main question: yes there are differences. With sshfs there is an existing connection to allow access to remote files over a secure channel and with rsync over ssh, that secure channel is set up to talk to a remote rsync instance.

To answer your secondary question: The rsync over ssh will be faster for most, if not all instances, because the rsync on the remote system provides more intelligence in finding files that don't need syncing, but primarily because it runs in parallel to your local rsync to gather that information.

Assuming a similar configuration of the ssh parameters (key length, algorithms), the security of both approaches is the same. What the defaults are on your source and destination systems depends on the distributions those systems run.

Anthon