I am using a WebDAV client to mount an ownCloud folder (or any other cloud provider's folder) as a drive on my local computer:

sudo apt-get install davfs2
mkdir oc
sudo mount.davfs https://b2drop.eudat.eu/remote.php/webdav/ oc

In this scenario, assume that there is a single compressed file (A.tar.gz) in the mounted folder and that it has been copied to my local directory as B.tar.gz.

Later the file is updated in the mounted directory, and I want to copy only its updated section to my local directory, if possible without downloading the whole file, using: rsync --no-whole-file oc/A.tar.gz /home/local/B.tar.gz

Based on the rsync algorithm, only the pieces of A.tar.gz that are not found in B.tar.gz, plus a small amount of data for checksums and block indexes, should be sent over the link between the source and the target.


In this scenario, when I use rsync over the mounted folder, during the checksum calculation that lets it transfer only the modified parts:

=> Does rsync have to read the whole file, which would cause the complete data to be downloaded through the mounted directory instead of only its updated section?


Please note that there is a good explanation of rsync over sshfs at the following link: Differences between rsync on remote and rsync local on mounted sshfs?

alper
  • 469
  • Can you ssh to the remote server and run rsync back to your machine? Perhaps via ssh port-forwarding if your local machine is behind a firewall or NAT. That way, any checksum calculations are done by the rsync on the remote machine, on file(s) local to it. – cas Sep 30 '19 at 09:52
  • As I understand it, when I ssh to the remote server and run rsync to my machine, e.g. rsync -a /home/local/B.tar.gz user@12.12.12.12:/home/remote/oc/A.tar.gz, only the modified part of the file and a small amount of data for checksums and block indexes should be sent back to the local machine. However, I cannot ssh to the remote server. In my case, the local and remote machines can communicate only through cloud storage, where the local machine shares its cloud folder with the remote machine and the remote machine downloads it from there, or through P2P storage such as IPFS. @cas – alper Sep 30 '19 at 11:56
  • What you have to know is that rsync optimises network traffic at the cost of local I/O. When doing a proper rsync over a network connection between two hosts, each end reads the file completely and the two ends just communicate block checksums to find which blocks need to be transferred. This means reading the file twice (once at each end). For a local transfer it's faster to skip reading the destination file and just copy the whole source file. Rsync doesn't know or care that the source may be on a network-mounted filesystem. – wurtel Oct 01 '19 at 07:33
  • As I understand it, when I run rsync over ssh, each end reads the file and the two ends exchange block checksums, so overall only the modified blocks and the block checksums are transferred between the two hosts. Hence rsync over ssh is a better approach than rsync on a mounted folder, which will transfer the whole file. @wurtel – alper Oct 04 '19 at 06:54

1 Answer

In this scenario, when I use rsync over the mounted folder, during the checksum calculation that lets it transfer only the modified parts:

=> Does rsync have to read the whole file, which would cause the complete data to be downloaded through the mounted directory instead of only its updated section?

Yes, that is exactly right.

(When you look in the Linked sidebar on the right or the bottom of the page - "Differences between rsync on remote and rsync local on mounted sshfs?" - this is what the first part of the accepted answer there tries to explain. I found your explanation clearer).
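
To make the reason concrete, here is a minimal sketch of a block-checksum comparison. It is a toy under stated assumptions, not rsync's actual code: it compares fixed-offset blocks with SHA-256 instead of using rsync's rolling checksum, the 4-byte block size is purely illustrative, and CountingReader, block_digests and changed_blocks are hypothetical names. The reader stands in for the file on the WebDAV mount, where every byte read has to be fetched over the network.

```python
import hashlib
import io

BLOCK = 4  # tiny block size for illustration only


class CountingReader:
    """Stand-in for the file on the mount: counts every byte that is read."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, n):
        chunk = self._f.read(n)
        self.bytes_read += len(chunk)
        return chunk


def block_digests(data):
    """Digest of each fixed-size block of the local copy."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def changed_blocks(source, dest_digests):
    """Return the indexes of source blocks whose digest differs from the local copy."""
    changed = []
    index = 0
    while True:
        block = source.read(BLOCK)  # the only way to hash a block is to read it
        if not block:
            break
        digest = hashlib.sha256(block).digest()
        if index >= len(dest_digests) or digest != dest_digests[index]:
            changed.append(index)
        index += 1
    return changed


old = b"aaaabbbbccccdddd"  # local copy, playing the role of B.tar.gz
new = b"aaaabbbbXXXXdddd"  # updated file on the mount, playing the role of A.tar.gz
source = CountingReader(new)
changed = changed_blocks(source, block_digests(old))
print(changed, source.bytes_read, len(new))  # [2] 16 16
```

Only one block differs, yet all 16 bytes of the source were read to find it. On a mounted folder that read is the download, so the saving only appears when the source is hashed by a process running on the machine that actually stores the file.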

sourcejedi
  • 50,249
  • As I understand it, transferring only the updated part of a file between the local and remote machines in most cases forces the local machine to download the whole file anyway, just to detect which section was modified. – alper Sep 30 '19 at 09:22
  • Maybe I should ask this in a follow-up question: what would be the best approach to overcome this? For example, Git might be a better way to pull only deltas, but GitHub is not a cloud storage service and I cannot use it for large files. Another option might be to use diff to obtain the modified section and upload it to the mounted directory, so that the local machine downloads only the uploaded modified part and then merges it with the original file. @sourcejedi – alper Sep 30 '19 at 09:23
  • @alper asking for the "best" does not work well. I think it is best to just ask if there is any way to do what you want. You need to be specific about what you want. – sourcejedi Sep 30 '19 at 09:30