320

I used rsync to copy a large number of files, but my OS (Ubuntu) restarted unexpectedly:

sudo rsync -azvv /home/path/folder1/ /home/path/folder2

After the reboot, I ran rsync again, but from the terminal output it looked as though rsync was copying files that had already been copied. I had heard that rsync is able to find differences between source and destination, and therefore copies only the differences.

Source and target are both NTFS. The source is an external HDD and target is an internal HDD.

I wonder in my case if rsync can resume what was left last time?

Chris Davies
Tim
  • 1
    Yes, rsync won't copy again files that it's already copied. There are a few edge cases where its detection can fail. Did it copy all the already-copied files? What options did you use? What were the source and target filesystems? If you run rsync again after it's copied everything, does it copy again? – Gilles 'SO- stop being evil' Sep 16 '12 at 01:56
  • @Gilles: Thanks! (1) I think I saw rsync copy the same files again from its output on the terminal. (2) Options are the same as in my other post, i.e. sudo rsync -azvv /home/path/folder1/ /home/path/folder2. (3) Source and target are both NTFS, but the source is an external HDD, and the target is an internal HDD. (4) It is now running and hasn't finished yet. – Tim Sep 16 '12 at 02:30
  • 2
    There is also the --partial flag to resume partially transferred files (useful for large files) – Baldrick Sep 16 '12 at 16:15
  • @Gilles: What are some "edge cases where its detection can fail"? – Tim Sep 19 '12 at 05:20
  • 3
    @Tim Off the top of my head, there's at least clock skew, and differences in time resolution (a common issue with FAT filesystems which store times in 2-second increments, the --modify-window option helps with that). – Gilles 'SO- stop being evil' Sep 19 '12 at 09:25
  • 1
    if you did not have / or /. at the tail end of the file source path argument then it will be making an extra copy in a subdirectory that has the same name as the source directory – Skaperen Aug 19 '15 at 13:50
  • Use -u to speed up. – Martin T. Jul 05 '20 at 09:49
  • 1
    Have a look at aim for downloading/uploading with resume over http(s), ftp, and ssh. – Mihai Galos Apr 08 '22 at 20:45
  • @MihaiGalos how do you install the utility? – jarno Apr 09 '22 at 22:11
  • 1
    @jarno I've added the missing installation section: https://github.com/mihaigalos/aim#installation – Mihai Galos Apr 10 '22 at 12:06

10 Answers

463

First of all, regarding the "resume" part of your question, --partial just tells the receiving end to keep partially transferred files, as though they were completely transferred, if the sending end disappears.

While transferring files, they are temporarily saved as hidden files in their target folders (e.g. .TheFileYouAreSending.lRWzDC), or a specifically chosen folder if you set the --partial-dir switch. When a transfer fails and --partial is not set, this hidden file will remain in the target folder under this cryptic name, but if --partial is set, the file will be renamed to the actual target file name (in this case, TheFileYouAreSending), even though the file isn't complete. The point is that you can later complete the transfer by running rsync again with either --append or --append-verify.

So, --partial doesn't itself resume a failed or cancelled transfer. To resume it, you'll have to use one of the aforementioned flags on the next run. So, if you need to make sure that the target won't ever contain files that appear to be fine but are actually incomplete, you shouldn't use --partial. Conversely, if you want to make sure you never leave behind stray failed files that are hidden in the target directory, and you know you'll be able to complete the transfer later, --partial is there to help you.

With regards to the --append switch mentioned above, this is the actual "resume" switch, and you can use it whether or not you're also using --partial. Actually, when you're using --append, no temporary files are ever created. Files are written directly to their targets. In this respect, --append gives the same result as --partial on a failed transfer, but without creating those hidden temporary files.

So, to sum up, if you're moving large files and you want the option to resume a cancelled or failed rsync operation from the exact point that rsync stopped, you need to use the --append or --append-verify switch on the next attempt.
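
As a rough sketch of the resume workflow described above (not a definitive recipe): the wrapper name resume_copy and the paths in the usage comment are placeholders made up for illustration, while the flags are the ones discussed in this answer. It assumes rsync 3.0.0 or later, where --append-verify exists.

```shell
#!/bin/sh
# Hypothetical helper: copy large files so that an interrupted run can be
# resumed by simply calling the helper again with the same arguments.
resume_copy() {
    # $1 = source, $2 = destination
    # --partial        keep an interrupted file instead of deleting it
    # --append-verify  append to shorter destination files, then verify the
    #                  whole file with a checksum (rsync >= 3.0.0)
    rsync -av --partial --append-verify "$1" "$2"
}

# Usage (placeholder paths):
# resume_copy /home/path/folder1/ /home/path/folder2
```

Keep in mind the man page caveat quoted in the comments below: with the append options, a destination file that is the same size as or longer than the source is skipped.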

As @Alex points out below, since version 3.0.0 rsync now has a new option, --append-verify, which behaves like --append did before that switch existed. You probably always want the behaviour of --append-verify, so check your version with rsync --version. If you're on a Mac and not using rsync from homebrew, you'll (at least up to and including El Capitan) have an older version and need to use --append rather than --append-verify. Why they didn't keep the behaviour on --append and instead named the newcomer --append-no-verify is a bit puzzling. Either way, --append on rsync before version 3 is the same as --append-verify on the newer versions.

--append-verify isn't dangerous: It will always read and compare the data on both ends and not just assume they're equal. It does this using checksums, so it's easy on the network, but it does require reading the shared amount of data on both ends of the wire before it can actually resume the transfer by appending to the target.

Second of all, you said that you "heard that rsync is able to find differences between source and destination, and therefore to just copy the differences."

That's correct, and it's called delta transfer, but it's a different thing. To enable this, you add the -c, or --checksum switch. Once this switch is used, rsync will examine files that exist on both ends of the wire. It does this in chunks, compares the checksums on both ends, and if they differ, it transfers just the differing parts of the file. But, as @Jonathan points out below, the comparison is only done when files are of the same size on both ends — different sizes will cause rsync to upload the entire file, overwriting the target with the same name.

This requires a bit of computation on both ends initially, but can be extremely efficient at reducing network load if, for example, you're frequently backing up very large fixed-size files that often contain minor changes. Examples that come to mind are virtual hard drive image files used in virtual machines or iSCSI targets.

It is notable that if you use --checksum to transfer a batch of files that are completely new to the target system, rsync will still calculate their checksums on the source system before transferring them. Why I do not know :)

So, in short:

If you're often using rsync to just "move stuff from A to B" and want the option to cancel that operation and later resume it, don't use --checksum, but do use --append-verify.

If you're using rsync to back up stuff often, using --append-verify probably won't do much for you, unless you're in the habit of sending large files that continuously grow in size but are rarely modified once written. As a bonus tip, if you're backing up to storage that supports snapshotting such as btrfs or zfs, adding the --inplace switch will help you reduce snapshot sizes since changed files aren't recreated but rather the changed blocks are written directly over the old ones. This switch is also useful if you want to avoid rsync creating copies of files on the target when only minor changes have occurred.

When using --append-verify, rsync will behave just like it always does on all files that are the same size. If they differ in modification or other timestamps, it will overwrite the target with the source without scrutinizing those files further. --checksum will compare the contents (checksums) of every file pair of identical name and size.

UPDATED 2015-09-01 Changed to reflect points made by @Alex (thanks!)

UPDATED 2017-07-14 Changed to reflect points made by @Jonathan (thanks!)

  • 1
    According to the documentation --append does not check the data, but --append-verify does. Also, as @gaoithe points out in a comment below, the documentation claims --partial does resume from previous files. – Alex Aug 28 '15 at 03:49
  • 1
    Thank you @Alex for the updates. Indeed, since 3.0.0, --append no longer compares the source to the target file before appending. Quite important, really! --partial does not itself resume a failed file transfer, but rather leaves it there for a subsequent --append(-verify) to append to it. My answer was clearly misrepresenting this fact; I'll update it to include these points! Thanks a lot :) – DanielSmedegaardBuus Sep 01 '15 at 13:29
  • 13
    This says --partial is enough. – Cees Timmerman Sep 15 '15 at 17:21
  • @CeesTimmerman RTFM ;) Or test it out yourself :) – DanielSmedegaardBuus Sep 16 '15 at 09:25
  • 2
    @CeesTimmerman, I have to say I find that statement misleading, since it doesn't use the partial file in subsequent transfers. In fact, I am still trying to find out how to achieve this. – Alex Sep 24 '15 at 05:03
  • I have been playing with this more, and have found a situation where --append and --append-verify are not safe. "If a file needs to be transferred and its size on the receiver is the same or longer than the size on the sender, the file is skipped." This is the case even with -c or verify! – Alex Sep 24 '15 at 05:05
  • @CeesTimmerman — You quote it and still misread it. It just says a subsequent transfer, it doesn't say a subsequent transfer with just --partial, which is why I suggested you test it for yourself. – DanielSmedegaardBuus Oct 19 '15 at 07:29
  • @Alex: Interesting. The "same size" is expected, of course, but the "or longer" part is pretty weird. Are you sure this applies to --append-verify as well? And does it also apply when the file stats differ, as they ought to if such a scenario were ever to occur? The guaranteed-to-work fallback would be to always use --checksum. – DanielSmedegaardBuus Oct 19 '15 at 07:32
  • @DanielSmedegaardBuus It doesn't say with additional options either, which it should, IMO. You're right that it needs --append to not retransfer the file, but that skips larger destination files even when the source files are newer, even using --append-verify, and --checksum is terribly slow when there's lots of files/data. :( – Cees Timmerman Oct 19 '15 at 10:15
  • So does that mean, when I'm writing an automated script (that needs to be resumable), I should always have --partial --append-verify together? Or does --append-verify imply --partial? Also what about --partial-dir? – CMCDragonkai May 04 '16 at 10:14
  • @CMCDragonkai --partial --append-verify would do the trick. --partial-dir cannot be used alongside an append switch, although it would be nice. Just keep in mind that files on the receiving end may appear to be "normal" even though they're not fully uploaded (while uploading, or after a failure until it is re-attempted). – DanielSmedegaardBuus May 10 '16 at 12:11
  • 3
    @CMCDragonkai Actually, check out Alexander's answer below about --partial-dir — looks like it's the perfect bullet for this. I may have missed something entirely ;) – DanielSmedegaardBuus May 10 '16 at 19:31
  • 5
    @DanielSmedegaardBuus I tested it out myself on a slow connection, and this is what I see with only --partial: rsync copies the file into the temporary name, connection is interrupted, the remote rsync eventually moves that file to the regular name and quits, then upon re-running with --partial and without --append, the new temporary file is initialized with a copy of the partially-transferred remote file, then the copy continues from where the connection died. (Ubuntu 14.04 / rsync 3.1) – Izkata Aug 23 '16 at 15:18
  • 2
    It seems newer versions of rsync: version 3.1.1 protocol version 31 default to deleting the temporary (e.g. .TheFileYouAreSending.lRWzDC) partially transferred files. IIRC the older versions behaved as per this answer, by default a failed transfer leaves a .TheFileYouAreSending.lRWzDC type file. Nowadays it seems rsync cleans up after itself. Which makes sense, if those temporary files can't be used for a resume operation, then they are just clutter really. See man rsync | less '+/By default, rsync will delete' – the_velour_fog Sep 15 '16 at 07:37
  • 5
    What's your level of confidence in the described behavior of --checksum? According to the man it has more to do with deciding which files to flag for transfer than with delta-transfer (which, presumably, is rsync's default behavior). – Jonathan Y. Jun 14 '17 at 05:48
  • I share @JonathanY.'s concerns; if you read up on the -W flag in the man-page, it hints that for non-local paths the use of delta-transfers is the default behavior. – Hugo Ideler Jun 22 '17 at 06:54
  • 2
    I wish the "So, in short" part were at the top of the answer so I didn't have to read through it every time I look up this post (I keep forgetting the switch to use). Would you add a "TL;DR" at the top, maybe? – slhck Sep 06 '17 at 11:52
  • 2
    Per the man: -W, --whole-file With this option rsync’s delta-transfer algorithm is not used (...) This is the default when both the source and destination are specified as local paths. (...) --stats This tells rsync to print a verbose set of statistics on the file transfer, allowing you to tell how effective rsync’s delta-transfer algorithm is for your data.

    This tends to imply that delta-transfer is the default.

    Indeed, --checksum is about determining if a file has changed (by checksumming at both ends, thus reading all files from disk when they have the same size and date).

    – jrouquie Mar 22 '18 at 20:24
  • 2
    Add TL;DR with short answer. – mrgloom Aug 23 '19 at 11:50
  • man rsync says that --inplace is automatically used if a resend is required on a verification failure. – Tom Hale Sep 12 '19 at 13:59
  • 2
    ohhh hello old friend, new year same revisit to this question and answer. it ages like wine – ipatch Jan 02 '21 at 21:04
  • This answer is partially wrong. delta-transfer is rsync's default, so you don't need to use --append or --append-verify. --partial is enough. The difference is that without --append-verify the already existing file is copied to a temp file, while with --append-verify it writes directly to the destination file. In both cases the already existing content is checksum checked. – mgutt Mar 07 '22 at 12:07
  • @mgutt --append tells rsync to append onto smaller files in the target folder with the same name as a source. Without it, it will overwrite the smaller file on the target with the larger file from the source. --append-verify first verifies that the smaller file's data is identical to the initial portion of the source file - if not, it also overwrites. --partial tells rsync to keep a partially transferred file when exiting, otherwise it deletes it. So, with the combination of these arguments, you can effectively cancel rsync in the middle of transferring a large file, and resume it later. – DanielSmedegaardBuus Mar 10 '22 at 12:01
  • 1
    "Without it, it will overwrite the smaller file" This is wrong. As I said delta-transfer is the default. overwriting is only done if you use --whole-file (which disables delta-transfer). – mgutt Mar 11 '22 at 14:27
  • Sadly almost all of this is completely wrong for the use case (now) shown in the question, which is a local to local filesystem copy – Chris Davies Dec 27 '22 at 10:27
103

TL;DR:

Just specify a partial directory, as the rsync man page recommends:

--partial-dir=.rsync-partial

Longer explanation:

There is actually a built-in feature for doing this using the --partial-dir option, which has several advantages over the --partial and --append-verify/--append alternative.

Excerpt from the rsync man pages:

--partial-dir=DIR
      A better way to keep partial files than the --partial option is
      to specify a DIR that will be used to hold the partial data
      (instead of writing it out to the destination file). On the
      next transfer, rsync will use a file found in this dir as data
      to speed up the resumption of the transfer and then delete it
      after it has served its purpose.

      Note that if --whole-file is specified (or implied), any
      partial-dir file that is found for a file that is being updated
      will simply be removed (since rsync is sending files without
      using rsync's delta-transfer algorithm).

      Rsync will create the DIR if it is missing (just the last dir --
      not the whole path). This makes it easy to use a relative path
      (such as "--partial-dir=.rsync-partial") to have rsync create
      the partial-directory in the destination file's directory when
      needed, and then remove it again when the partial file is
      deleted.

      If the partial-dir value is not an absolute path, rsync will add
      an exclude rule at the end of all your existing excludes. This
      will prevent the sending of any partial-dir files that may exist
      on the sending side, and will also prevent the untimely deletion
      of partial-dir items on the receiving side. An example: the
      above --partial-dir option would add the equivalent of "-f '-p
      .rsync-partial/'" at the end of any other filter rules.

By default, rsync uses a random temporary file name which gets deleted when a transfer fails. As mentioned, using --partial you can make rsync keep the incomplete file as if it were successfully transferred, so that it is possible to later append to it using the --append-verify/--append options. However there are several reasons this is sub-optimal.

  1. Your backup files may not be complete, and without checking the remote file which must still be unaltered, there's no way to know.

  2. If you are attempting to use --backup and --backup-dir, you've just added a new version of this file that never even existed before to your version history.

However if we use --partial-dir, rsync will preserve the temporary partial file, and resume downloading using that partial file next time you run it, and we do not suffer from the above issues.
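
A minimal sketch of what that looks like on the command line. sync_with_partial_dir is a hypothetical wrapper name and the paths in the usage comment are placeholders; --partial-dir=.rsync-partial is the value suggested in the man page excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: keep interrupted files in a per-directory
# .rsync-partial folder so the next run can reuse them.
sync_with_partial_dir() {
    # $1 = source, $2 = destination
    # The partial data is held in .rsync-partial next to the destination
    # file instead of being written out under the real file name.
    rsync -av --partial-dir=.rsync-partial "$1" "$2"
}

# Usage (placeholder paths):
# sync_with_partial_dir user@host:/data/ /backup/data/
```

As the excerpt notes, the partial file is simply removed when --whole-file is in effect, which the man page says is the default when both source and destination are local paths.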

chaos
  • 3
    A few nits regarding --partial-dir and the amount of I/O and disk writes (CF, SSD, etc.): 1. When the path is not on the same partition (another disk, RAM drive, etc.) as the files being synchronized, a file copy between the specified directory and the target will occur when the transfer is done; 2. When large files are involved, it's recommended to use a relative path (located on the same partition, not a symbolic link, etc.); 3. When using temporary storage (such as a RAM drive), one should also be aware that the files to be synchronized will be limited by the temporary storage's free space. – Helder Magalhães Apr 21 '20 at 08:44
54

You may want to add the -P option to your command.

From the man page:

--partial By default, rsync will delete any partially transferred file if the transfer
         is interrupted. In some circumstances it is more desirable to keep partially
         transferred files. Using the --partial option tells rsync to keep the partial
         file which should make a subsequent transfer of the rest of the file much faster.

-P The -P option is equivalent to --partial --progress. Its purpose is to make it much easier to specify these two options for a long transfer that may be interrupted.

So instead of:

sudo rsync -azvv /home/path/folder1/ /home/path/folder2

Do:

sudo rsync -azvvP /home/path/folder1/ /home/path/folder2

Of course, if you don't want the progress updates, you can just use --partial, i.e.:

sudo rsync --partial -azvv /home/path/folder1/ /home/path/folder2
gaoithe
N2O
  • @Flimm not quite correct. If there is an interruption (network or receiving side) then when using --partial the partial file is kept AND it is used when rsync is resumed. From the manpage: "Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster." – gaoithe Aug 19 '15 at 11:29
  • 2
    @Flimm and @gaoithe, my answer wasn't quite accurate, and definitely not up-to-date. I've updated it to reflect version 3 + of rsync. It's important to stress, though, that --partial does not itself resume a failed transfer. See my answer for details :) – DanielSmedegaardBuus Sep 01 '15 at 14:11
  • 3
    @DanielSmedegaardBuus I tried it and the -P is enough in my case. Versions: client has 3.1.0 and server has 3.1.1. I interrupted the transfer of a single large file with ctrl-c. I guess I am missing something. – guettli Nov 18 '15 at 12:28
  • Why vv? i.e. v used 2 times? – mrgloom Aug 23 '19 at 11:51
  • Where does rsync save the partial file with -azvvP? – mrgloom Aug 23 '19 at 11:56
  • Just to confirm, does -azvvP also do checksum verifications like append-verify mentioned in the other answers? – a06e Nov 14 '19 at 10:47
8

Arriving late to this, but I had the same question and I found a different answer.

The --partial flag ("keep partially transferred files" in rsync -h) is useful for large files, as is --append ("append data onto shorter files"), but the question is about a large number of files.

To avoid re-copying files that have already been copied, use -u (or --update: "skip files that are newer on the receiver").

  • 3
    Note to self: do not use --ignore-existing in combination with --append. Rsync will leave your interrupted file as is thinking you're happy with it even if it's incomplete. This took me a bit of headscratching to realize. That's the downside of having a bunch of preconfigured options that you use routinely. – Sridhar Sarnobat Dec 24 '19 at 06:13
5

Several important rules:

  1. rsync uses the delta-transfer algorithm to decide which blocks differ and need to be resent, except with the -W (--whole-file) option.
  2. rsync writes data to a temporary file and moves it to the destination when complete, except with the --inplace option.
  3. When delta-transfer is enabled and you want to skip computing the block checksums of the partially sent data, you can add the --append option, but then it is up to you to ensure that the partially sent data matches the source.
  4. --append implies --inplace, which itself implies --partial.

In my case, I want to send incremental files without too much CPU & disk load, the command is

rsync -avPL --inplace --append --bwlimit 30m -e 'ssh -o StrictHostKeyChecking=no' <src> <dst>
weaming
  • 2
    Your #1 applies only when transferring between two systems. If you transfer between two parts of the same filesystem, the whole delta thing is disabled – Chris Davies Jun 25 '21 at 07:09
1

In my case rsync kept failing and exiting, so I used this simple bash while loop to retry until it succeeds.

#!/bin/bash

source="/tmp/source"            # Change it!
destination="/tmp/destination"  # Change it!

# Retry until rsync exits successfully.
while true; do
    # Test the exit status directly in the if (see ShellCheck SC2181).
    if rsync -avz --partial "$source" "$destination"; then
        echo "rsync completed normally"
        exit
    else
        echo "Rsync failure. Backing off and retrying in 180s..."
        sleep 180
    fi
done

Before running the script, you have to set source and destination to values of your choice.

1

TLDR

This will resume partial transfers, and add compression for faster transfers

rsync --partial --progress --archive --compress --compress-choice=zstd --compress-level=9 --checksum-choice=xxh3 user@host:~/my_file.txt .

Flag explanation

--partial: Keep partially transferred files so that a later run can resume from them

--progress: Show transfer progress and ETA

--archive: Preserve file attributes

--compress: Enable compression (zlib by default)

--compress-choice=zstd: Enable zstd compression (faster and better compression than zlib)

--compress-level=9: Increase the compression level from the default of 3 (tradeoff vs maximum of 19)

--checksum-choice=xxh3: Use xxh3 hashing algorithm (very fast)

Bob
0

I think you are forcibly calling the rsync and hence all data is getting downloaded when you recall it again. use --progress option to copy only those files which are not copied and --delete option to delete any files if already copied and now it does not exist in source folder...

rsync -avz --progress --delete /home/path/folder1/ /home/path/folder2

If you are using ssh to login to other system and copy the files,

rsync -avz --progress --delete -e "ssh -o UserKnownHostsFile=/dev/null -o \
StrictHostKeyChecking=no" /home/path/folder1/ /home/path/folder2

let me know if there is any mistake in my understanding of this concept...

  • 1
    Can you please edit your answer and explain what your special ssh call does, and why you advise doing it? – Fabien Jun 14 '13 at 12:12
  • 3
    @Fabien He tells rsync to set two ssh options (rsync uses ssh to connect). The second one tells ssh to not prompt for confirmation if the host he's connecting to isn't already known (by existing in the "known hosts" file). The first one tells ssh to not use the default known hosts file (which would be ~/.ssh/known_hosts). He uses /dev/null instead, which is of course always empty, and as ssh would then not find the host in there, it would normally prompt for confirmation, hence option two. Upon connecting, ssh writes the now known host to /dev/null, effectively forgetting it instantly :) – DanielSmedegaardBuus Dec 07 '14 at 00:12
  • 2
    ...but you were probably wondering what effect, if any, it has on the rsync operation itself. The answer is none. It only serves to not have the host you're connecting to added to your SSH known hosts file. Perhaps he's a sysadmin often connecting to a great number of new servers, temporary systems or whatnot. I don't know :) – DanielSmedegaardBuus Dec 07 '14 at 00:23
  • 5
    "use --progress option to copy only those files which are not copied" What? – moi May 10 '16 at 13:49
  • 2
    There are a couple errors here; one is very serious: --delete will delete files in the destination that don't exist in the source. The less serious one is that --progress doesn't modify how things are copied; it just gives you a progress report on each file as it copies. (I fixed the serious error; replaced it with --remove-source-files.) – Paul d'Aoust Nov 17 '16 at 22:39
0

For those using the GUI Grsync, the matching configuration is the following:

In the "Advanced options" tab, check (at least) the checkbox "Keep partially transferred files".

Then in the "Additional options" field, enter:

--append-verify

Then File -> Simulation to check whether the transfer will work (if you encounter an error, you might want to check the other selected options, source and destination).
And finally File -> Execute.

With this configuration if the transfer fails, you can just close the transfer window and File -> Execute again. It will resume the transfer where it was interrupted.

0

You are copying between two local filesystems. This disables almost all the optimisations offered by rsync and implicitly enables --whole-file. Furthermore, you have involved NTFS filesystems, which don't necessarily keep enough of the metadata that rsync uses to check that a file has been correctly copied.

Try this,

sudo rsync -rtv /home/path/folder1/ /home/path/folder2
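
Before re-running a long copy, it can help to ask rsync what it would transfer and why. The sketch below is an assumption-laden helper, not part of the command above: preview_copy is a made-up name, -n (--dry-run) and -i (--itemize-changes) are standard rsync options, and --modify-window=1 is only a guess at a tolerance that suits filesystems with coarse timestamps.

```shell
#!/bin/sh
# Hypothetical helper: show what rsync would copy without writing anything.
preview_copy() {
    # $1 = source, $2 = destination
    # -r, -t             recurse and compare/preserve modification times
    # -n                 dry run, nothing is written
    # -i                 print one itemized line per file that would be updated
    # --modify-window=1  treat timestamps within 1 second as equal (assumed value)
    rsync -rtn -i --modify-window=1 "$1" "$2"
}

# Usage (paths from the question):
# preview_copy /home/path/folder1/ /home/path/folder2
```

If already-copied files show up in the output, the itemized flags hint at which attribute rsync thinks differs (for example, a t for the modification time).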
Chris Davies