23

Some file copying programs like rsync and curl have the ability to resume failed transfers/copies.

There can be many causes of these failures; in some cases the program can do "cleanup" and in some cases it can't.

When these programs resume, they seem to just calculate the size of the file/data that was transferred successfully, then start reading the next byte from the source and appending it to the file fragment.

e.g. the size of the file fragment that "made it" to the destination is 1378 bytes, so they start reading from byte 1379 of the original and appending it to the fragment.
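For concreteness, a minimal sketch of what this size-based approach might look like (the names here are purely illustrative, not taken from rsync or curl):

    /* Illustrative sketch: the resume offset is simply the current size of
       the partial destination file. */
    #include <sys/stat.h>
    #include <sys/types.h>

    off_t resume_offset(const char *partial_path)
    {
        struct stat st;
        if (stat(partial_path, &st) != 0)
            return 0;          /* nothing transferred yet: start at byte 0 */
        return st.st_size;     /* e.g. 1378 -> the next byte to copy is the 1379th */
    }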

My question is, knowing that bytes are made up of bits and not all files have their data segmented in clean byte-sized chunks, how do these programs know that the point they have chosen to start adding data at is correct?

When writing the destination file, is some kind of buffering or "transaction" similar to SQL databases occurring, either at the program, kernel or filesystem level, to ensure that only clean, well-formed bytes make it to the underlying block device?
Or do the programs assume the latest byte could be incomplete, so they delete it on the assumption it's bad, re-copy the byte and start appending from there?

Knowing that not all data is represented as bytes, these guesses seem incorrect.

When these programs "resume" how do they know they are starting at the right place?

  • 21
    "not all files have their data segmented in clean byte sized chunks" they don't? How do you write anything less than a byte to a file? – muru Feb 06 '18 at 06:22
  • 1
    @muru - so you are saying the minimum amount of data that can be written to a file is always a byte? – the_velour_fog Feb 06 '18 at 06:25
  • 17
    I know of no system calls that can write anything less than a byte, and as for the disk itself, I think no disks today write less than 512 byte blocks (or 4096 byte blocks). – muru Feb 06 '18 at 06:27
  • @muru so are you saying that a program or the kernel always writes in complete, one-byte chunks? Kind of like a SQL transaction? That would mean the kernel is repeating millions or billions of these "transactions" to transfer a large file. So yes, that would mean you could basically trust the bytes that "made it" were good. But I would have thought a typical fopen(), fwrite() operation would give access to the file system and the process could just stream bits into it. – the_velour_fog Feb 06 '18 at 06:41
  • 8
    No, I'm saying the minimum is a byte. Sane applications would be using 4KB or 8KB chunks: head -c 20480 /dev/zero | strace -e write tee foo >/dev/null, and then the OS will buffer them up and send them to the disk in even larger chunks. – muru Feb 06 '18 at 06:46
  • 9
    @the_velour_fog: How do you write just one bit with fwrite()? – psmears Feb 06 '18 at 10:41
  • 1
    Likewise, when transferring data over a network connection, you're always transferring full bytes. – jcaron Feb 06 '18 at 10:58
  • 9
    For all practical purposes, data is made up of bytes and everything operates with them as the smallest unit. Some systems (mostly relating to compression eg gzip, h264) unpack individual bits out of the bytes, but the operating system and memory operation is at the level of bytes. – pjc50 Feb 06 '18 at 13:58
  • 4
    @the_velour_fog that is correct — the minimum amount of data that can be read or written to a block device is a byte. To change a bit, one must re-write the entire byte containing it. This is true across all OSes I am aware of – Josh Feb 06 '18 at 15:14
  • 3
    You may find it useful for directing your thinking to consider that the original meaning of the word "byte" was basically "smallest chunk of data that the computer can address/manipulate individually" (which wasn't/isn't always an octet (eight bits)). Looking at it from this angle, it might make more sense why nothing can truly write/change individual bits, but only at-least-byte-sized chunks. (Operations that seem to change a single bit are actually using bitwise operations on bytes/words, producing a byte/word as a result). – mtraceur Feb 06 '18 at 19:11
  • 1
    @jcaron: Over a network, you're probably transferring much more than single bytes. For a largeish file transfer, the ethernet packet size is ~1500 bytes, IIRC. So the receiving end acknowledges each successfully received packet... – jamesqf Feb 06 '18 at 19:53

5 Answers

40

For clarity's sake - the real mechanics are more complicated, to give even better guarantees - you can imagine the write-to-disk operation like this:

  • application writes bytes (1)
  • the kernel (and/or the file system IOSS) buffers them
  • once the buffer is full, it gets flushed to the file system:
    • the block is allocated (2)
    • the block is written (3)
    • the file and block information is updated (4)

If the process gets interrupted at (1), you don't get anything on the disk, the file is intact and truncated at the previous block. You sent 5000 bytes, only 4096 are on the disk, you restart transfer at offset 4096.

If at (2), nothing happens except in memory. Same as (1). If at (3), the data is written but nobody remembers about it. You sent 9000 bytes: 4096 got written and recorded, 4096 got written but never recorded (and are lost), and the rest just got lost. Transfer resumes at offset 4096.

If at (4), the data should now have been committed on disk. The next bytes in the stream may be lost. You sent 9000 bytes, 8192 get written, the rest is lost, transfer resumes at offset 8192.

This is a simplified take. For example, each "logical" write in stages 3-4 is not "atomic", but gives rise to another sequence (let's number it #5) whereby the block, subdivided into sub-blocks suitable for the destination device (e.g. hard disk), is sent to the device's host controller, which also has a caching mechanism, and is finally stored on the magnetic platter. This sub-sequence is not always completely under the system's control, so having sent data to the hard disk is not a guarantee that it has actually been written and can be read back.

Several file systems implement journaling, to make sure that the most vulnerable point, (4), is not actually vulnerable, by writing meta-data in, you guessed it, transactions that will work consistently whatever happens in stage (5).

If the system gets reset in the middle of a transaction, it can resume its way to the nearest intact checkpoint. Data written is still lost, same as case (1), but resumption will take care of that. No information actually gets lost.
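To relate the stages above to actual calls, here is a minimal POSIX sketch (local file, no journaling specifics): write() is stage (1) plus the kernel buffering, and fsync() forces the buffered data through stages (2)-(4) before returning, subject to the device-cache caveat in stage (5).

    /* Minimal POSIX sketch: write() covers stage (1) and the kernel
       buffering; fsync() pushes the buffered data through stages (2)-(4),
       so a successful return means both the data and the metadata that
       remembers it should be on disk (modulo the device-cache caveat). */
    #include <string.h>
    #include <unistd.h>

    int write_committed(int fd, const char *data)
    {
        size_t len = strlen(data);
        if (write(fd, data, len) != (ssize_t)len)
            return -1;
        return fsync(fd);    /* 0 on success, -1 on error */
    }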

LSerni
  • 4,560
  • 1
    Great explanation. That all makes a lot of sense. So if a process does make it all the way to (4), file block info updated, you know all those bytes are good. Then any bytes that were at any previous stage would either not have made it to disk or - if they did - they would be "un-remembered" (no references to them) – the_velour_fog Feb 06 '18 at 07:17
  • 4
    @the_velour_fog And just to complement the penultimate paragraph - if you're using a file system that doesn't implement journaling, you can indeed get "broken" data, causing the resume to fail and produce a garbled file without giving you an error. This used to happen all the time in the past, especially with file-systems designed for high-latency devices (like floppies). There were still some tricks to avoid this even if the file-system wasn't reliable in this way, but it needed a smarter application to compensate and some assumptions that may have been wrong on some systems. – Luaan Feb 06 '18 at 16:06
  • This answer overstates usefulness of journaling in file systems. It does not work reliably unless everything implements transactional semantics, including userspace applications (via fsync) and hard drive controller (often broken, even in supposedly "enterprise" drives). Without fsync many file operations, that are intuitively ordered and atomic are not guaranteed to be such by POSIX: files, opened with O_APPEND might behave differently from ones without etc. In practice the most important keys to file consistency are kernel VFS system and disk cache. Everything else is mostly fluff. – user1643723 Feb 07 '18 at 10:38
10

Note: I have not looked at the sources of rsync or any other file transfer utility.

It is trivial to write a C program that jumps to the end of a file and gets the position of that location in bytes.

Both operations are done with a single call to the standard C library function lseek() (lseek(fd, 0, SEEK_END) returns the length of the file opened as file descriptor fd, measured in bytes).

Once that is done for the target file, a similar call to lseek() may be done on the source file to jump to the appropriate position: lseek(fd, pos, SEEK_SET). The transfer may then continue at that point, assuming the earlier portion of the source file has been identified as unchanged (different utilities may do this in different ways).
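A minimal sketch of that sequence (error handling trimmed for brevity; the function name is illustrative):

    /* Sketch of the lseek()-based resume described above: find the size of
       the partial target, skip that much of the source, then append the
       rest. Error handling is trimmed for brevity. */
    #include <fcntl.h>
    #include <unistd.h>

    void resume_copy(const char *src_path, const char *dst_path)
    {
        int src = open(src_path, O_RDONLY);
        int dst = open(dst_path, O_WRONLY);

        off_t pos = lseek(dst, 0, SEEK_END);   /* bytes already transferred */
        lseek(src, pos, SEEK_SET);             /* continue from that offset */

        char buf[8192];
        ssize_t n;
        while ((n = read(src, buf, sizeof buf)) > 0)
            write(dst, buf, (size_t)n);        /* append the remaining data */

        close(src);
        close(dst);
    }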

A file may be fragmented on the disk, but the filesystem will ensure that an application perceives the file as one contiguous sequence of bytes.


Regarding the discussion in comments about bits and bytes: The smallest unit of data that may be written to disk is a byte. A single byte requires at least one block of data to be allocated on disk. The size of a block is dependent on the type of filesystem and possibly also on the parameters used by the administrator when initializing the filesystem, but it's usually somewhere between 512 bytes and 4 KiB. Write operations may be buffered by the kernel, the underlying C library or by the application itself and the actual writing to disk may happen in multiples of the appropriate block size as an optimization.

It is not possible to write single bits to a file and if a write operation fails, it will not leave "half-written bytes" in the file.

Kusalananda
  • 333,661
  • Thanks, so what is it that ensures that if a write operation fails, it will not leave half-written bytes? Is it the kernel buffering muru was describing? I.e. if a process is interrupted in the middle of sending an 8 KB chunk to the kernel and is terminated unexpectedly, that 8 KB chunk would never reach the kernel, but any previous ones that reached the kernel and filesystem could be assumed to be good? – the_velour_fog Feb 06 '18 at 06:58
  • 6
    @the_velour_fog that sort of unexpected termination cannot happen, because the process would be uninterruptible in the middle of an I/O system call (that's why it's not unusual to see unkillable process stuck on filesystem access calls for an NFS file). Also see: https://unix.stackexchange.com/q/62697/70524 – muru Feb 06 '18 at 07:02
  • 2
    There may be issues if the system loses power at exactly the wrong time. This can occasionally result in garbage at the last write point of a file. It's a very tricky problem in database design. But still the normal smallest unit that is either "valid" or "invalid" is a disk block. – pjc50 Feb 06 '18 at 13:55
  • 1
    @the_velour_fog It's not so much as you can't get "half written bytes" (or, more accurately, a half-written block of bytes) as a half-written block wouldn't be recorded as having been written (in its entirety) -- see steps (3) and (4) of LSerni's answer. – TripeHound Feb 07 '18 at 16:21
5

These are basically two questions, because programs like curl and rsync are very different.

HTTP clients like curl check the size of the partially downloaded file and then send a Range header with their request. The server either resumes sending the requested range of the file, using status code 206 (Partial Content) instead of 200 (OK), and the download is resumed, or it ignores the header and starts from the beginning, in which case the HTTP client has no choice but to re-download everything.

Further, the server may or may not send a Content-Length header. You may have noticed that some downloads do not show a percentage or a file size. These are downloads where the server does not tell the client the length, so the client only knows how much it has downloaded but not how many bytes will follow.

A Range header with start and stop positions is also used by some download managers to fetch a file from different sources at once, which speeds up the transfer if each mirror by itself is slower than your network connection.
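A minimal sketch of the client side of this exchange (the header names are the standard HTTP ones, but the helper itself is illustrative, not curl's actual code):

    /* Illustrative sketch: build the request header a resuming HTTP client
       sends, based on the size of the partial local file. A server that
       supports it answers 206 (Partial Content) with a matching
       Content-Range header; a server that ignores it answers 200 and the
       client has to start over. */
    #include <stdio.h>
    #include <sys/stat.h>

    int build_range_header(const char *partial_path, char *buf, size_t buflen)
    {
        struct stat st;
        if (stat(partial_path, &st) != 0)
            return -1;                         /* no partial file yet */
        snprintf(buf, buflen, "Range: bytes=%lld-\r\n",
                 (long long)st.st_size);       /* resume at this offset */
        return 0;
    }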

rsync, on the other hand, is an advanced protocol for incremental file transfers. It generates checksums of parts of the file on the server and client side to detect which parts are already identical, and then sends only the differences. This means it can not only resume a download, but can even transfer just the changed bytes if you modified a few bytes in the middle of a very large file, without re-downloading the whole file.

Another protocol made for resuming transfers is BitTorrent, where the .torrent file contains a list of checksums for blocks of the file, so blocks can be downloaded and verified in arbitrary order and in parallel from different sources.
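A very simplified sketch of the per-block idea that both rsync and BitTorrent build on (real implementations use rolling and cryptographic hashes; the block size and checksum here are placeholders):

    /* Simplified sketch: checksum a file block by block and count how many
       blocks differ from a reference list, i.e. how many would need to be
       re-transferred. Real tools use rolling/cryptographic hashes. */
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 4096

    static uint32_t block_sum(const unsigned char *buf, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + buf[i];           /* placeholder checksum */
        return sum;
    }

    size_t count_stale_blocks(FILE *fp, const uint32_t *expected, size_t nblocks)
    {
        unsigned char buf[BLOCK_SIZE];
        size_t stale = 0;

        for (size_t i = 0; i < nblocks; i++) {
            size_t n = fread(buf, 1, BLOCK_SIZE, fp);
            if (n == 0 || block_sum(buf, n) != expected[i])
                stale++;                       /* missing or changed block */
        }
        return stale;
    }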

Note that rsync and BitTorrent will verify the partial data on your disk, while resuming an HTTP download will not. So if you suspect the partial data is corrupted, you need to check its integrity some other way, e.g. with a checksum of the final file. Merely interrupting the download or losing the network connection usually does not corrupt the partial file, while a power failure during the transfer may.

allo
  • 946
4

TL;DR: They can't, unless the protocol that they use allows for it.

Programs can't always resume from an arbitrary location: for example, HTTP requests are only restartable if the server supports it and the client implements it; this is not universal, so check your program's documentation. If the server does support it, programs can resume the transfer by simply asking for it as part of the protocol. You'll usually see partial transfers in your download directory (they're commonly marked with a ".partial" extension or something similar).

If a file download is paused or otherwise halted, the client can write the file to disk and have a definite idea of where to resume. If, on the other hand, the client crashes or there's an error writing to the file, the client has to assume that the file is corrupted and start over. BitTorrent somewhat mitigates this by breaking up the files into "chunks" and keeping track of which ones have been downloaded successfully; the most that it will ever have to redo is a few chunks. Rsync does something similar.

How do programs know that the content is the same? One method is to verify that some identifier is the same between the client and server. Some examples of this would be the timestamp and size, but there are mechanisms that can be specific to a protocol. If the identifiers match, then the client can assume that resuming will work.
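For example, a client could record the remote size and modification time when the partial download starts and compare them again before resuming (a hypothetical check, not taken from any particular tool):

    /* Hypothetical check: only resume if the source still looks like the
       same file we started with (same size and modification time). */
    #include <stdbool.h>
    #include <sys/types.h>
    #include <time.h>

    bool safe_to_resume(off_t recorded_size, time_t recorded_mtime,
                        off_t current_size, time_t current_mtime)
    {
        return recorded_size == current_size &&
               recorded_mtime == current_mtime;
    }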

If you want more definite verification, HTTP and friends should not be your first choice. You will want to use a protocol that also has a checksum or hash for the entire file and each transferred chunk, so that you can compare the download's checksum to the server's computed checksum: anything that doesn't match will then be re-downloaded. Again, BitTorrent is an example of this kind of protocol; rsync can optionally do this too.

ErikF
  • 4,042
  • for the rsync example, it's going to be straightforward because there is only one rsync protocol. for http downloads, there is range-requesting as a standard. i'm curious to know what curl actually does on resume-upload, because the standard semantics of upload is multipart/form-data (for wget and curl), but I don't believe that upload resume semantics are universally agreed upon. YouTube and Nginx may do this differently for instance. – Rob Feb 06 '18 at 08:47
1

It depends on the protocol used to transfer. But curl uses HTTP, which transfers data sequentially in the order in which it appears in the file, so curl can resume based on the file size of a partially completed transfer. In fact, you can trick it into skipping the first N bytes by creating a file of length N (containing anything) and asking it to treat that file as a partially completed download (and then discarding the first N bytes).
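A minimal sketch of that "skip the first N bytes" trick (the helper name is illustrative; with curl you would then resume against the placeholder using its -C option):

    /* Illustrative sketch: pre-create a placeholder of length N so a
       size-based resumer treats the first N bytes as already downloaded. */
    #include <fcntl.h>
    #include <unistd.h>

    int make_placeholder(const char *path, off_t skip_bytes)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        int rc = ftruncate(fd, skip_bytes);    /* file now reports size N */
        close(fd);
        return rc;
    }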