When is dd suitable for copying data? (or, when are read() and write() partial)

Question

Short version: In what circumstances is dd safe to use for copying data, safe meaning that there is no risk of corruption due to a partial read or write?

Long version — preamble: dd is often used to copy data, especially from or to a device (example). It's sometimes attributed mystical properties of being able to access devices at a lower level than other tools (when in fact it's the device file that's doing the magic) — yet dd if=/dev/sda is the same thing as cat /dev/sda. dd is sometimes thought to be faster, but cat can beat it in practice. Nonetheless, dd has unique properties that make it genuinely useful sometimes.

Problem: dd if=foo of=bar is not, in fact, the same as cat <foo >bar. On most unices¹, dd makes a single call to read() for each input block. (I find POSIX fuzzy on what constitutes “reading an input block” in dd.) If read() returns a partial result (which, according to POSIX and other reference documents, it is allowed to do unless the implementation documentation says otherwise), a partial block is copied. Exactly the same issue exists for write().

Observations: In practice, I've found that dd can cope with block devices and regular files, but that may just be that I haven't exercised it much. When it comes to pipes, it's not difficult to put dd at fault; for example try this code:

yes | dd of=out bs=1024k count=10

and check the size of the out file (it's likely to be well under 10MB).

Question: In what circumstances is dd safe to use for copying data? In other words, what conditions on the block sizes, on the implementation, on the file types, etc, can ensure that dd will copy all the data?

(GNU dd has a fullblock flag to tell it to call read() or write() in a loop so as to transfer a full block. So dd iflag=fullblock is always safe. My question is about the case when these flags (which don't exist on other implementations) are not used.)
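For comparison, here is a minimal sketch of what that flag changes. It assumes GNU dd, since iflag=fullblock is an extension and not part of POSIX; the variable names plain and full are only illustrative.

```shell
# Assumes GNU dd; iflag=fullblock is an extension, not POSIX.
# Without the flag, short reads from the pipe usually leave fewer than 10 MiB.
plain=$(yes | dd bs=1024k count=10 2>/dev/null | wc -c)

# With the flag, dd keeps calling read() until each 1 MiB block is full.
full=$(yes | dd iflag=fullblock bs=1024k count=10 2>/dev/null | wc -c)

echo "without fullblock: $plain bytes"
echo "with fullblock:    $full bytes"
```

The first figure varies from run to run and from system to system; the second should be exactly 10485760.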

¹ I've checked on OpenBSD, GNU coreutils and BusyBox.

I've never seen any Unixy system that really could read a few MiB in a single read(2)... — vonbrand, Jan 23 '13 at 16:14
When using count, the iflag=fullblock is mandatory (or, alternatively, iflag=count_bytes). There is no oflag=fullblock. — frostschutz, Mar 31 '15 at 14:49

score 31 · Answer 1 · edited Apr 13 '17 at 12:37

From the spec:

If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read() returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block.

So this is probably what causes your confusion. Yes: because dd is designed for blocking, by default each partial read() is mapped 1:1 to a partial write(). When conv=sync is specified, the short block is instead padded at the tail with NUL or space characters up to the bs= size.
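The two behaviours can be seen with a deliberately short input. This is only a sketch: the 3-byte input, the 8-byte block size and the variable names as_is and padded are chosen for illustration.

```shell
# A 3-byte input cannot fill an 8-byte block, so the only read() is short.
# Default: the short block is written as-is (3 bytes out).
as_is=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c)

# conv=sync: the short block is padded with NULs up to bs (8 bytes out).
padded=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c)

echo "default: $as_is bytes, conv=sync: $padded bytes"
```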

This means that dd is safe to use for copying data (w/ no risk of corruption due to a partial read or write) in every case except one: when it is arbitrarily limited by a count= argument. Otherwise dd will happily write() its output in blocks sized identically to those in which its input was read(), until it has read() completely through it. And even this caveat is only true when bs= is specified or obs= is not specified, as the very next sentence in the spec states:

If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.

Without ibs= and/or obs= arguments this can't matter - because ibs and obs are both the same size by default. However, you can get explicit about input buffering by specifying different sizes for either and not specifying bs= (because it takes precedence).

For example, if you do:

IN| dd ibs=1| OUT

...then a POSIX dd will write() in chunks of 512 bytes by collecting every singly read() byte into a single output block.

Otherwise, if you do...

IN| dd obs=1kx1k| OUT

...a POSIX dd will read() at maximum 512 bytes at a time, but write() every megabyte-sized output block (kernel allowing and excepting possibly the last - because that's EOF) in full by collecting input into full-sized output blocks.
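Both cases can be checked from dd's own record counts. The sketch below stands in for IN and OUT with generated input and /dev/null; the input sizes and the variable names in_1 and out_1m are assumptions made for the illustration.

```shell
# LC_ALL=C keeps dd's transfer statistics untranslated.
# 1024 bytes read one at a time are collected into two 512-byte output blocks.
in_1=$(printf '%1024s' '' | LC_ALL=C dd ibs=1 2>&1 >/dev/null)

# 2 MiB read in blocks of at most 512 bytes leave as two 1 MiB output blocks.
# The "records in" line may vary, because reads from a pipe can be short.
out_1m=$(dd if=/dev/zero bs=1024 count=2048 2>/dev/null |
  LC_ALL=C dd obs=1kx1k 2>&1 >/dev/null)

printf '%s\n' "$in_1" "$out_1m"
```

The first report should show 1024+0 records in and 2+0 records out; the second should also end up with 2+0 records out, however the input reads were split.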

Also from the spec, though:

count=n
- Copy only n input blocks.

count= maps to i?bs= blocks, and so in order to handle an arbitrary limit on count= portably you'll need two dds. The most practical way to do it with two dds is by piping the output of one into the input of another, which surely puts us in the realm of reading/writing a special file regardless of the original input type.
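To see why count= is the dangerous case, here is a small sketch that forces a short read. The one-second pause and the variable name copied are illustrative choices, not anything from the spec.

```shell
# The writer delivers 'a', pauses, then 'b', so dd's first read() of a
# 2-byte block returns only 1 byte. count=1 counts that short read as the
# one permitted input block, and 'b' is never copied.
copied=$( (printf 'a'; sleep 1; printf 'b') | dd bs=2 count=1 2>/dev/null | wc -c )

echo "asked for 1 block of 2 bytes, got $copied byte(s)"
```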

An IPC pipe means that, to specify [io]bs= args safely, you must keep their values within the system's defined PIPE_BUF limit. POSIX states that the system kernel must only guarantee atomic read()s and write()s within the limits of PIPE_BUF as defined in limits.h. POSIX guarantees that PIPE_BUF be at least ...

{_POSIX_PIPE_BUF}
- Maximum number of bytes that is guaranteed to be atomic when writing to a pipe.
- Value: 512

...(which also happens to be the default dd i/o blocksize), but the actual value is usually at least 4k. On Linux, PIPE_BUF is 4096 bytes; the 64k figure that is often quoted is the default pipe capacity, which is a separate limit.
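The value on a particular system can be queried rather than assumed. A minimal sketch, using / as an arbitrary path operand and pipe_buf as an illustrative variable name:

```shell
# PIPE_BUF is a per-path limit, so getconf takes a pathname operand.
pipe_buf=$(getconf PIPE_BUF /)

echo "PIPE_BUF here: $pipe_buf bytes (POSIX minimum: 512)"
```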

So when you set up your dd processes you should do it on a block factor based on three values:

bs = ( obs = PIPE_BUF or lesser )
n = total desired number of bytes read
count = n / bs

Like:

yes | dd obs=1k | dd bs=1k count=10k of=/dev/null
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 0.1143 s, 91.7 MB/s
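The arithmetic behind that command can be spelled out with shell variables. This is a sketch under the same assumptions as the example above (a 1k block size that fits within PIPE_BUF, and 10 MiB wanted); bs, n and count follow the formula, and copied is an illustrative name.

```shell
bs=1024                       # block size, no larger than PIPE_BUF
n=$((10 * 1024 * 1024))       # total bytes wanted: 10 MiB
count=$((n / bs))             # number of bs-sized blocks: 10240

# The first dd re-blocks yes's output into bs-sized writes;
# the second dd then counts whole blocks of the same size.
copied=$(yes | dd obs="$bs" 2>/dev/null |
  dd bs="$bs" count="$count" 2>/dev/null | wc -c)

echo "count=$count copied=$copied"
```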

You have to synchronize i/o w/ dd to handle non-seekable inputs. In other words, make pipe-buffers explicit and they cease to be a problem. That's what dd is for. The unknown quantity here is yes's buffer size - but if you block that out to a known quantity with another dd then a little informed multiplication can make dd safe to use for copying data (w/ no risk of corruption due to a partial read or write) even when arbitrarily limiting input w/ count= w/ any arbitrary input type on any POSIX system and without missing a single byte.

Here's a snippet from the POSIX spec:

ibs=expr
- Specify the input block size, in bytes, by expr (default is 512).
obs=expr
- Specify the output block size, in bytes, by expr (default is 512).
bs=expr
- Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.

You'll also find some of this explained better here.

Thanks for breaking this down in detail! So dd ibs=1k obs=1k | dd bs=1k count=10k is the proper way to use dd to copy exactly 10MiB? — Rufflewind, Nov 18 '19 at 09:32

score 8 · Answer 2 · answered Jul 25 '11 at 14:29

With sockets, pipes, or ttys, read() and write() can transfer less than the requested size, so when using dd on these, you need the fullblock flag. With regular files and block devices however, there are only two times when they can do a short read/write: when you reach EOF, or if there is an error. This is why older implementations of dd without the fullblock flag were safe to use for disk duplication.
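The regular-file case is easy to check on a given system. The sketch below assumes mktemp and /dev/zero are available (neither is guaranteed by POSIX); tmp and stats are illustrative names.

```shell
# mktemp is widely available but not specified by POSIX.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1024 count=3072 2>/dev/null   # 3 MiB regular file

# If every 1 MiB read() from the regular file comes back full,
# dd reports 3 full input records and no partial ones.
stats=$(LC_ALL=C dd if="$tmp" of=/dev/null bs=1024k 2>&1)
rm -f "$tmp"

printf '%s\n' "$stats"
```

A report of 3+0 records in means no short reads occurred; a nonzero number after the + would indicate partial blocks.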

psusi

Is that true of all modern unices? (I know it wasn't true of Linux at some point, possibly up to 2.0.x or 2.2.x. I remember mke2fs failing silently because it called write() with some non-power-of-2 size (3kB IIRC) and the kernel rounded down to a power of 2.) – Gilles 'SO- stop being evil' Jul 25 '11 at 14:47
@Gilles that sounds like a different issue entirely. You always have to use a multiple of the proper block size with block devices. I am pretty sure it is true of all unices, and it is also true for Windows. – psusi Jul 25 '11 at 14:57
Apart from tapes, the block size of a device is purely for the kernel to care about, or not. cat </dev/sda >/dev/sdb works just fine to clone a disk. – Gilles 'SO- stop being evil' Jul 25 '11 at 15:00
@Gilles that is because cat uses the appropriate block size, as OrbWeaver noted in his answer. – psusi Jul 25 '11 at 15:51
No, there is no “appropriate block size”. cat picks a buffer size for performance; it doesn't get any device-related information from the kernel. Apart from tapes, you can read() and write() to a block device with any size. On Linux at least, st_blksize depends only on the filesystem where the block device inode is located, not on the underlying device. – Gilles 'SO- stop being evil' Jul 25 '11 at 17:22
@Gilles my mistake. I was thinking O_DIRECT, which does require block size and alignment. At any rate, dd is safe to use between two block devices without the fullblock flag ;) – psusi Jul 25 '11 at 23:49
