Unfortunately, to manipulate the content of a binary file, dd is pretty much the only tool in POSIX. Although most modern implementations of text processing tools (cat, sed, awk, …) can manipulate binary files, this is not required by POSIX: some older implementations do choke on null bytes, input not terminated by a newline, or invalid byte sequences in the ambient character encoding.
It is possible, but difficult, to use dd safely. The reason I spend a lot of energy steering people away from it is that there's a lot of advice out there that promotes dd in situations where it is neither useful nor safe.
The problem with dd is its notion of blocks: it assumes that a call to read returns one block; if read returns less data, you get a partial block, which throws things like skip and count off. Here's an example that illustrates the problem, where dd is reading from a pipe that delivers data relatively slowly:
yes hello | while read line; do echo $line; done | dd ibs=4 count=1000 | wc -c
On a bog-standard Linux (Debian jessie, Linux kernel 3.16, dd from GNU coreutils 8.23), I get a highly variable number of bytes, ranging from about 3000 to almost 4000. Change the input block size to a divisor of 6 (say ibs=2, with count=2000 to keep the total at 4000) and you consistently get the full 4000 bytes, as one would naively expect: the input to dd arrives in bursts of 6 bytes, and as long as a block doesn't span multiple bursts, dd gets to read a complete block.
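For comparison, here is that variant of the same pipeline (a sketch adapted from the example above): since 2 divides the 6-byte bursts, no read ever straddles a burst, and the count comes out exact on every run.
yes hello | while read line; do echo $line; done | dd ibs=2 count=2000 | wc -c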
This suggests a solution: use an input block size of 1. No matter how the input is produced, there's no way for dd
to read a partial block if the input block size is 1. (This is not completely obvious: dd
could read a block of size 0 if it's interrupted by a signal — but if it's interrupted by a signal, the read
system call returns -1. A read
returning 0 is only possible if the file is opened in non-blocking mode, and in that case a read
had better not be considered to have been performed at all. In blocking mode, read
only returns 0 at the end of the file.)
dd ibs=1 count="$number_of_bytes"
The problem with this approach is that it can be slow (but not shockingly slow: only about 4 times slower than head -c in my quick benchmark).
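For scripts, the slow-but-safe form is easy to wrap (a sketch; the helper name read_bytes and the file names are mine, for illustration):
# read exactly $1 bytes from standard input (assumes blocking input;
# stops early at end of file); dd's status messages are discarded
read_bytes () {
  dd ibs=1 count="$1" 2>/dev/null
}
# example: grab the first 512 bytes of a file
read_bytes 512 <somefile >header.bin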
POSIX defines other tools that read binary data and convert it to a text format: uuencode (outputs in historical uuencode format or in Base64), od (outputs an octal or hexadecimal dump). Neither is well-suited to the task at hand. uuencode can be undone by uudecode, but counting bytes in the output is awkward because the number of bytes per line of output is not standardized. It's possible to get well-defined output from od, but unfortunately there's no POSIX tool to go the other way round (it can be done, but only through slow loops in sh or awk, which defeats the purpose here).
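To make the "slow loops" point concrete, here is a sketch (mine, not part of the answer) that copies the first $n bytes of standard input using only od, read and printf; it works byte by byte, and one printf per byte is exactly the kind of loop that defeats the purpose:
# copy the first $n bytes of standard input to standard output (slow sketch)
n=16
od -An -vto1 | {
  count=0
  while read -r line; do
    for byte in $line; do            # one three-digit octal number per input byte
      [ "$count" -lt "$n" ] || exit 0
      printf "\\$byte"               # \ddd octal escape in printf's format string
      count=$((count + 1))
    done
  done
}
One caveat (my note): a shell whose printf cannot emit a null byte will mangle \000, which is one more way the text tools fall short on binary data.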
Comments:

dd is portable. If you don't mind the warning, or adjust your blocksize, is there a problem with using dd? – muru Apr 22 '16 at 21:35

head -c not being portable. – Guido Apr 22 '16 at 22:30

dd is. – muru Apr 22 '16 at 22:30

dd doesn't do the job, for reasons explained in the linked answer. In my experiments, requesting 10 * 2^20 bytes with dd yields less than 200 bytes. If you don't understand or believe that, I urge you to read the linked answer, which clearly explains how it can be so. – Low Powah Apr 22 '16 at 22:55

Adjusting the bs= parameter doesn't prevent dd from returning before reading the requested number of bytes (n = bs * count). – Low Powah Apr 22 '16 at 23:04

What bs? – muru Apr 22 '16 at 23:05

dd bs=1000000 count=10 if=/dev/random of=/tmp/random results in a file containing less than 200 bytes. Now do you understand why dd isn't the right tool for the job? – Low Powah Apr 22 '16 at 23:08

If that bs causes problems, why aren't you using a lower bs? Why not dd bs=1000 count=10000? Is something forcing you to use that bs? – muru Apr 22 '16 at 23:11

The safe way to use dd is to either use a bs of 1 byte (as read() will return at least 1 byte) or to not use bs= and instead use obs= (and, optionally, ibs=) separately and pipe it into another dd with your count and an ibs= set to the obs= of the first. If you use bs= at all, dd will write partial reads without buffering them to a known size. Using (i)bs=1000 count=10000 only guarantees 10k writes of up to 1000 bytes and will happily write out less than 10k * 1000 bytes if any of the reads return less. – Adrian Günter Apr 16 '18 at 06:43

A bs of 1... – muru Apr 16 '18 at 06:47

dd if=/dev/zero of=/dev/null bs=1 count=10000000 takes far longer than with larger block sizes. It's simply not practical for many/most situations. Piping to another dd works and allows arbitrarily large reads and writes. – Adrian Günter Apr 16 '18 at 06:52
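The shortfall described in these comments can be reproduced with the slow producer from the answer above (a sketch; the exact count varies with scheduling, but each read typically returns just one or a few 6-byte bursts, so the total lands far below the naive 10 × 1000 = 10000 bytes):
yes hello | while read line; do echo $line; done | dd bs=1000 count=10 | wc -c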
dd if=/dev/random | dd count=128 | wc -c will reliably write 64KiB on systems where dd's default blocksize is 512 bytes. The blocksize can be adjusted by setting obs= on the first dd and ibs= (or just bs=) on the second to the same value: dd if=/dev/random obs=4K | dd bs=4K count=16 | wc -c also writes 64KiB. The key is to never set the bs= value on the first dd, as this ensures full output blocks are accumulated before writes. On some implementations you need to set ibs= of the first to a value other than obs=: dd if=... ibs=1K obs=4K | dd bs=4K ... – Adrian Günter Apr 16 '18 at 15:30

Run dd if=/dev/random of=/dev/null obs=1317 and let it run for 30 seconds or so on a system that isn't entropy starved, then kill it with Ctrl-c. If you read the status output as [<full_blocks>+<partial_blocks>] records (in|out), you will see that dd read in many (or entirely) partial blocks – many more blocks than it wrote – and that every output block it wrote was a full block, i.e., 1317 bytes. You can verify this with dd if=/dev/random obs=1317 | pv -bn >/dev/null; pv will report bytes read in multiples of 1,317. – Adrian Günter Apr 16 '18 at 16:41
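A sketch of the two-dd approach from these comments (my example: /dev/urandom is substituted for /dev/random so the pipeline doesn't stall waiting for entropy, and the guarantee rests on the first dd's 4096-byte pipe writes staying atomic, i.e. within PIPE_BUF, so treat it as illustrative rather than contractual):
dd if=/dev/urandom obs=4k 2>/dev/null | dd ibs=4k count=16 2>/dev/null | wc -c
The first dd accumulates whatever its reads return into full 4096-byte output blocks; the second counts 16 of those blocks, so the pipeline prints 65536 provided no pipe write is split.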