Over a year back, I contributed to a tutorial on how to install NixOS on Linode.

It seems like most tutorials that deal with putting iso files on installation media always use dd, which is why "is dd still relevant these days?" doesn't answer my question. This tutorial also originally used dd, and I slightly modified it so that the sha256 checksum used to verify the iso is computed while piping it from the source with tee, like this:

curl -L $iso | tee >(dd of=/dev/sda) | sha256sum

But it seems to me that dd here is somewhat redundant, and actually a lot slower than the following:

curl -L $iso | tee /dev/sda | sha256sum

While helping someone else follow the tutorial some months back, I recall they had problems with the simpler approach, but when I tried it myself this past weekend, on two separate installs, it worked just fine, and rather blazingly fast too.

Is this modification reliable enough to submit a pull request updating the tutorial?

Or was I just lucky getting it to work, and dd is actually safer and more reliable for creating installation media, so we should leave the tutorial as-is?

Aaron Hall
  • I think there is also another problem here: These simple copying tools have no final checkpoint where the user can double-check that the target drive is the correct one. So I suggest that you provide some 'extra' advice or warning to double-check the target device in your tutorial. – sudodus Jan 11 '22 at 08:21
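
A sketch of the kind of extra check the comment above suggests; this is only an illustration, and it assumes /dev/sda is the intended target:

lsblk -o NAME,SIZE,MODEL /dev/sda    # show what is about to be overwritten
read -r -p "Write the image to /dev/sda and destroy its contents? [y/N] " answer
[ "$answer" = y ] || exit 1
curl -L $iso | tee /dev/sda | sha256sum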

1 Answer

Using dd is not safer or faster or more reliable. In fact, here, it introduces two additional risks of failure. Neither risk is likely to be a problem in practice if people follow those instructions manually, but they would be significant bugs if the instructions were in an automated script.

Bug: race condition

Observe:

bash-5.0$ echo hello | tee >(sleep 1; echo done); echo next step
hello
next step
bash-5.0$ done

In bash, output process substitutions are asynchronous. When a command contains a process substitution >(…), it doesn't wait for the process substitution to finish.

So when … | tee >(dd of=/dev/sda) | sha256sum returns, there may be data that's still in transit through dd. This is very unlikely to last long enough for a human to react and type another command, but it could break a script that runs some other command like eject or mount afterwards.
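
For instance, a hypothetical install script that touches the device right after the copy (the eject here is only illustrative) could misbehave with the process substitution but not without it:

curl -L $iso | tee >(dd of=/dev/sda) | sha256sum
eject /dev/sda    # in some shells, this can run while dd is still writing

curl -L $iso | tee /dev/sda | sha256sum
eject /dev/sda    # tee has exited before the pipeline returns; no asynchronous writer remains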

Bug: missing error detection

Let's start with the nominal case, where everything works.

bash-5.0$ head -c 1m </dev/zero | tee >(cat >/dev/null) | wc -c; echo $?
1048576
0

Now let's see what happens if the data writing command fails.

bash-5.0$ head -c 1m </dev/zero | tee >(false) | wc -c; echo $?
8192
0

The command has a success status because the exit status of a pipeline only depends on the right-hand side. The idea is that if you pipe a data producer into a data processor, it's the job of the data processor to detect failures. Unfortunately, this can only apply when the data format allows the data processor to detect failures, which is not the case in general, and in particular is not the case here.

Note that tee completely gave up once it failed to write to the pipe connected to false. Since false never read any data, the only data that made it through to wc -c is two PIPE_BUF's worth (one buffer that tee wrote to both pipes, and one that tee wrote to the pipe to wc but failed to write to the pipe to false). Depending on the relative timing of false exiting versus tee writing to the pipes and wc consuming the data, it's possible that only one buffer, or none at all, made it through.

It's possible to detect the failure of tee by setting the pipefail option. (This possibility exists in ksh, in bash and in zsh but not in plain sh.)

bash-5.0$ set -o pipefail; head -c 1m </dev/zero | tee >(false) | wc -c; echo $?
8192
141

tee failed to write to a pipe, so it died of SIGPIPE, and the corresponding shell status is 128 + the numerical value of SIGPIPE (which is 13 on Linux). Thanks to the pipefail option, this causes the pipeline as a whole to exit with the same status.
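
As an aside (not part of the original demonstration), bash's kill builtin can map such an exit status back to a signal name:

bash-5.0$ kill -l 141
PIPE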

Do note that the pipeline reflects the failure of tee, and not directly the failure of the command in the process substitution. If the command in the process substitution successfully reads all the data but does not process it successfully, the error will not be detected.

bash-5.0$ head -c 1m </dev/zero | tee >(cat >/dev/null; false) | wc -c; echo $?
1048576
0

wc -c processed all the data. cat >/dev/null; false simulates a command that reads all of its input but then fails to process it. Nonetheless, the pipeline's status indicates success.

What this means in your real-world example is that if there's an error at the end of the data, for example because the target device is very slightly smaller than the image, this error will not be detected (except through an error message from dd).

Simple, correct solution

set -o pipefail
curl -L $iso | tee /dev/sda | sha256sum

Or, arguably simpler:

curl -L $iso | tee >/dev/sda >(sha256sum)

Note that without pipefail, this second command will report success even if curl fails. However, such a failure is guaranteed to produce a wrong checksum.
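
For completeness, here is a sketch of how a tutorial step could turn the printed hash into an explicit pass/fail check. It is only an illustration: expected_sha256 is a hypothetical variable holding the checksum published on the download page.

set -o pipefail
actual=$(curl -L $iso | tee /dev/sda | sha256sum | awk '{print $1}') || exit 1
if [ "$actual" != "$expected_sha256" ]; then
    echo "checksum mismatch: got $actual" >&2
    exit 1
fi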

A general note on the usage of dd

It seems like most tutorials that deal with putting iso files on installation media always use dd, which is why "is dd still relevant these days?" doesn't answer my question

Well, it did, more or less. Specifically, it answered the question of whether dd serves any purpose: it doesn't. It didn't cover the specific problems in using dd this particular way, which this time aren't actually due to dd itself.

The reason most tutorials use dd is that most tutorials use dd. It's a self-perpetuating legend. People use dd because they've seen it used elsewhere, even though they don't really understand why. Its syntax is unlike every other command and so it appears to be somewhat mysterious and powerful. But in dd of=/dev/sda, all the power is in /dev/sda and none in dd. It's just a pretentious, fragile way of writing cat >/dev/sda.
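
To make that concrete, each of the following writes the same bytes to the same place; only the performance differs (see the block-size comment below):

curl -L $iso | dd of=/dev/sda    # what the tutorials do
curl -L $iso | cat >/dev/sda     # same bytes, same destination
curl -L $iso >/dev/sda           # the shell's redirection alone is enough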

dhag
  • In … | tee >(dd of=/dev/sda) | sha256sum, bash (but not zsh) will end up waiting for dd, because dd's stdout goes to sha256sum as well, so sha256sum, which bash is waiting for, won't terminate before dd terminates. – Stéphane Chazelas Jan 11 '22 at 17:10
  • Playing with blktrace and a loop device on Ubuntu 20.04, I see all reads/writes to the device are in multiples of 4k (even though blockdev reports 512 for both the logical and hw block size), and with dd's default bs (512), that causes reads from the device when you write to it (which disappear with bs values that are multiples of 4k). So while using dd with the default bs is counterproductive, using GNU dd with bs=64k iflag=fullblock (or at least bs=4k for my loop device) may help performance-wise. – Stéphane Chazelas Jan 12 '22 at 08:27
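
If dd is kept at all, a sketch of the tuned invocation the comment above describes (block-size values taken from that comment; the error-detection caveats from the answer still apply):

curl -L $iso | dd of=/dev/sda bs=64k iflag=fullblock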