> I figure there is a cache somewhere that is holding up the final 5MB, but I thought fsync should make sure that doesn't happen.

`conv=fsync` means to write back any caches by calling `fsync()` *after* `dd` has written all the data. Hanging at the end is exactly what it will do.
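To see the two phases separately, you can run the copy without `conv=fsync` and then flush explicitly. This is a minimal sketch, assuming GNU coreutils, where `sync FILE` calls `fsync()` on that file (coreutils 8.24 or later). That is roughly what `conv=fsync` does at the end of `dd`. The function name `copy_then_flush` is hypothetical.

```sh
# Hypothetical helper: split "dd conv=fsync" into two visible phases.
# Assumes GNU coreutils (dd status=progress; "sync FILE" calls fsync()).
copy_then_flush() {
    src=$1
    dst=$2
    # Phase 1: copy. The progress counter measures data handed to the
    # kernel, which may still be sitting in the page cache.
    dd if="$src" of="$dst" bs=4M status=progress || return 1
    echo "copy finished; flushing cached writes to $dst..." >&2
    # Phase 2: write back the cached data. This is where the "hang" happens.
    sync "$dst"
}
```

For example, from a root shell: `gunzip -c serial2udp.image.gz | copy_then_flush /dev/stdin /dev/mmcblk0`. If the output is much slower than the input, expect the second phase to take a long time.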
When the output file is slower than the input file, the data written by `dd` can pile up in caches. The kernel cache can sometimes fill a significant fraction of system RAM. This makes for very misleading progress information, because `dd` counts data as written as soon as the kernel has accepted it, not when it reaches the device. Your "final 5MB" was just an artefact of how `dd` shows progress.
If your system was indeed caching about 8GB (i.e. half of the 16GB of written data), then I think you must either have about 32GB of RAM, or have been fiddling with certain kernel options. See the lwn.net link below. I agree that not getting any progress information for 15 minutes is pretty frustrating.
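To check how much the kernel may let pile up on your own system, you can read the relevant sysctls. This is a rough sketch, assuming Linux. The kernel applies `vm.dirty_ratio` to "dirtyable" memory (roughly free plus reclaimable page cache) rather than to `MemTotal`, so this over-estimates the limit. A non-zero `vm.dirty_bytes` is used instead of the ratio. The function name is hypothetical.

```sh
# Hypothetical helper: rough upper bound (MiB) on how much dirty data the
# kernel will cache before making writers wait. Assumes Linux /proc.
dirty_limit_mib() {
    bytes=$(cat /proc/sys/vm/dirty_bytes)
    if [ "$bytes" -gt 0 ]; then
        # vm.dirty_bytes, when non-zero, is used instead of vm.dirty_ratio.
        echo $((bytes / 1048576))
    else
        ratio=$(cat /proc/sys/vm/dirty_ratio)
        mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
        # Over-estimate: the kernel applies the ratio to "dirtyable" memory,
        # which is smaller than MemTotal.
        echo $((mem_kb * ratio / 100 / 1024))
    fi
}
```

Usage: `echo "dirty limit: at most about $(dirty_limit_mib) MiB"`.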
There are alternative `dd` commands you could use. If you want `dd` to show more accurate progress, you might have to accept more complexity. I expect the following would work without degrading your performance, though maybe reality has other ideas than I do.
```sh
# Decompress, buffer in a middle dd, then write 4M blocks with direct IO.
gunzip -c serial2udp.image.gz |
dd iflag=fullblock bs=4M |
sudo dd iflag=fullblock oflag=direct conv=fsync status=progress bs=4M of=/dev/mmcblk0
```
`oflag=direct iflag=fullblock` avoids piling up kernel cache, because direct IO bypasses the kernel cache altogether.
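`iflag=fullblock` is required in such a command AFAIK (e.g. because you are reading from a pipe and writing using direct IO). A read from a pipe can return less than a full block, and without `fullblock`, `dd` writes that short block as it is. Direct IO, however, generally requires each write to be a multiple of the device's logical block size. The effect of missing `fullblock` is another unfortunate complexity of `dd`. Some posts on this site use this to argue you should always prefer to use a different command. It's hard to find another way to do direct or sync IO, though.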
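The short-read problem is easy to reproduce with a pipe. In this small sketch (the function names are made up for illustration, and GNU `dd` is assumed), the writer pauses between two 4-byte chunks, so a single `read()` of 8 bytes only gets the first chunk. Without `iflag=fullblock`, `dd` counts that short read as one (partial) block. With it, `dd` keeps reading until the block is full. With `bs=4M` and `gunzip` on the other end of the pipe, the same effect can produce short blocks of arbitrary size.

```sh
# Hypothetical demo helpers. Assumes GNU dd (iflag=fullblock).
# Writes 4 bytes, pauses, then writes 4 more, so a reader on the other end
# of the pipe sees the data arrive in two separate chunks.
slow_writer() { printf 'aaaa'; sleep 1; printf 'bbbb'; }

# One 8-byte block without fullblock: read() returns only what is in the
# pipe so far, and dd treats that as a complete (partial) block.
read_without_fullblock() { slow_writer | dd bs=8 count=1 2>/dev/null; }

# With iflag=fullblock, dd repeats the read until it has all 8 bytes.
read_with_fullblock() { slow_writer | dd bs=8 count=1 iflag=fullblock 2>/dev/null; }
```

On a normally loaded system, `read_without_fullblock` prints only `aaaa`, while `read_with_fullblock` prints `aaaabbbb`.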
`conv=fsync` should still be used, to write back the device cache: direct IO bypasses the kernel cache, but not any write cache inside the device itself.
I added an extra `dd` after `gunzip` to buffer the decompressed output in parallel with the disk write. While the last `dd` is blocked waiting for a direct write to finish, the middle one can already read and hold the next 4M from `gunzip`. This is one of the issues that makes the performance with `oflag=direct` or `oflag=sync` a bit complex. Normal IO (non-direct, non-sync) is not supposed to need this, as it is already buffered by the kernel cache. You also might not need the extra buffer if you were writing to a hard drive with 4M of writeback cache, but I don't assume an SD card has that much.
You could alternatively use `oflag=direct,sync` (and not need `conv=fsync`). This might be useful for good progress information if you had a weird output device with hundreds of megabytes of cache. But normally I think of `oflag=sync` as a potential barrier to performance.
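For reference, the `oflag=direct,sync` variant of the pipeline above might look like this. It is a sketch: the function name `write_image_direct_sync` is hypothetical, and the image and device are passed as arguments instead of being hard-coded.

```sh
# Hypothetical helper: the same pipeline with oflag=direct,sync. Each 4M
# write returns only once it has reached the device, so conv=fsync is dropped.
write_image_direct_sync() {
    image=$1   # e.g. serial2udp.image.gz
    device=$2  # e.g. /dev/mmcblk0 (this overwrites the whole device!)
    gunzip -c "$image" |
    dd iflag=fullblock bs=4M |
    sudo dd iflag=fullblock oflag=direct,sync status=progress bs=4M of="$device"
}
```

Usage: `write_image_direct_sync serial2udp.image.gz /dev/mmcblk0`.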
There is a 2013 article, https://lwn.net/Articles/572911/, which mentions minute-long delays like yours. Many people see this ability to cache minutes' worth of writeback data as undesirable. The problem was that the limit on the cache size was applied indiscriminately, to both fast and slow devices. Note that it is non-trivial for the kernel to measure device speed, because it varies depending on the data locations. E.g. if the cached writes are scattered in random locations, a hard drive will take longer, because it has to keep moving the write head.
> why do the updates hang

The `fsync()` is a single system call that applies to the entire file, which here is the whole device. It does not return any status updates before it is done.
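Because `fsync()` reports nothing until it returns, one workaround is to watch the kernel's own counters while `dd` runs. This is a minimal sketch, assuming Linux. It prints the system-wide `Dirty` plus `Writeback` totals from `/proc/meminfo` every few seconds until the command exits. Those totals cover all devices, not only your SD card. The function name is hypothetical.

```sh
# Hypothetical helper: run a command in the background and, while it runs,
# report how much written data the kernel still has to write back.
# Usage: run_with_writeback_report INTERVAL COMMAND [ARGS...]
# Assumes Linux (/proc/meminfo); counters are system-wide.
run_with_writeback_report() {
    interval=$1
    shift
    "$@" &
    pid=$!
    # /proc/$pid goes away once the shell has reaped the finished command.
    while [ -e "/proc/$pid" ]; do
        awk '/^(Dirty|Writeback):/ { kb += $2 }
             END { printf "pending write-back: %d MiB\n", kb / 1024 }' /proc/meminfo >&2
        sleep "$interval"
    done
    wait "$pid"  # return the command's exit status
}
```

For example, from a root shell: `run_with_writeback_report 5 sh -c 'gunzip -c serial2udp.image.gz | dd iflag=fullblock bs=4M conv=fsync status=progress of=/dev/mmcblk0'`. With normal buffered IO, you would expect the pending amount to grow during the copy and then drain while `fsync()` runs.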