I have dd from GNU coreutils 8.32.

When I run { echo a; sleep 1; echo b; } | dd bs=4 count=1 then I get

a
0+1 records in
0+1 records out
2 bytes copied, 2.0381e-05 s, 98.1 kB/s

dd terminates during the sleep even though the block size was not reached and there was no EOF. The output b\n is lost. This does not happen if I remove either sleep or count=1.

In man dd I couldn't find anything that describes this behavior.

  1. Why doesn't dd count=1 wait till bs is reached or an EOF is encountered?
  2. How can I force dd to wait?
Socowi

1 Answer

This isn't caused by dd's behaviour but by operating-system-specific behaviour. That said, it is behaviour specified by POSIX for read():

The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.

When you set bs=4 you instruct dd to read 4 bytes at a time, but that only means it requests 4 bytes per read(). If the OS returns less, dd won't go back and read() a second time unless...

There is iflag=fullblock, which instructs dd to perform multiple read() operations until it has read an entire block.
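As a rough sketch, here is the pipeline from the question with the flag added. It assumes a dd that supports iflag=fullblock (GNU coreutils does; it is not in POSIX). The variable name out_fullblock is only for illustration.

```shell
# Same producer as in the question: "a\n", a one-second pause, then "b\n".
# With iflag=fullblock, dd keeps calling read() until it has collected the
# full 4-byte block (or hits EOF), so it waits through the sleep.
# dd's statistics go to stderr and are discarded here.
out_fullblock=$({ echo a; sleep 1; echo b; } | dd bs=4 count=1 iflag=fullblock 2>/dev/null)

# Command substitution strips the trailing newline; printf adds one back.
printf '%s\n' "$out_fullblock"
```

Here both lines should come through, because a\nb\n is exactly 4 bytes and fills the single requested block.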

  • Thanks, that helped a bit, even though the specification of dd does not talk about such details. That dd happens to use read() the way it does seems more like an implementation detail we shouldn't be concerned about. – Socowi Mar 15 '21 at 17:52
  • Anyway, since fullblock is not specified by POSIX, do you have any POSIX-conforming alternatives? I thought about dd bs=1 count=X instead of dd bs=X count=1 iflag=fullblock. In my tests this worked (albeit being far less efficient). If I understood correctly, read() blocks until there is at least one byte of input, right? – Socowi Mar 15 '21 at 17:56
  • @Socowi no it won't return 0 unless it's a non-blocking read, which it shouldn't be. I'm afraid I don't have a POSIX alternative. You might want to check if BSD and Busybox both support that flag. Sometimes there are non-POSIX flags which are very common anyway. – Philip Couling Mar 15 '21 at 17:58
  • @Socowi, and Philip, it's all about how dd behaves, and how it's specified. Right there in the POSIX text, second sentence: "It shall read the input one block at a time, using the specified input block size; it shall then process the block of data actually returned, which could be smaller than the requested block size." It's not like e.g. tail -c, which just reads a particular amount of bytes. So, unless you want that block-based behaviour, you probably shouldn't use dd. head -c isn't in POSIX, but it's rather common, and definitely exists in GNU. – ilkkachu Mar 15 '21 at 18:45
  • @ilkkachu yes I feel this is hiding something in plain sight. Many users believe that bs * count = total data transferred. If it doesn't, and the same behaviour can be observed when reading from regular files then this invalidates a lot of advice about dd's behaviour. – Philip Couling Mar 15 '21 at 19:30
  • @PhilipCouling, it doesn't mean x*y bytes, it means x blocks of y bytes. That makes a difference at least with some tape drives... See Gilles's and mikeserv's answers to the linked questions. But OTOH, you wouldn't see it on regular files, usually. You could, in principle, but I have the impression that it would really be an exception to see short read or write from something backed by a regular filesystem. Except if there's some IO error, of course, but then you have bigger problems. – ilkkachu Mar 15 '21 at 20:16
  • @ilkkachu we are agreed on what dd really does. I was pointing out there's a tonne of tutorials and very many answers here that assume it calculates total read, and corrections on that point are almost never seen... As for regular files, no that's not correct. I've run into this when performing unbuffered python reads even on small files. If the read crosses a block boundary and the next block isn't in the disk cache yet, Linux seems to return whatever it has in cache and underread. So even healthy NVME drives may do this. – Philip Couling Mar 15 '21 at 21:10
  • @ilkkachu try reading a tonne of largish files (mb+) in units of a large prime number. Use the language of your choice and see if you get underreads. – Philip Couling Mar 15 '21 at 21:13
  • @PhilipCouling yeah, well, I did start getting the tingle of having to test that when I said it. dd bs=12345701 < bigfile2 > /dev/null just gives me 76+1 in and out. Or, perl -l -e '$n = sysread STDIN, $buf, 123456761; print STDERR $n; $m = syswrite STDOUT, $buf, $n; print STDERR $m' < bigfile2 > out prints 123456761 twice. On ext4 and tmpfs. Odder filesystems might be different. I was thinking if NFS could be a problem. Esp. if the network hiccups, but I don't think I can test that now. – ilkkachu Mar 15 '21 at 21:30
  • @ilkkachu when I hit this, I was transferring about 1TB. My code errored out around 200GB because of an underread (I'd put a check in to be sure). It's not a common occurrence. But it does happen under normal circumstances. – Philip Couling Mar 15 '21 at 21:36
  • @PhilipCouling, oh hmm, what, I missed that part about python. I thought I read that comment though. What do unbuffered reads mean here? Not O_DIRECT? With that, I would not be at all surprised. But, anyway, if it happens even normally, then so be it. – ilkkachu Mar 15 '21 at 21:43
  • @ilkkachu if you don't know python then just think "read()". Default python has its own buffer which normally hides this issue. – Philip Couling Mar 15 '21 at 22:24
  • @PhilipCouling, reads on a normal file in Linux ( pretty much all operating systems actually ) will only return what is currently in the cache if you put it in non blocking mode. It's always been that way; short reads are just not something you have to deal with unless you work with pipes or sockets. – psusi Mar 16 '21 at 13:48
  • @psusi, but the Linux man page for open(2) says of O_NONBLOCK: "Note that this flag has no effect for regular files and block devices; that is, I/O operations will (briefly) block when device activity is required, regardless of whether O_NONBLOCK is set." Would be interesting to test on other systems, though. – ilkkachu Mar 16 '21 at 15:32
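
For reference, the two workarounds mentioned in the comments (dd bs=1 count=X, and head -c) can be sketched as follows. This is only an illustration: the variable names are made up, and head -c is not in POSIX, as noted above, although GNU provides it.

```shell
# POSIX-only variant: ask for 1-byte blocks. A blocking read() of one byte
# returns either that byte or EOF, so a short read cannot occur and
# bs=1 count=4 transfers 4 bytes. It is slow: one read()/write() per byte.
out_bs1=$({ echo a; sleep 1; echo b; } | dd bs=1 count=4 2>/dev/null)
printf '%s\n' "$out_bs1"

# Non-POSIX but common variant: head -c keeps reading until it has the
# requested number of bytes or reaches EOF.
out_head=$({ echo a; sleep 1; echo b; } | head -c 4)
printf '%s\n' "$out_head"
```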