
I have a bash function inside a script that needs to read data from stdin in fixed size blocks and send those, one at a time, to external programs for further processing. The function itself should run in a loop for as long as there is data (the input is always guaranteed to be a whole number of blocks), but it doesn't otherwise need to interpret the data, so I'd like to have a way to detect EOF on the function's stdin without consuming data in case there is still some to process.

The apparently natural way to do this would be to use the read builtin, as in:

while read -r -n 0 ; do external_program ; done

The -n option to read tells it to read only at most those many bytes instead of up to newline, but unfortunately it doesn't work with 0 bytes, which would make it an ideal test for EOF. It does work with -n 1, but then it consumes the first byte of a block, which has to be 'replayed' into the stream going into the external program.
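For reference, the `read -n 1` peek-and-replay workaround described above could be sketched like this (block size and `external_program` are placeholders, and it assumes the data contains no NUL bytes, since Bash variables can't hold them):

```shell
# Sketch of the peek-and-replay workaround: read -n 1 consumes the first
# byte of each block, so it is printed back in front of the rest.
# Assumes fixed-size blocks and data without NUL bytes.
blocksize=4                    # example size; use the real block size
external_program() { cat; }    # stand-in for the real consumer

process_blocks() {
    local first
    # read one byte; failure means EOF (no more blocks)
    while IFS= read -r -d '' -n 1 first; do
        # replay the consumed byte, then pass the rest of the block along
        { printf '%s' "$first"; head -c "$((blocksize - 1))"; } | external_program
    done
}
```

Here `head -c` reads the remaining `blocksize - 1` bytes of the current block; since the loop's stdin is a pipe, whatever `head` consumes is gone from the stream, so the next `read` continues at the following block.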

So, is there a better way, preferably using only bash builtins?

Jeff Schaller
basher

1 Answer


I'm not sure you can detect EOF without actually trying to read some non-zero number of bytes.

That's because, well, there's no return value from the read() system call that explicitly means end-of-file. Instead, all you get is "zero bytes read, no error", and it's up to the application code to know what that means. On a regular file, that obviously happens when you're reading at or past the end of the file, when there's no data left.

But on a terminal it can happen because the user hit ^D on an empty line, causing the terminal interface to return what it has at that point, i.e. nothing; and on a datagram socket, it's possible to send and receive zero-length messages. Neither of those cases signals an actual end: the terminal can be read for data after a ^D, and a socket might receive other messages after a zero-length one. (And even on a regular file, a subsequent read might return data -- if some other process appended to the file in the meantime. Repeating reads at EOF is what a simple implementation of tail -f would do.)

And if you explicitly ask to read zero bytes, you also get zero bytes (or an error), regardless of whether you're at EOF or not.


Probably the best result could be had if the external program were able to deal with an EOF without too much fuss, preferably just returning an exit code signalling that. Then you'd do:

while external_program; do
    # do we need to do anything here but loop?
    true 
done

or, if we're so lucky that we can get a different exit status for EOF:

while true; do
    external_program
    ret=$?
    if [ "$ret" = 0 ]; then
        echo "ok, continue"
    elif [ "$ret" = 1 ]; then
        echo "deal with this error"
        # but what now?
    elif [ "$ret" = 2 ]; then
        echo "got EOF, stopping"
        break
    fi
done
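The same status dispatch reads a bit more compactly as a case statement (a sketch: the exit codes 0 and 2 follow the hypothetical convention above, and stopping on any other status is an assumption):

```shell
# Hypothetical convention: 0 = block processed, 2 = EOF, anything else = error.
external_program() { return 2; }    # stand-in that reports EOF at once

run_blocks() {
    while true; do
        external_program
        case $? in
            0) ;;                    # ok, continue with the next block
            2) break ;;              # got EOF, stopping
            *) echo "deal with this error" >&2
               break ;;              # here we simply give up
        esac
    done
}
```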

Having that program deal with EOF makes sense in that it needs to verify whatever input it gets anyway.

If you can't do that, you could have Bash read the block of data and pass it to the program if enough was actually read:

blocksize=123
while IFS= read -d '' -r -n "$blocksize" data && [ "${#data}" = "$blocksize" ]; do
    printf "%s" "$data" | externalprogram
done

But that only works in Bash if the data never contains NUL bytes (\0). If it does, you'd need to switch to Zsh (or some real programming language), or use something like head -c "$blocksize" > tmpfile instead.
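That head -c variant might look something like this (a sketch; it relies on the input being a whole number of blocks, so an empty temporary file means EOF):

```shell
# NUL-safe fallback: copy each block to a temporary file with head(1)
# and hand the file to the external program.
blocksize=123
external_program() { cat; }    # stand-in for the real consumer

process_blocks() {
    local tmpfile
    tmpfile=$(mktemp) || return 1
    # head -c exits 0 even at EOF, so test whether it actually copied anything
    while head -c "$blocksize" > "$tmpfile" && [ -s "$tmpfile" ]; do
        external_program < "$tmpfile"
    done
    rm -f "$tmpfile"
}
```

Unlike the read-based loop, this handles NUL bytes fine, since the data only ever passes through files and pipes, never a shell variable.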

ilkkachu