
While I was reading this answer, I saw that the author used this command to read the contents of a heredoc into a variable:

read -r -d '' VAR <<'EOF'
abc'asdf"
$(dont-execute-this)
foo"bar"''
EOF

I'm a little confused about the -d option. From the help text for the read command:

-d delim
continue until the first character of DELIM is read, rather than newline

So if I pass an empty string to -d, does it mean "read until the first empty string"? What would that even mean? The author commented under the answer that -d '' means using the NUL string as the delimiter. Is this true (does an empty string mean the NUL string)? Why not use something like -d '\0' or -d '\x0' etc.?

Kusalananda

2 Answers


Mostly, it means what it says, e.g.:

$ read -d . var; echo; echo "read: '$var'"
foo.
read: 'foo'

The reading ends immediately at the ., I didn't hit enter there.
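The same behaviour can be checked non-interactively. In this minimal sketch (the input string and the variable name first are made up for illustration), the -d argument has two characters, and only the first one, the comma, acts as the delimiter, as the help text quoted in the question says. The semicolon is just ordinary data:

```shell
#!/usr/bin/env bash
# -d ',;' has two characters, but only the first (',') is the delimiter.
# The ';' in the input is therefore kept as part of the value.
read -r -d ',;' first <<< 'foo;bar,baz'
printf 'first: %s\n' "$first"    # first: foo;bar
```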

But read -d '' is a bit of a special case; the online reference manual says:

-d delim
The first character of delim is used to terminate the input line, rather than newline. If delim is the empty string, read will terminate a line when it reads a NUL character.

\0 means the NUL byte in printf, so we have e.g.:

$ printf 'foo\0bar\0' | while read -d '' var; do echo "read: '$var'"; done
read: 'foo'
read: 'bar'
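A slightly more defensive sketch of the same loop, assuming Bash (for arrays and process substitution); the sample items and the names items and item are made up for illustration. IFS= keeps leading and trailing whitespace, -r keeps backslashes literal, and because only NUL separates records, an item can even contain a newline. Feeding the loop through < <(...) instead of a pipe keeps it in the current shell, so items is still set after the loop (in Bash, each part of a pipeline normally runs in a subshell):

```shell
#!/usr/bin/env bash
# Collect NUL-separated items into an array; items may contain spaces or newlines.
items=()
while IFS= read -r -d '' item; do
    items+=("$item")
done < <(printf '%s\0' 'plain' 'with space' $'two\nlines')

printf 'count: %d\n' "${#items[@]}"    # count: 3
printf '[%s]\n' "${items[1]}"          # [with space]
```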

In your example, read -d '' is used to prevent the newline from being the delimiter, allowing it to read the multiline string in one go, instead of a line at a time.
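A minimal Bash sketch of that difference (the variable names and heredoc text are made up for illustration). One side effect worth knowing: a heredoc contains no NUL byte, so read -d '' reaches end of input before it finds its delimiter and returns a non-zero exit status. That is why the sketch appends || true, which matters if the script runs under set -e. The value is still assigned, and with the default IFS the trailing newline is trimmed:

```shell
#!/usr/bin/env bash
# Plain read stops at the first newline, so only the first line is stored.
read -r one_line <<'EOF'
first line
second line
EOF

# read -d '' only stops at a NUL byte. The heredoc has none, so read consumes
# everything up to end of input and then returns a non-zero status.
read -r -d '' whole <<'EOF' || true
first line
second line
EOF

printf '%s\n' "$one_line"    # first line
printf '%s\n' "$whole"       # first line, then second line
```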


I think some older versions of the documentation didn't explicitly mention -d ''. The behaviour may originally have been an unintended side effect of how Bash stores strings the C way, with a trailing NUL byte. The string foo is stored as foo\0, and the empty string is stored as just \0. So if the implementation doesn't guard against this and simply takes the first byte in memory, it will see \0, NUL, as the first byte of an empty string.

Re-reading the question more closely, you mentioned:

The author commented under the answer that -d '' means using the NUL string as delimiter.

That's not exactly right. The null string (in POSIX parlance) means the empty string, a string that contains nothing and has length zero. That's not the same as the NUL byte, which is a single byte with binary value zero(*). If you used the empty string as a delimiter, you'd find it practically everywhere, at every possible position. I don't think that's possible in the shell, but in Perl, for example, you can split a string like that:

$ perl -le 'print join ":", split "", "foobar";'
f:o:o:b:a:r

read -d '' uses the NUL byte as the separator.

(*not the same as the character 0, of course.)
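To make the distinction concrete, a small Bash sketch (the variable names are hypothetical): the empty string has zero characters, while printf '\0' writes exactly one byte. The whitespace stripping is there only because some wc implementations pad their output with spaces:

```shell
#!/usr/bin/env bash
empty=''
empty_len=${#empty}                     # the null string: zero characters
nul_len=$(printf '\0' | wc -c)          # the NUL byte: one byte with value zero
nul_len=${nul_len//[[:space:]]/}        # drop padding some wc versions add
printf 'empty string: %d chars, NUL: %d byte\n' "$empty_len" "$nul_len"
# empty string: 0 chars, NUL: 1 byte
```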

Why not use something like -d '\0' or -d '\x0' etc.?

Well, that's a good question. As Stéphane commented, originally, ksh93's read -d didn't support read -d '' like that, and changing it to support backslash escapes would have been incompatible with the original. But you can still use read -d $'\0' (and similarly $'\t' for a tab, etc.) if you like it better. It's just that, behind the scenes, that's the same as -d '', since Bash doesn't support the NUL byte in strings. Zsh does, but it seems to accept both -d '' and -d $'\0'.
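A minimal Bash sketch of that equivalence (the variable names and sample data are made up for illustration). It also shows what the -d '\0' spelling from the question actually does: inside single quotes, '\0' is two characters, a backslash and a zero, and since only the first character counts, read uses the backslash as its delimiter:

```shell
#!/usr/bin/env bash
# Bash strings cannot hold a NUL byte, so $'\0' expands to the empty string.
nul=$'\0'
echo "length: ${#nul}"       # length: 0

# Hence -d '' and -d $'\0' read exactly the same NUL-delimited records.
a=() b=()
while IFS= read -r -d ''    x; do a+=("$x"); done < <(printf 'foo\0bar\0')
while IFS= read -r -d $'\0' x; do b+=("$x"); done < <(printf 'foo\0bar\0')
echo "a: ${a[*]} / b: ${b[*]}"    # a: foo bar / b: foo bar

# Single-quoted '\0' is a backslash followed by a zero; only the backslash
# is used, so reading stops at the first backslash in the input.
read -r -d '\0' c <<< 'foo\bar'
echo "c: $c"                 # c: foo
```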

ilkkachu

Just to point out how persnickety ASCII 0 can be as a character in files: Expect (my favorite tool!) has to make special provisions for reading and matching NUL bytes.

  • Welcome to the site. If you want to comment on another answer, please don't post this as an answer - the answer section is only intended for definitive solutions to the original problem. Once you have enough reputation, you will be able to comment on other people's posts. In the meantime, consider submitting an edit suggestion if you think that you can improve the other answer. – AdminBee Sep 27 '22 at 11:42