In "How do I bring HEREDOC text into a shell script variable?" someone reports a problem using a here document with a quoted delimiter word inside $(...) command substitution, where a backslash \ at the end of a line inside the document triggers newline-joining line continuation, while the same here document outside command substitution works as expected.
Here is a simplified example document:
cat <<'EOT'
abc ` def
ghi \
jkl
EOT
This includes one backtick and one backslash at the end of a line. The delimiter is quoted, so no expansions occur inside the body. In all Bourne-alikes I can find this outputs the contents verbatim. If I put the same document inside a command substitution as follows:
x=$(cat <<'EOT'
abc ` def
ghi \
jkl
EOT
)
echo "$x"
then they no longer behave identically:
- dash, ash, zsh, ksh93, BusyBox ash, mksh, and SunOS 5.10 POSIX sh all give the verbatim contents of the document, as before.
- Bash 3.2 gives a syntax error for an unmatched backtick. With matched backticks, it attempts to run the contents as a command.
- Bash 4.3 collapses "ghi" and "jkl" onto a single line, but has no error. The --posix option does not affect this. Kusalananda tells me (thanks!) that pdksh behaves the same way.
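To see which of these behaviours a given shell exhibits, a minimal self-check might look like the sketch below. It leaves out the backtick so that the check isolates the backslash-newline handling, and the variable names expected and verdict are illustrative only, not part of the original report.

```shell
# Capture a quoted-delimiter here-document through command substitution.
x=$(cat <<'EOT'
ghi \
jkl
EOT
)

# Build the verbatim two-line text without using a here-document,
# so the comparison value is not affected by the behaviour under test.
expected=$(printf '%s\n' 'ghi \' 'jkl')

if [ "$x" = "$expected" ]; then
    verdict=verbatim   # backslash and newline were both preserved
else
    verdict=altered    # e.g. the two lines were joined
fi
printf 'here-document inside $(...): %s\n' "$verdict"
```

Running the same file under each shell in turn (for example with sh, dash, or bash as the interpreter) gives a quick side-by-side comparison.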
In the original question, I said this was a bug in Bash's parser. Is it? [Update: yes] The relevant text from POSIX (all from the Shell Command Language definition) that I can find is:
- §2.6.3 Command Substitution:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
- §2.7.4 Here-Document:
If any part of word is quoted, the delimiter shall be formed by performing quote removal on word, and the here-document lines shall not be expanded.
- §2.2.1 Escape Character (Backslash):
If a <newline> follows the <backslash>, the shell shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into tokens.
- §2.3 Token Recognition:
When an io_here token has been recognized by the grammar (see Shell Grammar), one or more of the subsequent lines immediately following the next NEWLINE token form the body of one or more here-documents and shall be parsed according to the rules of Here-Document.
When it is not processing an io_here, the shell shall break its input into tokens by applying the first applicable rule below to the next character in its input. ...
...
- If the current character is <backslash>, single-quote, or double-quote and it is not quoted, it shall affect quoting for subsequent characters up to the end of the quoted text. The rules for quoting are as described in Quoting. During token recognition no substitutions shall be actually performed, and the result token shall contain exactly the characters that appear in the input (except for <newline> joining), unmodified, including any embedded or enclosing quotes or substitution operators, between the <quotation-mark> and the end of the quoted text.
My interpretation of this is that all characters after $( until the terminating ) comprise the shell script, verbatim; a here document appears, so here-document processing occurs instead of ordinary tokenisation; the here document then has a quoted delimiter, meaning that its contents are processed verbatim; and the escape character never comes into it. I can see an argument, however, that this case is simply not addressed, and both behaviours are permissible. It's possible that I've skipped over some relevant text somewhere, too.
- Is this situation made clearer elsewhere?
- What should a portable script be able to rely on (in theory)?
- Is the specific treatment given by any of these shells (Bash 3.2/Bash 4.3/everyone else) mandated by the standard? Forbidden? Permitted?
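Whatever the standard turns out to require, a script can avoid depending on the answer by keeping the here document out of the command substitution altogether. The sketch below wraps it in a function and substitutes the function call instead. The helper name emit_doc is hypothetical, and I am assuming (not having verified it on every shell listed) that the affected parsers only mishandle here documents that appear textually inside $(...).

```shell
# Hypothetical helper: the here-document is parsed as part of an ordinary
# function definition, outside any $(...) construct.
emit_doc() {
cat <<'EOT'
abc ` def
ghi \
jkl
EOT
}

# Only the function call sits inside the command substitution.
x=$(emit_doc)
printf '%s\n' "$x"
```

The trade-off is one extra function per document, in exchange for not relying on how a particular shell tokenises the inside of $(...).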
echo "$x", but any way of inspecting the variable works. I've edited that line into the bottom. – Michael Homer Jan 29 '17 at 06:13
pdksh behaves like bash 4.4 and 4.3. – Kusalananda Jan 29 '17 at 08:17
$(...) with whatever that output is... Now, when running the command in your example in a subshell (in bash) it does output the expected result. It's only when turning it into command substitution that it collapses "ghi" and "jkl". So this is a bug imo – don_crissti Feb 03 '17 at 00:30
mksh is now widely considered pdksh's future (OpenBSD ksh being another one, but the MirBSD ksh author also maintains the Debian package bringing it to a much wider audience and mksh and oksh generally feed each other). Good catch for the bash bug btw. – Stéphane Chazelas Feb 24 '17 at 21:28