2

When running the script below, I get two different outputs, depending on whether the shell used is sh or bash:

regex(){
     echo 's/\(.* \)\(!\{0,1\}\)has(/\1\2MOCK_has(/g'
}

replace_builtins(){
     sed -e "$(regex)"
}

echo 'if !has("nvim"): ' | replace_builtins

  • bash: if !MOCK_has(\"nvim\"):
  • sh: ??MOCK_has(\"nvim\"):

(Those question marks were originally copied verbatim from the terminal, but disappeared when I saved the post. They are essentially non-printable characters.)

I would like to know what is happening when running in POSIX sh mode that explains this phenomenon.

EDIT: for bonus points, explain why this also happens in Bash when replacing echo with printf in the regex function:

     printf 's/\(.* \)\(!\{0,1\}\)has(/\1\2MOCK_has(/g'
oligofren
  • 1,150
  • yash prints if !MOCK_has(\"nvim\"):, just like Bash. When I run it with dash I get MOCK_has(\"nvim\"): - no non-printable characters. What really is sh that you're using? – Arkadiusz Drabczyk Feb 24 '21 at 19:59
  • /bin/sh -> dash – oligofren Feb 24 '21 at 20:00
  • 1
    Essentially a duplicate of "Why is printf better than echo". Here \1 is unspecified behavior. Dash interprets it, Bash does not. – Quasímodo Feb 24 '21 at 20:01
  • @Quasímodo Could you give me that link? There was no printf in the actual example script. – oligofren Feb 24 '21 at 20:02
  • @Quasímodo Also, any idea how I can avoid it being interpreted by the shell? I just want to pass the generated regex to sed. – oligofren Feb 24 '21 at 20:03
  • @oligofren: I get no non-printable characters, neither on Slackware nor on Ubuntu. – Arkadiusz Drabczyk Feb 24 '21 at 20:03
  • Of course, there you go: https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819 It's such a canonical Q that I even forget to link it :) – Quasímodo Feb 24 '21 at 20:04
  • @ArkadiuszDrabczyk : I will try to post a link to a runnable example – oligofren Feb 24 '21 at 20:04
  • @ArkadiuszDrabczyk I am not sure what to say: when I copied the script verbatim and pasted it into example.sh and did sh example.sh and dash example.sh I got the [][]MOCK_has(\"nvim"\)" string both times. Maybe my environment variables make the difference, but I tried setting different LANG settings (it was LANG=C.UTF-8) without any effect (tried en_US.utf8 and POSIX). Full list: https://gist.github.com/fatso83/ede210b548f41d6514dc760c8012f85d – oligofren Feb 24 '21 at 20:17
  • Yeah, I'll update the title. Sorry. I thought that being invoked as /bin/sh essentially limited whatever shell it was linked to, to some POSIX-defined minimal behavior when it saw that $0 == "/bin/sh". – oligofren Feb 24 '21 at 20:22
  • 1
    Yeah, I just now saw the comment where you mentioned it was Dash indeed. Anyway, this particular thing isn't really about feature-rich vs. plain POSIX shells: Dash processes escapes by default, my Busybox sh doesn't, Zsh and Yash do, and Ksh and Bash don't. Bash does have an option for it (xpg_echo), but it's different from the usual POSIX-compatibility option, so Bash as /bin/sh is the same as Bash as /bin/bash here. (Confused yet?) – ilkkachu Feb 24 '21 at 20:26

2 Answers

2

The explanation is in the POSIX specification for echo:

A string to be written to standard output. If the first operand is -n, or if any of the operands contain a <backslash> character, the results are implementation-defined.

POSIX mostly codified historical practice, and sometimes historical practice was not consistent. Some shells expand escape sequences in the arguments to echo, for example \t expands to a tab and \1 expands to the character with byte value 1 (^A). Other shells treat backslash as an ordinary character.
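To find out which camp a given shell falls into, a small probe along these lines may help. It is only a sketch: the function name echo_expands_escapes and the variable echo_mode are made up for illustration, and the probe merely checks whether the backslash survives a trip through echo.

```shell
# Hypothetical helper: succeeds if this shell's echo expands backslash escapes.
echo_expands_escapes() {
    # If echo treats the backslash as an ordinary character, it is still
    # present in the output; if echo expands \1, the backslash is gone.
    case $(echo 'x\1y') in
        *\\*) return 1 ;;   # backslash survived: echo printed it literally
        *)    return 0 ;;   # backslash consumed: echo expanded the escape
    esac
}

if echo_expands_escapes; then
    echo_mode=expands
else
    echo_mode=literal
fi
printf 'echo in this shell: %s\n' "$echo_mode"
```

Run with dash and then with bash, this should report different modes, matching the behavior described in the comments on the question.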

The portable way to print an arbitrary string is with printf. printf always expands backslash escape sequences in its first argument (the format). To print a string literally, use

printf %s 's/\(.* \)\(!\{0,1\}\)has(/\1\2MOCK_has(/g'

To print a string literally and add a newline at the end, use

printf '%s\n' 's/\(.* \)\(!\{0,1\}\)has(/\1\2MOCK_has(/g'
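Applied to the script from the question, the change could look like the sketch below. The function names are the ones from the question; the variable result is introduced here only to capture the output, and the test input is also fed through printf so that no echo is involved anywhere.

```shell
# Emit the sed program verbatim: %s copies its argument without
# interpreting the backslashes in it.
regex() {
    printf '%s\n' 's/\(.* \)\(!\{0,1\}\)has(/\1\2MOCK_has(/g'
}

replace_builtins() {
    sed -e "$(regex)"
}

result=$(printf '%s\n' 'if !has("nvim"): ' | replace_builtins)
printf '%s\n' "$result"
```

With this version the sed program reaches sed unchanged, so the result should no longer depend on whether the script runs under dash or bash.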

Note that if the string is written in the shell script using a single-quoted literal, a single quote character needs to be written as '\''. This is about the shell syntax, a completely different problem from printing a string literally.
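As a small illustration of that quoting rule (the sample string and the variable name quoted are invented for this sketch):

```shell
# Inside single quotes nothing is special, so a literal ' cannot appear there.
# '\'' means: close the quoted string, add one escaped quote, reopen the string.
quoted=$(printf '%s\n' 'it'\''s a \1 test')
printf '%s\n' "$quoted"
```

The backslash in \1 stays untouched because it sits inside single quotes and is printed through %s; only the apostrophe needed special treatment.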

0

You identified an interesting problem...

The problem is missing POSIX compatibility in dash.

POSIX distinguishes between a basic POSIX compatibility level for tiny embedded systems and larger systems (e.g. Linux) that claim UNIX compatibility. In the latter case, the system needs to implement all so-called XSI extensions.

An XSI-compatible system would need to expand certain backslash escapes in echo arguments.

bash can be compiled to be POSIX/XSI compliant (and this is done, for example, on Solaris and macOS), but this is not done for the bash binary on Linux. If bash is compiled for POSIX/XSI compliance, it correctly handles backslash escapes for echo arguments, and your example code would work with such a bash binary from Solaris or macOS, because there is no POSIX/XSI escape sequence in your example code.

Since bash on Linux is not XSI compliant, it does not expand backslash escapes for echo arguments at all, and this is why your example code works with bash on Linux as well.

dash, on the other hand, claims POSIX/XSI compliance and expands backslash escapes for echo arguments. If dash implemented POSIX/XSI compliance correctly, your example code would work with dash as well. This is because your example code does not contain any POSIX/XSI backslash escape sequence.

POSIX/XSI requires echo to expand:

\0nnn  for an octal number that represents the related character

Your example code contains the backslash sequences:

\1 for the first sed subexpression

and

\2 for the second sed subexpression

and these are not part of the POSIX/XSI echo escape sequences, so the builtin echo from a POSIX-compliant shell is not permitted to expand them. dash, however, incorrectly expands \1 and \2 as octal numbers even though this is forbidden by POSIX. This is why your example code fails with dash.
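To see which bytes an octal reading of \1 and \2 produces, here is a sketch that does not call echo at all. It uses a printf format string, where octal escapes are interpreted, as a stand-in for an echo that expands them. The helper to_hex and the variable names are invented for illustration.

```shell
# Hypothetical helper: show stdin as hex bytes with all whitespace removed.
to_hex() {
    od -An -t x1 | tr -d '[:space:]'
}

# In a printf format, \1 and \2 are octal escapes, i.e. bytes 0x01 and 0x02.
as_escapes=$(printf '\1\2' | to_hex)

# Passed through %s, the same four characters stay backslash-digit pairs.
as_text=$(printf '%s' '\1\2' | to_hex)

printf 'expanded: %s\nliteral:  %s\n' "$as_escapes" "$as_text"
```

Bytes 0x01 and 0x02 are control characters, which would explain the unprintable output shown in the question. The same mechanism accounts for the Bash behavior in the question's EDIT, where the sed program is passed as the printf format itself.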

I recommend that you file a bug report against dash and either wait for a fix or replace echo arg with printf '%s\n' arg. This works even with dash because the known bug with the builtin printf in dash does not affect your case.

So we can list the POSIX/XSI bugs from dash as:

  • does not support multi-byte characters.

  • expands \nnn in echo arguments even though this is forbidden

  • does not expand \nnn in printf arguments even though this is required.

schily
  • 19,173