
I have a little open source project that for various reasons I've tried to write in reasonably portable shell script. Its automated integration tests check that hostile characters in path expressions are treated properly, among other things.

Users with /bin/sh provided by bash are seeing a failure in a test that I've simplified down to the following:

echo "A bug\\'s life"
echo "A bug\\\\'s life"

On bash, it produces this expected result:

A bug\'s life
A bug\\'s life

With dash, which I've developed against, it does this:

A bug\'s life
A bug\'s life

I'd like to think that I haven't found a bug in dash, that I might be missing something instead. Is there a rational explanation for this?

csirac2
    see https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – Sundeep Nov 06 '16 at 12:42
    That's quite helpful. Although it doesn't directly cover the \\ sequence, it explains why things might be the way they are in dash, and points to the solution: use printf, which is more portable in its treatment of escape sequences. Thanks – csirac2 Nov 06 '16 at 13:19
    It should rather be Why is bash's output different from dash's?. It's the dash version that is standard here. – Stéphane Chazelas Nov 06 '16 at 21:43

2 Answers


In

echo "A bug\\'s life"

Because those are double quotes, and \ is special inside double quotes, the first \ is understood by the shell as escaping/quoting the second \. So the argument passed to echo is A bug\'s life.

echo "A bug\'s life"

Would have achieved exactly the same. Since ' is not special inside double quotes, the \ is not removed, so the exact same argument is passed to echo.
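A quick way to check this, as a minimal sketch: store both spellings in variables (the names a and b are only illustrative) and compare them. printf '%s\n' is used for the output so that no echo processing gets involved.

```shell
# Both spellings yield the same string after the shell's quote removal.
a="A bug\\'s life"   # \\ inside double quotes collapses to a single \
b="A bug\'s life"    # \' is left alone: ' is not special inside double quotes

if [ "$a" = "$b" ]; then
    printf '%s\n' "identical: $a"
fi
```

This should behave the same under bash and dash, because only shell quoting is involved, not echo.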

As explained at Why is printf better than echo?, there's a lot of variation between echo implementations.

In UNIX-conformant implementations like dash's echo builtin¹, \ is used to introduce escape sequences: \n for newline, \b for backspace, \0123 for octal sequences... and \\ for backslash itself.

Some (non-POSIX) ones require a -e option for that, or do it only when in conformance mode (like bash's, when built with the right options as for the sh of OS/X, or when called with SHELLOPTS=xpg_echo in the environment).

So in standard (Unix standard only; POSIX leaves the behaviour unspecified) echos,

echo '\\'

same as:

echo "\\\\"

outputs one backslash, while in bash when not in conformance mode:

echo '\\'

will output two backslashes.
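To get the UNIX-style expansion regardless of which echo the shell provides, one option is the %b conversion of printf, which expands backslash escapes in its argument. A minimal sketch, assuming a single argument is enough (the function name xsi_echo is hypothetical):

```shell
# xsi_echo: hypothetical helper that mimics a UNIX-conformant echo
# for one argument by letting printf's %b expand the escapes.
xsi_echo() {
    printf '%b\n' "$1"
}

xsi_echo '\\'                  # prints one backslash
xsi_echo "A bug\\\\'s life"    # prints: A bug\'s life
```

The second call receives A bug\\'s life after quote removal, and %b then turns the \\ pair into a single \, which is what dash's echo does in the question.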

Best is to avoid echo and use printf instead:

$ printf '%s\n' "A bug\'s life"
A bug\'s life

Which works the same in this instance in all printf implementations.
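For a test suite like the one in the question, a small wrapper can stand in for echo. This is only a sketch; the name println is made up here and is not part of any shell:

```shell
# println: hypothetical echo replacement that never interprets
# backslashes; arguments are joined with spaces, like echo does.
println() {
    printf '%s\n' "$*"
}

println "A bug\\'s life"       # prints: A bug\'s life
println "A bug\\\\'s life"     # prints: A bug\\'s life
```

With this, the two lines from the question should print the same thing under bash and dash, since only the shell's double-quote processing applies.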


¹ dash's echo is compliant in that regard, but not in that echo -n outputs nothing, while the UNIX specification (POSIX + XSI) requires it to output -n<newline>.


For both echo and printf, the issue comes down to knowing when a backslash-escaped character is treated as a "special character".

The simplest case is a string passed as printf '%s' "$string".
In this case there are no special characters to process, and everything that printf receives in the second argument is printed as-is.

Note that only single quotes are used:

$ printf '%s\n' '\\\\\\\\\T '       # nine \
\\\\\\\\\T                          # nine \

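When the string is used as the first argument (the format), some characters are special.
A \\ pair represents a single \. \T is not an escape sequence that printf recognizes; bash and dash print it unchanged, consistent with the \g rows in the tables below:

$ printf '\\\\\\\\\T '              # nine \
\\\\\T                              # five \

Each of the four \\ pairs is transformed to a single \ and the last \T is left as \T.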

$ printf '\\\\\\\\\a '              # nine \
\\\\                                # four \

Each of four pairs of \\ transformed to a single \ and the last \a to a bell (BEL) character (not printable).
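The difference between the two argument positions can be seen side by side in a small sketch (the variable name s is only illustrative):

```shell
s='a\\b'                 # single quotes: s holds a, \, \, b

printf '%s\n' "$s"       # as a data argument: printed as-is  -> a\\b
printf "$s\n"            # as the format: \\ becomes one \    -> a\b
```

This is why variable data is usually better passed as an argument to a fixed format such as '%s\n' rather than placed in the format itself.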

The same happens with some implementations of echo.

The dash implementation always transforms special backslash sequences.

If we place this code in a script:

set -- '\g ' '\\g ' '\\\g ' '\\\\g ' '\\\\\g ' '\\\\\\g ' '\\\\\\\g ' '\\\\\\\\g ' '\\\\\\\\\g '
for i; do
    printf '<%-14s> \t<%-9s> \t<%-14s> \t<%-12s>\n' \
        "$(printf '%s ' "|$i|")" \
        "$(printf       "|$i|")" \
        "$(echo         "|$i|")" \
        "$(echo    -e   "|$i|")"
done

Then, dash will print (dash ./script):

<|\g |         >        <|\g |    >     <|\g |         >        <-e |\g |    >
<|\\g |        >        <|\g |    >     <|\g |         >        <-e |\g |    >
<|\\\g |       >        <|\\g |   >     <|\\g |        >        <-e |\\g |   >
<|\\\\g |      >        <|\\g |   >     <|\\g |        >        <-e |\\g |   >
<|\\\\\g |     >        <|\\\g |  >     <|\\\g |       >        <-e |\\\g |  >
<|\\\\\\g |    >        <|\\\g |  >     <|\\\g |       >        <-e |\\\g |  >
<|\\\\\\\g |   >        <|\\\\g | >     <|\\\\g |      >        <-e |\\\\g | >
<|\\\\\\\\g |  >        <|\\\\g | >     <|\\\\g |      >        <-e |\\\\g | >
<|\\\\\\\\\g | >        <|\\\\\g |>     <|\\\\\g |     >        <-e |\\\\\g |>

The first two columns will be the same (printf) for all shells.
The other two will change with the specific implementation of echo used.

For example: ash ./script (busybox ash):

<|\g |         >        <|\g |    >     <|\g |         >        <|\g |       >
<|\\g |        >        <|\g |    >     <|\\g |        >        <|\g |       >
<|\\\g |       >        <|\\g |   >     <|\\\g |       >        <|\\g |      >
<|\\\\g |      >        <|\\g |   >     <|\\\\g |      >        <|\\g |      >
<|\\\\\g |     >        <|\\\g |  >     <|\\\\\g |     >        <|\\\g |     >
<|\\\\\\g |    >        <|\\\g |  >     <|\\\\\\g |    >        <|\\\g |     >
<|\\\\\\\g |   >        <|\\\\g | >     <|\\\\\\\g |   >        <|\\\\g |    >
<|\\\\\\\\g |  >        <|\\\\g | >     <|\\\\\\\\g |  >        <|\\\\g |    >
<|\\\\\\\\\g | >        <|\\\\\g |>     <|\\\\\\\\\g | >        <|\\\\\g |   >

If the character used is an a, for dash:

<|\a |         >        <| |     >      <| |          >         <-e | |     >
<|\\a |        >        <|\a |    >     <|\a |         >        <-e |\a |    >
<|\\\a |       >        <|\ |    >      <|\ |         >         <-e |\ |    >
<|\\\\a |      >        <|\\a |   >     <|\\a |        >        <-e |\\a |   >
<|\\\\\a |     >        <|\\ |   >      <|\\ |        >         <-e |\\ |   >
<|\\\\\\a |    >        <|\\\a |  >     <|\\\a |       >        <-e |\\\a |  >
<|\\\\\\\a |   >        <|\\\ |  >      <|\\\ |       >         <-e |\\\ |  >
<|\\\\\\\\a |  >        <|\\\\a | >     <|\\\\a |      >        <-e |\\\\a | >
<|\\\\\\\\\a | >        <|\\\\ | >      <|\\\\ |      >         <-e |\\\\ | >

And for bash:

<|\a |         >        <| |     >      <|\a |         >        <| |        >
<|\\a |        >        <|\a |    >     <|\\a |        >        <|\a |       >
<|\\\a |       >        <|\ |    >      <|\\\a |       >        <|\ |       >
<|\\\\a |      >        <|\\a |   >     <|\\\\a |      >        <|\\a |      >
<|\\\\\a |     >        <|\\ |   >      <|\\\\\a |     >        <|\\ |      >
<|\\\\\\a |    >        <|\\\a |  >     <|\\\\\\a |    >        <|\\\a |     >
<|\\\\\\\a |   >        <|\\\ |  >      <|\\\\\\\a |   >        <|\\\ |     >
<|\\\\\\\\a |  >        <|\\\\a | >     <|\\\\\\\\a |  >        <|\\\\a |    >
<|\\\\\\\\\a | >        <|\\\\ | >      <|\\\\\\\\\a | >        <|\\\\ |    >
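Since the echo columns differ between shells, a script that must know which behaviour it is running under can probe for it. A rough sketch (the variable name echo_style and its two values are made up for this example):

```shell
# Probe the current shell's echo: does it turn \\ into a single \ ?
if [ "$(echo '\\')" = '\' ]; then
    echo_style=expands      # e.g. dash, or bash with xpg_echo set
else
    echo_style=literal      # e.g. bash's default echo
fi

printf '%s\n' "echo style: $echo_style"
```

In most cases it is simpler to avoid the probe and use printf throughout.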

On top of that, we have to add the interpretation that the shell where the commands are executed may itself apply to the string of characters.

$ printf '%s\n' '\\\\T '
\\\\T
$ printf '%s\n' "\\\\T "
\\T

Note that the shell itself takes some action on the backslashes inside the double quotes.
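As a small illustration of what the shell does inside double quotes: the backslash is removed only before a few characters (such as $, " and \ itself) and kept before anything else. The variable name dq below is only illustrative:

```shell
# Inside double quotes, \ is removed before $ " and \,
# but kept before ordinary characters such as a or '.
dq="\$ \" \\ \a \'"

printf '%s\n' "$dq"      # prints: $ " \ \a \'
```

Inside single quotes nothing is removed at all, which is why the single-quoted strings earlier keep every backslash.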

With this code:

tab='   '
say(){ echo "$(printf '%s' "$a") $tab $(echo "$a") $tab $(echo -e "$a")"; }
a="one \a "         ; say
a="two \\a "        ; say
a="t33 \\\a "       ; say
a="f44 \\\\a "      ; say
a="f55 \\\\\a "     ; say
a="s66 \\\\\\a "    ; say
a="s77 \\\\\\\a "   ; say
a="e88 \\\\\\\\a "  ; say
a="n99 \\\\\\\\\a " ; say

Both effects get added, and we get this:

$ bash ./script
one \a           one \a          one  
two \a           two \a          two  
t33 \\a          t33 \\a         t33 \a 
f44 \\a          f44 \\a         f44 \a 
f55 \\\a         f55 \\\a        f55 \ 
s66 \\\a         s66 \\\a        s66 \ 
s77 \\\\a        s77 \\\\a       s77 \\a 
e88 \\\\a        e88 \\\\a       e88 \\a 
n99 \\\\\a       n99 \\\\\a      n99 \\ 

For dash it is even more severe:

$ dash ./script
one              one             -e one  
two              two             -e two  
t33 \a           t33             -e t33  
f44 \a           f44             -e f44  
f55 \            f55 \           -e f55 \ 
s66 \            s66 \           -e s66 \ 
s77 \\a          s77 \a          -e s77 \a 
e88 \\a          e88 \a          -e e88 \a 
n99 \\           n99 \           -e n99 \