I recently had trouble with some regex on the command-line, and found that for matching a backslash, different numbers of characters can be used. This number depends on the quoting used for the regex (none, single quotes, double quotes). See the following bash session for what I mean:
echo "#ab\\cd" > file
grep -E ab\cd file
grep -E ab\\cd file
grep -E ab\\\cd file
grep -E ab\\\\cd file
#ab\cd
grep -E ab\\\\\cd file
#ab\cd
grep -E ab\\\\\\cd file
#ab\cd
grep -E ab\\\\\\\cd file
#ab\cd
grep -E ab\\\\\\\\cd file
grep -E "ab\cd" file
grep -E "ab\\cd" file
grep -E "ab\\\cd" file
#ab\cd
grep -E "ab\\\\cd" file
#ab\cd
grep -E "ab\\\\\cd" file
#ab\cd
grep -E "ab\\\\\\cd" file
#ab\cd
grep -E "ab\\\\\\\cd" file
grep -E 'ab\cd' file
grep -E 'ab\\cd' file
#ab\cd
grep -E 'ab\\\cd' file
#ab\cd
grep -E 'ab\\\\cd' file
This means that:
- with no quotes, I can match a backslash with 4-7 actual backslashes
- with double quotes, I can match a backslash with 3-6 actual backslashes
- with single quotes, I can match a backslash with 2-3 actual backslashes
I understand that one extra backslash is ignored by the shell (from the bash man page):
"A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows"
This does not apply to the single-quoted examples, because no escaping is done in single quotes.
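One way to see the shell's share of the work is to print the argument exactly as a command receives it. This is only a sketch: `show_arg` is a made-up helper, not part of the session above, and it uses printf with `%s` so that nothing except the shell touches the backslashes.

```shell
# show_arg (hypothetical helper) prints its first argument verbatim,
# i.e. after the shell has finished its own quote removal.
show_arg() { printf '%s\n' "$1"; }

show_arg 'ab\\cd'     # single quotes: nothing is removed, prints ab\\cd
show_arg ab\\\\cd     # unquoted: each \\ becomes \, prints ab\\cd
show_arg ab\\\\\cd    # unquoted: \\ \\ \c becomes \ \ c, prints ab\\cd
```

All three calls hand the same string to the command, which is consistent with all three forms matching in the grep session.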
And one additional backslash is ignored by the grep command ("\c" is just "c" escaped, but this is just the same as "c", because "c" does not have a special meaning in a regex).
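To look at grep's part in isolation, the pattern can be built in a variable so that the shell no longer removes anything when the command runs. The following is a rough sketch, assuming GNU grep as in the session above; `try_counts` and the temporary file are names I made up for illustration. Newer GNU grep versions may print a warning about a stray backslash, which is why stderr is discarded.

```shell
# try_counts (hypothetical helper): pass 1 to 4 literal backslashes to grep
# and report which patterns match a line containing a single backslash.
try_counts() {
  tmp=$(mktemp)
  printf '%s\n' '#ab\cd' > "$tmp"     # same content as "file" in the session
  bs=''
  for n in 1 2 3 4; do
    bs="$bs\\"                        # append one literal backslash
    # The expanded value of $bs is not processed for escapes again,
    # so grep sees exactly n backslashes between "ab" and "cd".
    if grep -qE "ab${bs}cd" "$tmp" 2>/dev/null; then
      echo "$n backslash(es): match"
    else
      echo "$n backslash(es): no match"
    fi
  done
  rm -f "$tmp"
}

try_counts
```

With GNU grep this should report a match for 2 and 3 backslashes only, which is the same range as in the single-quoted examples.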
This explains the behaviour of the example with single quotes, but I don't really understand the other two examples, especially why there is a difference between non-quoted and double-quoted strings.
Again, a quote from the bash man page:
"Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !."
I tried the same with GNU awk (e.g. awk /ab\cd/{print} file), with the same results.
Perl, however, shows different results (using e.g. perl -ne "/ab\\cd/"\&\&print file):
- with no quotes, I can match a backslash with 4-5 actual backslashes
- with double quotes, I can match a backslash with 3-4 actual backslashes
- with single quotes, I can match a backslash with 2 actual backslashes
Can anyone explain that difference between non-quoted and double-quoted regex strings on the command-line for grep and awk? I'm not that interested in an explanation of Perl's behaviour, since I usually don't use Perl one-liners.
printf "\ntest" will insert a newline before "test", even though "\n" should have been translated to "n" by the shell, as it is within double quotes... (so the expected result for "\ntest" should be "ntest"). We should get into the habit of writing printf "\\ntest" or printf '\ntest', but somehow I see a lot of scripts relying on the oddity instead. – Olivier Dulac May 28 '18 at 13:35
dash manual page: The backslash inside double quotes is historically weird, and serves to quote only the following characters: $ ` " \