sed portability: extended regex vs. backslash

Question

We can write the following command in two ways:

# using extended regex
$ echo foobar | sed -E 's/(foo)(bar)/\2\1/'
barfoo

And:

# using backslashes
$ echo foobar | sed 's/\(foo\)\(bar\)/\2\1/'
barfoo

Does using backslashes make the command more portable than the extended-regex form?

Note that merely prefixing a character with a backslash doesn't make it portable. E.g., + is ERE, but \+ isn't POSIX; such constructs are GNU sed only. – guest_7, Feb 07 '21 at 17:18
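To illustrate the comment above, here is a minimal sketch of two BRE spellings of "one or more" that do not rely on \+. The sample input foooobar is made up for illustration.

```shell
# Portable BRE spellings of "one or more o" (the ERE form would be o+).
# \{1,\} is the POSIX BRE interval expression; oo* is the classic idiom.
echo foooobar | sed 's/o\{1,\}/0/'   # prints: f0bar
echo foooobar | sed 's/oo*/0/'       # prints: f0bar
```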

score 5 · Accepted Answer · edited Feb 08 '21 at 10:45

Yes

The current POSIX specification of sed does not include the -E flag, which enables extended regular expressions (ERE). This alone is enough to conclude that the basic regex (BRE) form 's/\(foo\)\(bar\)/\2\1/' is the more portable one.

However, even if -E were included in sed's standard (and it will be), the Regular Expressions document does not define back-references in EREs, so the BRE \(...\) == ERE (...) association is itself a GNU extension and is not guaranteed to be supported by all programs. POSIX grep, for example, includes the -E flag, but while each of

grep 'ee*'
grep -E 'e+'
grep '\(.\)\1'

is compliant,

grep -E '(.)\1'

is not.
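As a small sketch of the compliant BRE form in action, the following finds lines containing a doubled character. The helper name words and its sample input are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample input: only "book" and "feed" contain a doubled character.
words() { printf '%s\n' book cat feed dog; }

# BRE grouping plus back-reference, the form the answer lists as compliant.
words | grep '\(.\)\1'
# prints:
# book
# feed
```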

Likewise, there are reports showing concretely that BSD sed does not implement this extension:

[In FreeBSD] sed -E '/(.)\1/d' removes lines that have a 1 after some other character.

whereas GNU sed would treat that as a back-reference and remove lines containing two equal, adjacent characters.
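The difference described above can be probed on a given system. This is a rough sketch, not a definitive test: the function name probe_ere_backref and the labels it prints are hypothetical, and a sed that behaves in some third way is reported as unknown.

```shell
# Hypothetical probe: report how the local sed interprets (.)\1 under -E.
#   backref - \1 acts as a back-reference (the GNU behaviour described above)
#   literal - \1 matches a literal "1" (the behaviour reported for FreeBSD)
#   unknown - -E was rejected, or the output was something else
probe_ere_backref() {
    # "aa" has a doubled character but no "1";
    # "b1" has a "1" after another character but no doubled character.
    out=$(printf '%s\n' aa b1 | sed -E '/(.)\1/d' 2>/dev/null) || { echo unknown; return; }
    case $out in
        b1) echo backref ;;   # "aa" was deleted
        aa) echo literal ;;   # "b1" was deleted
        *)  echo unknown ;;
    esac
}

probe_ere_backref
```

A script that needs back-references can sidestep the question entirely by using the BRE form, as in the grep examples above.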

See also discussion at https://www.mail-archive.com/austin-group-l@opengroup.org/msg00929.html — Stéphane Chazelas, Feb 08 '21 at 11:36

score 2 · Answer 2 · pLumo · 2021-02-07T16:01:03.150

sed -E means that sed will use extended regex (ERE); without that flag, it uses basic regex (BRE).

Not all sed versions can handle extended regex, so yes, the BRE form is more portable, but not because it uses backslashes. That is just ordinary BRE syntax.

See BRE vs ERE

edited Feb 07 '21 at 16:01

answered Feb 07 '21 at 15:57


Thank you for your answer. I self-answered just seconds before you did, with the same link :) – schrodingerscatcuriosity Feb 07 '21 at 15:59

score 1 · Answer 3 · answered Feb 07 '21 at 15:57

The GNU manual gives the answer:

5.2 Basic (BRE) and extended (ERE) regular expression

Basic and extended regular expressions are two variations on the syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the default in sed (and similarly in grep). Use the POSIX-specified -E option (-r, --regexp-extended) to enable Extended Regular Expression (ERE) syntax.

In GNU sed, the only difference between basic and extended regular expressions is in the behavior of a few special characters: ‘?’, ‘+’, parentheses, braces (‘{}’), and ‘|’.

With basic (BRE) syntax, these characters do not have special meaning unless prefixed with a backslash (‘\’); While with extended (ERE) syntax it is reversed: these characters are special unless they are prefixed with backslash (‘\’).
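A brief sketch of the reversal the manual describes, using braces as the example. The sample strings are made up, and the -E line assumes a sed that accepts that option.

```shell
# Interval: needs backslashes in BRE, none in ERE.
echo aab | sed    's/a\{2\}/X/'   # prints: Xb
echo aab | sed -E 's/a{2}/X/'     # prints: Xb (needs a sed that accepts -E)

# The reverse case: unescaped braces are ordinary characters in BRE.
echo 'a{2}b' | sed 's/a{2}/X/'    # prints: Xb
```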

The bold part of the quote is a GNU extension, though the manual does not seem to mention that. Also, the statement about -E is not entirely accurate: the current issue of the POSIX standard still does not include the -E option, although it has been accepted for the next one. An analogous situation is described in "Is awk's nextfile specified in POSIX?" – Quasímodo, Feb 08 '21 at 10:43
