411

Take the following script:

#!/bin/sh
sed 's/(127\.0\.1\.1)\s/\1/' [some file]

If I try to run this in sh (dash here), it'll fail because of the parentheses, which need to be escaped. But I don't need to escape the backslashes themselves (between the octets, or in the \s or \1). What's the rule here? What about when I need to use {...} or [...]? Is there a list of what I do and don't need to escape?

detly

4 Answers

466

There are two levels of interpretation here: the shell, and sed.

In the shell, everything between single quotes is interpreted literally, except for single quotes themselves. You can effectively have a single quote between single quotes by writing '\'' (close single quote, one literal single quote, open single quote).
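
As a small illustration of the '\'' trick (the input string and the variable name are made up for the example), this replaces every single quote in the input with a double quote:

```sh
#!/bin/sh
# The sed script is built from three adjacent pieces:
#   's/'     single-quoted text:  s/
#   \'       one literal single quote, escaped outside any quotes
#   '/"/g'   single-quoted text:  /"/g
# so sed receives:  s/'/"/g
quoted=$(printf '%s\n' "it's" | sed 's/'\''/"/g')
printf '%s\n' "$quoted"   # prints: it"s
```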

Sed uses basic regular expressions. In a BRE, in order to have them treated literally, the characters $.*[\^ need to be quoted by preceding them by a backslash, except inside character sets ([…]). Letters, digits and (){}+?| must not be quoted (you can get away with quoting some of these in some implementations). The sequences \(, \), \n, and in some implementations \{, \}, \+, \?, \| and other backslash+alphanumerics have special meanings. You can get away with not quoting $^ in some positions in some implementations.

Furthermore, you need a backslash before / if it is to appear in the regex outside of bracket expressions. You can choose an alternative character as the delimiter by writing, e.g., s~/dir~/replacement~ or \~/dir~p; you'll need a backslash before the delimiter if you want to include it in the BRE. If you choose a character that has a special meaning in a BRE and you want to include it literally, you'll need three backslashes; I do not recommend this, as it may behave differently in some implementations.
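
To make the delimiter point concrete, here is a sketch with a made-up path. Both commands perform the same substitution; the second avoids backslashes by using ~ as the delimiter:

```sh
#!/bin/sh
# Default delimiter: every / inside the regex and replacement needs a backslash.
path_default=$(printf '%s\n' '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')

# Alternative delimiter ~: the slashes are now ordinary characters.
path_alt=$(printf '%s\n' '/usr/local/bin' | sed 's~/usr/local~/opt~')

printf '%s\n' "$path_default" "$path_alt"   # prints /opt/bin twice
```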

In a nutshell, for sed 's/…/…/':

  • Write the regex between single quotes.
  • Use '\'' to end up with a single quote in the regex.
  • Put a backslash before $.*/[\]^ and only those characters (but not inside bracket expressions). (Technically you shouldn't put a backslash before ] but I don't know of an implementation that treats ] and \] differently outside of bracket expressions.)
  • Inside a bracket expression, for - to be treated literally, make sure it is first or last ([abc-] or [-abc], not [a-bc]).
  • Inside a bracket expression, for ^ to be treated literally, make sure it is not first (use [abc^], not [^abc]).
  • To include ] in the list of characters matched by a bracket expression, make it the first character (or first after ^ for a negated set): []abc] or [^]abc] (not [abc]] nor [abc\]]).
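
A short sketch of these rules in action (the sample strings are invented for the example):

```sh
#!/bin/sh
# Outside bracket expressions: backslash-escape $ . and * to match them literally.
literal=$(printf '%s\n' 'cost: $1.50*' | sed 's/\$1\.50\*/PRICE/')
printf '%s\n' "$literal"    # prints: cost: PRICE

# Inside a bracket expression: ] goes first, ^ is not first, - goes last,
# and none of them takes a backslash.
brackets=$(printf '%s\n' 'a]b-c^d' | sed 's/[]^-]/_/g')
printf '%s\n' "$brackets"   # prints: a_b_c_d
```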

In the replacement text:

  • & and \ need to be quoted by preceding them by a backslash, as do the delimiter (usually /) and newlines.
  • \ followed by a digit has a special meaning. \ followed by a letter has a special meaning (special characters) in some implementations, and \ followed by some other character means \c or c depending on the implementation.
  • With single quotes around the argument (sed 's/…/…/'), use '\'' to put a single quote in the replacement text.
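
For example (sample strings made up for illustration), an unescaped & stands for the whole match, while \& and \\ produce a literal ampersand and a literal backslash:

```sh
#!/bin/sh
# Unescaped & in the replacement is replaced by the matched text.
whole=$(printf '%s\n' 'cats' | sed 's/cats/& and dogs/')
printf '%s\n' "$whole"   # prints: cats and dogs

# \& is a literal ampersand.
amp=$(printf '%s\n' 'cats' | sed 's/cats/cats \& dogs/')
printf '%s\n' "$amp"     # prints: cats & dogs

# \\ in the replacement is one literal backslash (here: turn / into \).
bs=$(printf '%s\n' 'a/b' | sed 's/\//\\/g')
printf '%s\n' "$bs"      # prints: a\b
```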

If the regex or replacement text comes from a shell variable, remember that

  • The regex is a BRE, not a literal string.
  • In the regex, a newline needs to be expressed as \n (which will never match unless you have other sed code adding newline characters to the pattern space). But note that it won't work inside bracket expressions with some sed implementations.
  • In the replacement text, &, \ and newlines need to be quoted.
  • The delimiter needs to be quoted (but not inside bracket expressions).
  • Use double quotes for interpolation: sed -e "s/$BRE/$REPL/".
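
If the variables hold literal text rather than a ready-made BRE, one way to apply the points above is to escape them first. This is only a sketch: the variable names are made up, it assumes / as the delimiter of the final command, and it does not handle newlines inside the values. The escaping commands use , as their own delimiter so that / and \ can sit in the bracket expression without further quoting.

```sh
#!/bin/sh
search='1.5*[x]'      # literal text to find (hypothetical value)
replace='a/b & c'     # literal replacement text (hypothetical value)

# Put a backslash before ] [ \ / . * ^ $ so the text is a literal BRE.
esc_search=$(printf '%s\n' "$search" | sed 's,[][\\/.*^$],\\&,g')

# Put a backslash before \ / & so the text is a literal replacement.
esc_replace=$(printf '%s\n' "$replace" | sed 's,[\\/&],\\&,g')

# Double quotes let the shell interpolate the escaped values.
result=$(printf '%s\n' 'value=1.5*[x];' | sed "s/$esc_search/$esc_replace/")
printf '%s\n' "$result"   # prints: value=a/b & c;
```
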
  • To escape the actual wildcard character (*), you can use a double backslash (\\*). Example: `echo "***NEW***" | sed /\\*\\*\\*NEW\\*\\*\\*/s/^/#/` – Melroy van den Berg Mar 20 '19 at 16:44
  • "Use '\'' to end up with a single quote in the regex." didn’t work for me on macOS Catalina. I had to switch to using double quotes and putting the single-quotes inside. Tried everything between 0-2 backslashes. – Florian Wendelborn Dec 13 '19 at 19:43
  • I had to escape + as well, for it to have the regexy meaning. – hoijui Apr 03 '21 at 08:43
  • @danger89: You discovered the difference between single quotes and double quotes (or no quotes), for example: grep -rl access\\.log . = grep -rl "access\\.log" . = grep -rl 'access\.log' .. @hoijui: Same here with Ubuntu 18.04.5. – uav Apr 08 '21 at 17:04
  • The only thing I can add is that your shell (most likely bash in your case) also requires certain characters to be escaped, on top of what the sed command expects to be escaped. That makes it a bit confusing. – Melroy van den Berg Apr 09 '21 at 18:33
  • wow could it be any more easier than that! – ononononon May 03 '22 at 12:15
  • In my experience, sed treats "*" literally by default. I found that I could replace those in a file without using the backslash. – thinksinbinary Dec 04 '22 at 00:55
  • @thinksinbinary That's only true in rare circumstances: sed (and many other regex engines) treats special characters literally if they appear in an invalid position. So for example s/*/a/ replaces * with a, because there's no previous character for * to be applied to. But s/x*y/a/ replaces any number of x followed by a y, because * applies to x. This is a bit of weirdness that you can live without: you can (and should, for clarity) write s/\*/a/ if you want to match a literal *. – Gilles 'SO- stop being evil' Dec 04 '22 at 09:17
63

The problem you're experiencing isn't due to shell interpolation and escapes - it's because you're attempting to use extended regular expression syntax without passing sed the -r or --regexp-extended option (these are the GNU spellings; -E is accepted by both GNU and BSD sed).

Change your sed line from

sed 's/(127\.0\.1\.1)\s/\1/' [some file]

to

sed -r 's/(127\.0\.1\.1)\s/\1/' [some file]

and it will work as I believe you intend.

By default sed uses basic regular expressions (think grep style), which would require the following syntax:

sed 's/\(127\.0\.1\.1\)[ \t]/\1/' [some file]
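
As a side-by-side sketch of the two syntaxes (the sample line is made up; [[:space:]] is used here instead of \s or [ \t] because it is the POSIX character class, and -E is assumed to be available, as it is in current GNU and BSD sed):

```sh
#!/bin/sh
line='127.0.1.1 myhost'

# Basic regular expression: the group needs \( and \).
bre=$(printf '%s\n' "$line" | sed 's/\(127\.0\.1\.1\)[[:space:]]/\1/')

# Extended regular expression: plain ( and ) form the group.
ere=$(printf '%s\n' "$line" | sed -E 's/(127\.0\.1\.1)[[:space:]]/\1/')

printf '%s\n' "$bre" "$ere"   # prints 127.0.1.1myhost twice
```
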
R Perrin
24

Unless you want to interpolate a shell variable into the sed expression, use single quotes for the whole expression because they cause everything between them to be interpreted as-is, including backslashes.

So if you want sed to see s/\(127\.0\.1\.1\)\s/\1/ put single quotes around it and the shell won't touch the parentheses or backslashes in it. If you need to interpolate a shell variable, put only that part in double quotes. E.g.

sed 's/\(127\.0\.1\.1\)/'"$ip"'/'

This will save you the trouble of remembering which shell metacharacters are not escaped by double quotes.
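
Run against a sample line, the mixed quoting looks like this (the value of ip and the input are invented for the example; note that the variable's contents still end up in the replacement text, so & and \ in it would be special):

```sh
#!/bin/sh
ip='10.0.0.5'   # hypothetical replacement address

# Single quotes protect the regex; only "$ip" is expanded by the shell.
out=$(printf '%s\n' '127.0.1.1 myhost' | sed 's/\(127\.0\.1\.1\)/'"$ip"'/')
printf '%s\n' "$out"   # prints: 10.0.0.5 myhost
```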

Kyle Jones
  • I want sed to see s/(127\.0\.1\.1)/..., but putting that in a shell script as-is doesn't work. What you're saying about the shell not touching the parentheses seems wrong. I've edited my question to elaborate. – detly Feb 28 '12 at 06:14
  • The shell isn't touching the parentheses. You need the backslashes because sed needs to see them. sed 's/(127\.0\.1\.1)/IP \1/' fails because sed needs to see \( and \) for group syntax, not ( and ). – Kyle Jones Feb 28 '12 at 06:31
  • facepalm It's not in the man page, but it IS in some online manual I found. Is this normal for regex, because I've never had to use it in regex libraries (in, eg. Python)? – detly Feb 28 '12 at 06:33
  • ...and apparently + needs escaping too. That's... interesting. – detly Feb 28 '12 at 06:41
  • For traditional Unix commands, there are basic regular expressions and extended regular expressions. Details. sed uses basic regular expressions, so the backslashes are needed for group syntax. Perl and Python went beyond even extended regular expressions. While I was poking around I found an extremely informative chart that illustrates what a confusing bramble we conjure up when we glibly say "regular expression." – Kyle Jones Feb 28 '12 at 07:07
  • Oh, this takes me back. I think this guide by Bruce Barnett has the RE used by sed. But what about the +? That's not in there, but echo 'AAA' | sed 's/A+/*/' won't work without escaping it. – detly Feb 28 '12 at 07:12
  • Wow, interesting. + isn't special for sed under Mac OS X. But it is special for sed under Linux. My head really hurts now. – Kyle Jones Feb 28 '12 at 07:18
  • I would also add that the only character that cannot be used inside single quotes is a single quote. – enzotib Feb 28 '12 at 09:08
4

I think it's worth mentioning that, while sed is based on the POSIX standard, which specifies support only for basic regular expressions (BRE), two widely used implementations of the sed command exist: BSD (macOS) and GNU (most Linux distros). Each implements some shared and some unique extensions to the POSIX standard, and these can affect how sed behaves across platforms. As a result, a sed command that works as expected on one system might produce completely different results on another. This can lead to unexpected behavior with regard to escaped and special characters.

These extensions to the POSIX standard tend to be more prevalent in GNU sed, often providing the convenience of less strict formatting, especially in comparison to BSD sed. However, while GNU sed gives some special characters additional functionality, those uses are still not POSIX-compliant. Additionally, the only real difference between basic and extended regular expressions (ERE) within GNU sed is the behavior of the following special characters:

‘?’, ‘+’, parentheses, braces (‘{}’), and ‘|’

Even so, some of these operators have limited or no support in BSD sed's basic regular expressions, such as \|, \?, and \+, as it adheres more closely to the POSIX syntax. Using them the way GNU sed allows will often cause portability problems in scripts that rely on sed. It's also worth noting that POSIX BRE syntax does not define a meaning for some escape sequences, most notably: \|, \+, \?, \`, \', \<, \>, \b, \B, \w, and \W.

For those running the BSD/macOS version of sed, emulating the behavior of some special characters can be a bit tricky, but it can be done in most cases. For example, \+ could be emulated in a POSIX-compliant fashion like this: \{1,\} and \? would look like this: \{0,1\}. Control character sequences, however, are typically not supported. If at all possible, it's certainly easiest to use GNU sed, but if you need functionality on both platforms, remember to use POSIX features only, to ensure portability. If you're a Mac user and would like to take advantage of GNU sed as opposed to BSD sed, you might try installing Homebrew and installing GNU sed from the command line with: brew install gnu-sed.
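
A quick sketch of those two emulations (sample inputs made up for the example); both commands stick to POSIX BRE interval syntax, so they should behave the same on GNU and BSD sed:

```sh
#!/bin/sh
# "One or more A" without \+ : use the interval \{1,\}
plus=$(printf '%s\n' 'AAA' | sed 's/A\{1,\}/*/')
printf '%s\n' "$plus"   # prints: *

# "Optional u" without \? : use the interval \{0,1\}
opt=$(printf '%s\n' 'color colour' | sed 's/colou\{0,1\}r/X/g')
printf '%s\n' "$opt"    # prints: X X
```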

To wrap things up, differences in version can really dictate what the proper syntax might look like, or what characters are necessary to escape. I hope this provides some additional context for the initial question as well as the accepted answer, and helps others consider how they should proceed, based on the end goal of their script and command usage.