
In Emacs regexps we use \| to express OR (alternation). Entering a\|b at the Regexp I-search prompt or in Occur gives the desired result, but re-search-forward seems unable to recognize this notation. For example, in a buffer containing:

abcdcba

(re-search-forward "a\|b")

;; Search failed: "a|b"
Drew
Sati
  • I'm surprised that I can't find a duplicate for this on E.S. https://emacs.stackexchange.com/q/5568/454 and https://emacs.stackexchange.com/q/45382/454 might both be of use, however. – phils Feb 18 '20 at 04:35
  • https://stackoverflow.com/q/538842/324105 on S.O. is probably the best duplicate question, but https://stackoverflow.com/a/10864091/324105 is also fairly comprehensive on the subject of backslashes in regexps. – phils Feb 18 '20 at 04:49
  • `"a\|b" ;; => "a|b"`, in other words, 3 characters; however, the regexp `a\|b` is 4 characters: `a`, `\`, `|` and `b`. – xuchunyang Feb 18 '20 at 04:56
  • @phils: The underlying question isn't really about regexps or regexp syntax, I think. It would be good to have a separate (Community) question whose answer is that backslashes in Lisp strings need to be doubled. Then this question would really be a dup of that one. As it stands, if this question becomes the one we point to for questions that are answered by doubling \ in Lisp strings, then it won't be as discoverable, since it seems to be about regexps. – Drew Feb 18 '20 at 19:47
  • @phils: Here's another one - near-dup: https://emacs.stackexchange.com/q/34775/105. And another: https://emacs.stackexchange.com/a/19646/105. And another: https://emacs.stackexchange.com/q/45382/105. – Drew Feb 18 '20 at 19:51
  • @Drew, I agree that a Q&A about string escaping generally would be valuable; however I believe the compounded confusion about backslashes in regexp strings is so very common that it absolutely warrants a Q&A which discusses that specifically, because I suspect 95% of the time people will be searching for "regexp" and not "string". – phils Feb 18 '20 at 21:22
  • @phils: I agree. It would be good to have two Community questions: (1) backslashes in Lisp strings, (2) regexp strings in Lisp (e.g. search). #2 would point to #1. – Drew Feb 18 '20 at 23:52
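To make xuchunyang's point concrete, here is a small check that can be evaluated in any Emacs (for instance with M-: or C-x C-e in *scratch*). It converts each string literal into a list of character codes, which shows exactly what the reader produced:

```elisp
;; `append' with nil turns a string into a list of its character codes.
(append "a\|b" nil)    ;; => (97 124 98), i.e. a | b  (the \ was dropped)
(append "a\\|b" nil)   ;; => (97 92 124 98), i.e. a \ | b
(length "a\|b")        ;; => 3
(length "a\\|b")       ;; => 4
```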

1 Answer


Elisp regexps are represented as strings, which makes backslashes tricky: they are special not only in regexp syntax but also in the read syntax for strings.

Emacs requires a literal \ character to be escaped in the double-quoted read syntax for strings. So when the Lisp reader processes the code, "\\" becomes a string object containing a single \ character, and that single backslash is what the regexp engine sees when it uses that string object.
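A quick sketch of that reader step, using only standard Emacs Lisp: the four source characters "\\" (two quotes, two backslashes) read as a one-character string.

```elisp
;; The reader collapses \\ inside a string literal into one backslash.
(length "\\")                 ;; => 1
(eq (aref "\\" 0) ?\\)        ;; => t, the only character is a backslash (code 92)
;; The same string built from the character ?\\ directly:
(string= "\\" (string ?\\))   ;; => t
```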

So in your case, the regexp a\|b is represented by "a\\|b" in the double-quoted read syntax for strings.
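A minimal sketch of the corrected search, run in a temporary buffer holding the question's sample text. The loop that collects every match is only there for illustration; passing t as the NOERROR argument makes re-search-forward return nil instead of signalling "Search failed" once no more matches remain:

```elisp
(with-temp-buffer
  (insert "abcdcba")
  (goto-char (point-min))
  (let (matches)
    ;; "a\\|b" reads as the 4-character regexp a\|b: match "a" or "b".
    (while (re-search-forward "a\\|b" nil t)
      (push (match-string 0) matches))
    (nreverse matches)))
;; => ("a" "b" "b" "a")
```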

Conversely, "a\|b" is the regexp a|b. Because \| is not a special construct in the read syntax for strings, all we have done there is needlessly escape a | character, so "a\|b" is no different from "a|b". And since a|b contains no regexp-special constructs, it matches the three-character sequence a|b literally.
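The same kind of sketch with the single-backslash literal shows it behaving as the literal text a|b. The second sample string, "x a|b y", is made up for the demonstration:

```elisp
(string= "a\|b" "a|b")                 ;; => t, the \ is discarded by the reader

;; On the question's text there is no literal "a|b", so the search fails.
(with-temp-buffer
  (insert "abcdcba")
  (goto-char (point-min))
  (re-search-forward "a\|b" nil t))    ;; => nil

;; Where "a|b" does occur literally, it is found.
(with-temp-buffer
  (insert "x a|b y")
  (goto-char (point-min))
  (list (re-search-forward "a\|b" nil t) (match-string 0)))
;; => (6 "a|b"), point ends just after the match
```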

The elisp manual explains further:

`\' has two functions: it quotes the special characters (including
`\'), and it introduces additional special constructs.

Because `\' quotes special characters, `\$' is a regular
expression that matches only `$', and `\[' is a regular expression
that matches only `[', and so on.

Note that `\' also has special meaning in the read syntax of Lisp
strings (*note String Type::), and must be quoted with `\'.  For
example, the regular expression that matches the `\' character is
`\\'.  To write a Lisp string that contains the characters `\\',
Lisp syntax requires you to quote each `\' with another `\'.
Therefore, the read syntax for a regular expression matching `\'
is `"\\\\"'.

-- C-h i g (elisp)Regexp Special
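A few checks of those quoting rules, using string-match-p (which returns the index of the first match, or nil). The sample strings are illustrative only:

```elisp
;; The regexp \$ (Lisp string "\\$") matches only a literal dollar sign.
(string-match-p "\\$" "cost: $5")     ;; => 6
;; The regexp \[ (Lisp string "\\[") matches only a literal [.
(string-match-p "\\[" "x[0]")         ;; => 1
;; The regexp \\ (Lisp string "\\\\") matches one literal backslash.
(string-match-p "\\\\" "C:\\dir")     ;; => 2
;; regexp-quote yields that same two-character regexp for a backslash.
(string= (regexp-quote "\\") "\\\\")  ;; => t
```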

It is also worth noting that \ is not special within a character alternative (this is also true for most other regexp-special characters), and therefore [\] (written "[\\]" in Lisp string syntax) matches a backslash.

As a `\' is not special inside a character alternative, it can never
remove the special meaning of `-' or `]'.  So you should not quote
these characters when they have no special meaning either.  This would
not clarify anything, since backslashes can legitimately precede these
characters where they _have_ special meaning, as in `[^\]' (`"[^\\]"'
for Lisp string syntax), which matches any single character except a
backslash.
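The character-alternative rule can be checked the same way; again, the input strings are arbitrary examples:

```elisp
;; [\] (Lisp string "[\\]") is a character alternative containing only a backslash.
(string-match-p "[\\]" "a\\b")     ;; => 1, the backslash is at index 1
;; [^\] (Lisp string "[^\\]") matches any single character except a backslash.
(string-match-p "[^\\]" "\\\\x")   ;; => 2, skipping the two leading backslashes
```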
phils