
The regular expressions section on EmacsWiki says this:

You can use a tool to construct regexps. For example, you can use ‘rx’ like this:

(rx (or (and "\*" (*? anything) "*/") (and "//" (*? anything) eol)))

To produce this regexp (which matches C-style multiline and single line comments):

\\*\\(?:.\\|\n\\)*?\\*/\\|//\\(?:.\\|\n\\)*?$

I understand that `or` matches either a C-style multi-line comment or a C/C++ single-line comment.
But I do not understand the `and` parts.

  1. I am not sure how `anything` works; I assume it matches any character.
    Is it equivalent to `.*`?

  2. If I understand correctly, `*?` is the non-greedy variant of the `*` operator.
    This is what really confuses me: does it mean the smallest possible match for `anything`?

  3. Why is the opening of the C-style multi-line comment written as "\*"? I suspect it is a typo on EmacsWiki and should be "/*".

nephewtom
  • See also the documentation for `rx` under [`(info "(elisp) Rx Notation")`](https://www.gnu.org/software/emacs/manual//html_node/elisp/Rx-Notation.html). – Basil Oct 19 '20 at 11:39
  • `.` matches any char *except newline*. So `.*` matches zero or more non-newline chars. `.\\|\n` matches any char (including newline). – Drew Oct 19 '20 at 16:34

1 Answer


`anything` matches any character, including a newline. The regexp `.`, by contrast, matches any character except a newline; its rx equivalent is `not-newline`.
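
To make the difference concrete, here is a minimal sketch that runs both forms over a two-line string. The helper `my-first-match` and the sample text are hypothetical names introduced only for this illustration; they are not part of rx.

```elisp
;; Hypothetical helper: return the first match of REGEXP in TEXT, or nil.
(defun my-first-match (regexp text)
  (when (string-match regexp text)
    (match-string 0 text)))

;; `anything' also consumes the newline, so the match spans both lines.
(my-first-match (rx (* anything)) "ab\ncd")
;; => "ab\ncd"

;; `not-newline' behaves like regexp `.' and stops at the newline.
(my-first-match (rx (* not-newline)) "ab\ncd")
;; => "ab"
```

This is why the wiki example uses `anything` for the multi-line comment: with `not-newline` the match could never cross a line break.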

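`*?` is the non-greedy zero-or-more operator: it matches as few repetitions as possible while still letting the rest of the pattern match. Let's extract the comment from this C code: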

/* comment */ char *s = "*/";

;; * is zero-or-more, greedy
(let ((string "/* comment */ char *s = \"*/\""))
  (when (string-match (rx "/*" (* anything) "*/") string)
    (match-string 0 string)))
;; => "/* comment */ char *s = \"*/"

;; *? is zero-or-more, non-greedy
(let ((string "/* comment */ char *s = \"*/\""))
  (when (string-match (rx "/*" (*? anything) "*/") string)
    (match-string 0 string)))
;; => "/* comment */"
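
Putting the pieces together, the sketch below applies the whole `or` form from the question to a small C snippet. It assumes the opening delimiter is meant to be "/*" (as in the two examples above) rather than the "\*" shown on the wiki. The function name `my-collect-c-comments` and the sample source are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper: return a list of all comments found in SOURCE.
(defun my-collect-c-comments (source)
  (let ((regexp (rx (or (and "/*" (*? anything) "*/")     ; multi-line comment
                        (and "//" (*? anything) eol))))   ; single-line comment
        (start 0)
        (comments '()))
    ;; Search repeatedly, resuming after the end of the previous match.
    (while (string-match regexp source start)
      (push (match-string 0 source) comments)
      (setq start (match-end 0)))
    (nreverse comments)))

(my-collect-c-comments
 "int a; /* first\n   second */\nint b; // tail\nint c;")
;; => ("/* first\n   second */" "// tail")
```

Each `and` is simply a sequence: an opening delimiter, the shortest possible run of characters, then the closing delimiter (or end of line). Note that this pattern knows nothing about string literals, so a comment delimiter inside a C string would still be matched.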

For details, see the documentation via C-h f rx or (info "(elisp) Rx Notation").

Muihlinn
xuchunyang