I just want to make sure I understood this correctly.

(rx (one-or-more (any upper lower)))

is equivalent to

(rx (one-or-more (any "A-Z" "a-z")))

Correct?

serghei
    `[A-Z]` matches only an *ASCII* uppercase letter, that is, a letter from `A` through `Z`. There are other, non-ASCII uppercase letters (e.g., in languages other than English). – Drew Aug 28 '17 at 16:52

1 Answer

The macro rx returns regexp strings that can be passed to other Emacs functions.

ELISP> (rx (one-or-more (any upper lower)))
"[[:lower:][:upper:]]+"
ELISP> (rx (one-or-more (any "A-Z" "a-z")))
"[A-Za-z]+"

That doesn't answer your question directly; it pushes the question to "are these two regexes identical?" So, let's look for an uppercase or lowercase letter that is not between A and Z, or a and z. Let's try á.

ELISP> (string-match-p (rx (one-or-more (any upper lower))) "á")
0 (#o0, #x0, ?\C-@)
ELISP> (string-match-p (rx (one-or-more (any "A-Z" "a-z"))) "á")
nil

So the regexes are not equivalent. Presumably you want to use (rx (one-or-more (any upper lower))) most of the time: it matches the characters most people think of as letters, and I'd argue it is also more readable.
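If you want to check more strings than just "á", here is a minimal sketch that runs both regexps over a list of samples. The names `my/class-letters`, `my/ascii-letters` and `my/compare-letter-regexps` are hypothetical helpers made up for this illustration; only `rx` and `string-match-p` come from the examples above. I've also anchored both patterns with `string-start` and `string-end`, which is my own assumption so that the whole string has to consist of letters.

```elisp
;; Hypothetical helper names, for illustration only.
;; Both patterns are anchored so the entire string must match.
(defconst my/class-letters
  (rx string-start (one-or-more (any upper lower)) string-end)
  "Letters as defined by the [:upper:] and [:lower:] character classes.")

(defconst my/ascii-letters
  (rx string-start (one-or-more (any "A-Z" "a-z")) string-end)
  "ASCII letters only.")

(defun my/compare-letter-regexps (strings)
  "Return a list of (STRING CLASS-MATCH ASCII-MATCH) for each of STRINGS.
CLASS-MATCH and ASCII-MATCH are t or nil."
  (mapcar (lambda (s)
            (list s
                  ;; `string-match-p' returns an index or nil; normalize to t/nil.
                  (and (string-match-p my/class-letters s) t)
                  (and (string-match-p my/ascii-letters s) t)))
          strings))

;; Try it on a few samples, including the "á" used above.
(my/compare-letter-regexps '("abc" "ABC" "á" "abc1"))
```

Any row where the second and third elements differ is a string on which the two regexps disagree.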

zck