Is there any principal difference between "A-Z" and upper?

Question

I just want to make sure I understood this correctly.

(rx (one-or-more (any upper lower)))

is equal to

(rx (one-or-more (any "A-Z" "a-z")))

Correct?

`[A-Z]` matches only an *ASCII* uppercase letter, that is, a letter from `A` through `Z`. There are other, non-ASCII uppercase letters (e.g., in languages other than English). — Drew, Aug 28 '17 at 16:52

zck · Accepted Answer · 2017-08-28T18:02:48.573

The macro rx returns regexp strings that can be passed to other Emacs functions.

ELISP> (rx (one-or-more (any upper lower)))
"[[:lower:][:upper:]]+"
ELISP> (rx (one-or-more (any "A-Z" "a-z")))
"[A-Za-z]+"

That doesn't answer your question directly; it pushes the question to "are these two regexes identical?" So, let's look for an uppercase or lowercase letter that is not between A and Z, or a and z. Let's try á.

ELISP> (string-match-p (rx (one-or-more (any upper lower))) "á")
0 (#o0, #x0, ?\C-@)
ELISP> (string-match-p (rx (one-or-more (any "A-Z" "a-z"))) "á")
nil

So the regexes are not identical. Presumably you want to use (rx (one-or-more (any upper lower))) most of the time: not only does it match the characters most people think of as letters, but I'd argue it is also more readable.
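To check more inputs than the single á above, the comparison can be wrapped in a small helper. This is only a sketch: the names `my/class-re`, `my/ascii-re`, and `my/compare-letter-regexps` are made up for illustration, and `case-fold-search` is bound to nil here on the assumption that you want case-sensitive matching, so the result does not depend on the caller's setting.

```elisp
;; Sketch: compare the char-class regexp with the ASCII-range regexp.
;; All `my/' names are hypothetical, not part of Emacs.
(require 'rx)

(defconst my/class-re (rx (one-or-more (any upper lower)))
  "Regexp for letters as classified by the current case table.")

(defconst my/ascii-re (rx (one-or-more (any "A-Z" "a-z")))
  "Regexp for ASCII letters only.")

(defun my/compare-letter-regexps (strings)
  "Return a list of (STRING CLASS-MATCH ASCII-MATCH) for each of STRINGS.
Each match value is the index where the match starts, or nil."
  ;; Assumption: bind `case-fold-search' to nil so matching is
  ;; case-sensitive regardless of the caller's environment.
  (let ((case-fold-search nil))
    (mapcar (lambda (s)
              (list s
                    (string-match-p my/class-re s)
                    (string-match-p my/ascii-re s)))
            strings)))

;; Example call: evaluate in IELM or *scratch*.
(my/compare-letter-regexps '("abc" "á" "123"))
```

For "á" the helper should report a match at index 0 for the class-based regexp and nil for the ASCII-range one, in line with the session above; a string with no letters at all, such as "123", should give nil for both.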
