
I want to match every string that contains a year and a month name, e.g. like this:

2022 Jul or 2022  Jul (or with more than 2 spaces) or even 2022Jul.

The regex I have so far is:

"\\([0-9]\\{4\\}\\)"

And this will grab the year.

But now I don't know how to target the existence of spaces.

In JavaScript it would be something like:

[0-9]{4}\s*[A-Za-z]{3}

But I can't make it work in Emacs for anything ;(.

What strikes me the most is that there is ZERO!!! mention of spaces, whitespace, or line breaks in the official Emacs docs. Check for yourself: https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Special.html

And searching on Google doesn't bring anything useful.

Can you guys help me target spaces in my regex example?

Can anybody help? I don't get the Regex in Emacs at all.

Glorfindel
fegax
  • Read the next page in the docs: https://www.gnu.org/software/emacs/manual/html_node/elisp/Char-Classes.html and you'll find out about `[:blank:]` and `[:space:]`, and the page after that https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Backslash.html for syntax classes, e.g., `\s-` – Tyler Jul 18 '22 at 18:19
  • Please clarify your question/headline. Are you asking "Where can I find documentation on regex syntax classes in elisp?" or "Why is this elisp regex not matching whitespaces?" – Malle Yeno Jul 18 '22 at 18:47
  • Those things cannot be separated. The problem is that the documentation is scattered and inconsistent. It's one of the worst, if not the worst, documentation for any programming language/software I have ever seen. Also, googling Emacs problems is very discouraging. Also, the help is often very weird. Hard to explain. It's very elitist and belittling. Not a friendly environment at all ;( – fegax Jul 18 '22 at 19:15
  • The problem is more that the documentation is huge, primarily because it is very detailed. It makes for a somewhat forbidding entry, but once you've learnt how to use the Help system, it's about the *best* I've seen. See the [answers to this question](https://emacs.stackexchange.com/questions/72447) for some techniques - and practice, practice, practice: it does get easier with practice. Googling can be helpful sometimes but it does not compare with Asking Emacs! – NickD Jul 18 '22 at 19:26
  • If you can point to an example of belittling or elitist language anywhere in the docs, it should be reported as a documentation bug. The devs put a lot of effort into maintaining and updating the docs. – Tyler Jul 18 '22 at 20:16
  • Please do not vandalize your posts. If you believe your question is not useful or is no longer useful, it should be deleted instead of editing out all of the data that actually makes it a question. By posting on the Stack Exchange network, you've granted a [non-revocable right for SE to distribute that content](/legal/terms-of-service/public#licensing) under the CC BY-SA 4.0 license. By SE policy, any vandalism will be reverted. If you want to know more about deleting a post, consider taking a look at: [How does deleting work](//meta.stackexchange.com/q/5221/295232)? – Glorfindel Jul 20 '22 at 06:59

2 Answers


Emacs regexps use the backslash construct \scode to match characters by syntax class. See more in the documentation here: https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Backslash.html

In particular, the part in the description about whitespace may help you get the selection you are after:

‘\scode’

    matches any character whose syntax is code. Here code is a character that represents a syntax code: thus, ‘w’ for word constituent, ‘-’ for whitespace, ‘(’ for open parenthesis, etc. To represent whitespace syntax, use either ‘-’ or a space character. See Table of Syntax Classes, for a list of syntax codes and the characters that stand for them.

Try using \s- to select the whitespace here.
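As a rough sketch of how \s- might slot into the regex from the question: inside a Lisp string the backslash has to be doubled, so it is written "\\s-*". The names my-year-month-re and my-year-month-parts are made up for illustration, and the three-letter month part is an assumption carried over from the JavaScript version in the question. Keep in mind that \s- consults the syntax table of the current buffer, so what counts as whitespace can vary by major mode.

```elisp
;; Hypothetical names, for illustration only.
(defconst my-year-month-re
  "\\([0-9]\\{4\\}\\)\\s-*\\([A-Za-z]\\{3\\}\\)"
  "Match a 4-digit year, optional whitespace, and a 3-letter month.")

(defun my-year-month-parts (string)
  "Return (YEAR MONTH) if STRING contains a year and a month, else nil."
  (when (string-match my-year-month-re string)
    ;; Groups 1 and 2 are the two \\(...\\) groups in the regexp.
    (list (match-string 1 string)
          (match-string 2 string))))

;; Possible usage:
;; (my-year-month-parts "2022 Jul")
;; (my-year-month-parts "2022   Jul")
;; (my-year-month-parts "2022Jul")
```

Using * rather than + after \s- is what lets 2022Jul match as well, since * allows zero whitespace characters.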

More information on Syntax Classes (as referenced in docstring above) available here: https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-Class-Table.html

In particular, this could be useful for understanding whitespace selection:

36.2.1 Table of Syntax Classes

Here is a table of syntax classes, the characters that designate them, their meanings, and examples of their use.

Whitespace characters: ‘ ’ or ‘-’

    Characters that separate symbols and words from each other. Typically, whitespace characters have no other syntactic significance, and multiple whitespace characters are syntactically equivalent to a single one. Space, tab, and formfeed are classified as whitespace in almost all major modes.

    This syntax class can be designated by either ‘ ’ or ‘-’. Both designators are equivalent.

Malle Yeno
  • I have tried `"\\([0-9]\\{4\\}\scode\\)"` but it isn't working. – fegax Jul 18 '22 at 18:28
  • @fegax Just so I understand, did you put `\scode` into your regex, or did you put *the scode* in, as in `\-`? – Malle Yeno Jul 18 '22 at 18:31
  • What I wrote is exactly in my code. Character by character. Here is a screenshot: https://i.imgur.com/wZK8LdR.png – fegax Jul 18 '22 at 18:36
  • @fegax Sorry, important clarification (and one correction on my part): `\scode` shouldn't be entered literally. You need to replace the `code` part with the syntax class you are targeting. So you want to use `\s-` to target whitespace. (There's a difference between s and S, since S excludes the class. So this part must be included. I missed that in my answer and comment, so I edited what I could.) – Malle Yeno Jul 18 '22 at 18:41

In addition to \scode, there are also the character classes [:space:] and [:blank:]. See chapter 35.3.1.2 Character Classes for the details. Note especially the first paragraph, where it mentions that character classes go inside square brackets, so you end up with double square brackets when using these, e.g. [[:space:]].


You can also just match the space character directly; a space followed by * will match zero or more spaces (but not other types of whitespace, which you might want to allow).
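To compare these spellings side by side, here is a small sketch. The helper my-year-month-p is hypothetical (not anything built into Emacs), and the digit and letter parts are assumptions taken from the question; it just glues a separator regexp between the year and the month.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-year-month-p (separator-re string)
  "Return non-nil if STRING has a year, SEPARATOR-RE, then a month."
  (string-match-p
   (concat "[0-9]\\{4\\}" separator-re "[A-Za-z]\\{3\\}")
   string))

(my-year-month-p "[[:space:]]*" "2022\tJul") ; class inside a bracket expression
(my-year-month-p "[[:blank:]]*" "2022  Jul") ; horizontal whitespace
(my-year-month-p " *" "2022Jul")             ; literal spaces, zero or more
(my-year-month-p " *" "2022\tJul")           ; a tab is not a literal space
(my-year-month-p "[:space:]" "2022 Jul")     ; single brackets: the set : s p a c e
```

The first three calls should match and the last two should not. The last one shows why single brackets appear to "not work": without the outer brackets, [:space:] is an ordinary bracket expression listing the characters :, s, p, a, c and e.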

You may also find M-x re-builder useful. Put some sample text you want to match (and some that you don’t want your regex to match!) into a buffer, then call re-builder. Type in your regex and it will show you what parts of the text in the buffer match. Very handy!

db48x
  • I have tried `"\\([0-9]\\{4\\}\scode\\)"` and also `"\\([0-9]\\{4\\}[:space:]\\)"` and `"\\([0-9]\\{4\\}[:blank:]\\)"` but those aren't working. Only `"\\([0-9]\\{4\\} +\\)"` is working. Any idea why? By the way, wouldn't * instead of + be better for my example? – fegax Jul 18 '22 at 18:31
  • Only if you want to match zero or more spaces: I don't think you want that. Or maybe you do... – NickD Jul 18 '22 at 18:46
  • Read my question. I mentioned it there. It's the second sentence. – fegax Jul 18 '22 at 19:02
  • That's why I edited my comment. – NickD Jul 18 '22 at 19:03