The code from here accesses groups of a regular expression match in a buffer in interactive mode:

(let ((re (concat "\\([ \t]*" org-clock-string " *\\)"
                        "\\([[<][^]>]+[]>]\\)\\(-+\\)\\([[<][^]>]+[]>]\\)"
                        "\\(?:[ \t]*=>.*\\)?")))
        (when (looking-at re)
          (let ((indentation (match-string 1))
                (start (match-string 2))
                (to (match-string 3))
                (end (match-string 4))
                (use-start-as-default (equal end-as-default nil)))

But match-string n fails in non-interactive mode, for example in the *scratch* buffer:

(string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" "09:20")
0

(match-string 0)
#("save" 0 4 (fontified nil face font-lock-comment-face))

How can I access regular expression groups outside a buffer?

Update: after reading "Strange behaviour of match-string/string-match", I added the optional argument to match-string, but it doesn't help and I get an error:

(string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" "09:20")
0

(match-string 0 "09:20") ;; Debugger entered--Lisp error: (args-out-of-range "09:20" 9 15)

When I run it again, the last expression returns "09:20", even after I restart Emacs (see this thread from the comments). When I run the code in emacs -q -nw, I get the error above.

miguelmorin
  • 2
    Does this answer your question? [Strange behaviour of match-string/string-match](https://emacs.stackexchange.com/questions/34172/strange-behaviour-of-match-string-string-match) – Drew Dec 14 '19 at 16:52
  • @Drew No it did not, and I updated the question. – miguelmorin Dec 15 '19 at 22:43
  • 1
    I believe you are running into the problem that @Stefan identified in the comments to my answer. Can you try to reproduce with `(progn (string-match ...) (match-string ...))`? – NickD Dec 15 '19 at 23:19
  • 1
    "suggests that match data persists across sessions" -- this is absolutely not true. Moreover, you mustn't assume that the match data will persist for even the duration of a single command. What you *should* take in from that Q&A is everything the accepted answer tells you -- which is essentially that "If you evaluate the lines one by one, the match object will most certainly be mutated in that time" (i.e. what NickD's comment above this one -- and much of the text and comments in his earlier answer -- is pointing out). – phils Dec 16 '19 at 06:08
  • 2
    In short, the match data is used *a lot* by Emacs -- your code is *not* the only code using it. So if you want a value from it, you need to make sure you do so before anything else has an opportunity to set it. – phils Dec 16 '19 at 06:16
  • @phils Thanks, now I understand NickD's answer and the accepted answer from the [Q&A about match data](https://emacs.stackexchange.com/questions/40896/match-data-fails-to-consider-only-last-search-with-string-match-and-persists-acr) and updated my question. – miguelmorin Dec 16 '19 at 12:41
  • @NickD Yes, that works now. I dealt with your comment before dealing with the answer. Since the problem is both about the extra `STRING` argument and the `progn` wrapper to maintain state, I believe this is a different question. – miguelmorin Dec 16 '19 at 12:53
  • 1
    Also cross-referencing with https://emacs.stackexchange.com/a/18345 – phils Mar 30 '20 at 19:26

1 Answer

The doc string of match-string says:

(match-string NUM &optional STRING)

This function does not change global state, including the match data.

Return string of text matched by last search. NUM specifies which parenthesized expression in the last regexp. Value is nil if NUMth pair didn’t match, or there were less than NUM pairs. Zero means the entire text matched by the whole regexp or whole string. STRING should be given if the last search was by ‘string-match’ on STRING. If STRING is nil, the current buffer should be the same buffer the search/match was performed in.

(emphasis added)

IOW, when you are doing match-string after doing string-match, you have to pass the string that you searched in as an argument to match-string. match-string only knows about beginning and ending indices: if you don't give it the string argument, it assumes that you did a search in the buffer and gets a substring out of the buffer (probably somewhere near the beginning). It does not know that you did a string-match unless you tell it by passing to it the string argument.
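
To make the "indices only" point concrete, here is a minimal sketch. The helper name my/group-2-details is made up for illustration and is not part of Emacs. It shows that after a string-match the match data are just positions into the searched string, and that match-string with the STRING argument gives the same result as taking the substring between those positions. Everything happens inside one function call, so nothing else gets a chance to overwrite the match data in between.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/group-2-details (str)
  "Return (BEG END SUBSTRING MATCH-STRING) for group 2 of an HH:MM match in STR."
  (when (string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" str)
    (let ((beg (match-beginning 2))   ; start index of group 2 within STR
          (end (match-end 2)))        ; end index of group 2 within STR
      (list beg end
            (substring str beg end)   ; manual extraction from the indices
            (match-string 2 str)))))  ; same text via `match-string'

(my/group-2-details "09:20")  ; => (3 5 "20" "20")
```

Without the STRING argument, the same two indices would be applied to the current buffer instead, which is why unrelated buffer text comes back.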

As @Stefan points out in the comments, it is important to take precautions not to trash the match data, which is easy to do: most Emacs functions do not try to preserve the match data, so you need to save the matches you are interested in as soon as possible after the match data are calculated and before any such functions are called. As Stefan notes, typing an expression into the *scratch* buffer and evaluating it, then typing another expression and evaluating it, runs a lot of Emacs code between (and during) the evaluations, and that code might very well trash the match data produced by the first evaluation.
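
One way to follow that advice, sketched here under the assumption that you only need the two groups as strings, is to do the match and the extraction in a single form and copy the groups into local variables immediately. The function name my/parse-time is hypothetical and only serves the example.

```elisp
;; Hypothetical helper: match and extract in one form, so no other
;; Emacs code runs between `string-match' and `match-string'.
(defun my/parse-time (str)
  "Return (HOURS . MINUTES) as strings if STR contains H:MM or HH:MM, else nil."
  (when (string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" str)
    ;; Copy the groups into local variables right away.
    (let ((hours (match-string 1 str))
          (minutes (match-string 2 str)))
      (cons hours minutes))))

(my/parse-time "09:20")    ; => ("09" . "20")
(my/parse-time "no time")  ; => nil
```

Because the result is returned as an ordinary value, later code can inspect it at leisure without caring what has happened to the match data since.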

And, as the accepted answer to "match-data fails to consider only last search with string-match and persists across sessions" says, you should test that the match succeeded before using the match data:

match-string is stateful and "can" persist on consecutive searches even if your next string-match search returns nil.

The following should work:

(when (string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" "09:20") ;; <-- skip when the string fails to match
  (setq group-zero (match-string 0 "09:20")
        group-one  (match-string 1 "09:20")
        group-two  (match-string 2 "09:20")))
"20"     ;; <--- the value of the "when" is the value of the last setq, since the string matched and all the setq's were executed.


;; now that they are saved, we can examine them at leisure
group-zero
"09:20"

group-one
"09"

group-two
"20"
NickD
  • 3
    If it works for you, you're just really lucky: there can be a lot of Elisp code run between two uses of `C-x C-e` (or equivalent), so there's no guarantee that the match data of one command is still available when you run the second command. – Stefan Dec 14 '19 at 17:24
  • That's right: I typed everything into the `*scratch*` buffer (in `lisp-interaction-mode`) and then just evaluated each expression by moving the cursor and doing `C-j`. AFAICS, that does not disturb the match data. – NickD Dec 14 '19 at 19:11
  • 1
    `C-j` in `lisp-interaction-mode` is like `C-x C-e` in the sense that it does not try to preserve the match data between invocations, so you just got lucky. – Stefan Dec 14 '19 at 19:16
  • Well, there must be a way to guarantee that the match data does not get trashed (otherwise, if it's just luck, the whole interface is useless). Would wrapping it in a `progn` be enough for that? – NickD Dec 14 '19 at 21:15
  • 1
    To be clear, *some* Emacs functions do preserve `match-data` -- the macro `(save-match-data &rest BODY)` exists specifically for this purpose -- but *in general* this isn't guaranteed, and so if you're using `match-data` it's up to you to make sure it can't be clobbered before you've used it. – phils Dec 14 '19 at 22:18
  • 2
    n.b. This tends to be easy to ensure -- unless you allow Emacs to enter a 'waiting' state (in which case it may start dealing with other things), you know that nothing has been done to `match-data` outside of the code that you can see. The `progn` wrapper in this instance serves the purpose perfectly. – phils Dec 15 '19 at 07:00
  • 1
    https://emacs.stackexchange.com/q/40896 is another Q&A on that topic. – phils Dec 15 '19 at 07:00
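
As a rough sketch of the `save-match-data` macro mentioned in the comments above: if a function you call between `string-match` and `match-string` does its own regexp search, wrapping that search in `save-match-data` restores the caller's match data afterwards. The helper name `my/has-digit-p` is invented for this example.

```elisp
;; Hypothetical helper that runs its own regexp search internally.
(defun my/has-digit-p (str)
  "Return non-nil if STR contains a digit, preserving the caller's match data."
  (save-match-data
    (string-match "[0-9]" str)))

(let ((str "09:20"))
  (when (string-match "\\([0-9]?[0-9]\\):\\([0-9]\\{2\\}\\)" str)
    ;; This call performs a different search.  Without `save-match-data'
    ;; inside the helper, group 1 below would no longer refer to "09".
    (my/has-digit-p "abc 7")
    (match-string 1 str)))  ; => "09"
```

This only protects against code you wrap yourself; it does not help when the two steps are evaluated as separate commands, which is the case the `progn` wrapper addresses.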