24

On the Code Golf Stack Exchange site today, I found this answer in Clojure to the question "Get all links on a webpage".

(->> (slurp "http://www.stroustrup.com")
     (re-seq #"(?:http://)?www(?:[./#\+-]\w*)+"))

Without the fancy macro, it's just this:

(re-seq #"(?:http://)?www(?:[./#\+-]\w*)+" (slurp "http://www.stroustrup.com"))

This returns the list:

("http://www.morganstanley.com/" "http://www.cs.columbia.edu/" "http://www.cse.tamu.edu" ...)

Can I do something similar in Emacs Lisp?

Perhaps a function like (re-seq regexp (buffer-string)) that returns '(firstmatch secondmatch thirdmatch ...)?

nanny
  • 2
    This is what `M-x occur` does, but I'd look inside for more low-level functions to do that. – wvxvw Jan 07 '15 at 21:35
  • @wvxvw That's a good point, I didn't even think about `occur`. I'll have to look through its source. – nanny Jan 07 '15 at 21:42
  • I looked inside, and oh woe, that code does too much and it's not easy to repurpose it, not at all. My next candidate would be `s.el`, but maybe there's more out there. Here: https://github.com/magnars/s.el#s-match-strings-all-regex-string how about this? – wvxvw Jan 07 '15 at 22:00

6 Answers

21

Here is how you can do it based on strings, as requested.

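(defun re-seq (regexp string)
  "Return a list of all matches of REGEXP in STRING, in order."
  (save-match-data
    (let ((pos 0)
          matches)
      (while (string-match regexp string pos)
        (push (match-string 0 string) matches)
        ;; Resume the search right after the current match.
        (setq pos (match-end 0)))
      ;; `push' collects matches last-to-first, so reverse before returning.
      (nreverse matches))))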

;; Sample URL regexp ("+" needs no backslash inside a character class)
(setq urlreg "\\(?:http://\\)?www\\(?:[./#+-]\\w*\\)+")
;; Sample invocation
(re-seq urlreg (buffer-string))
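
One caveat with a loop of this shape: it resumes at `(match-end 0)`, so a regexp that can match the empty string never advances and the loop does not terminate. A minimal sketch of a guarded variant follows; the name `my-re-seq-safe` is hypothetical, and stepping one character past an empty match is an assumption about the desired behaviour, not part of the answer above.

```elisp
(defun my-re-seq-safe (regexp string)
  "Return all non-overlapping matches of REGEXP in STRING, first to last.
Hypothetical variant that also terminates when REGEXP matches the empty string."
  (save-match-data
    (let ((pos 0)
          (len (length string))
          matches)
      ;; `string-match' signals an error if the start index exceeds the
      ;; string length, so check the bound before every search.
      (while (and (<= pos len)
                  (string-match regexp string pos))
        (push (match-string 0 string) matches)
        ;; Resume after the match; step one extra character after an
        ;; empty match so the loop cannot stall.
        (setq pos (if (= (match-beginning 0) (match-end 0))
                      (1+ (match-end 0))
                    (match-end 0))))
      (nreverse matches))))

;; Usage
(my-re-seq-safe "[0-9]+" "a1b22c333")
```
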
Alan Shutko
14

It's probably worth noting that invoking occur with the universal argument causes it to populate the *Occur* buffer with only matches — no file names, line numbers or header information. When combined with a capture group, this allows one to extract whatever pattern is desired.

For example, C-u M-x occur followed by \"\(.*\)\" will prompt the user for which capture group to collect (default \1), and then place the content of every quoted string into the *Occur* buffer.
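
The same collection can be driven from Lisp, since `occur` treats a string passed as its NLINES argument as the collection template. The following is only a sketch under some assumptions: results land in the default `*Occur*` buffer, no match spans several lines, and `my-occur-collect` is a hypothetical helper name.

```elisp
(defun my-occur-collect (regexp template)
  "Return the strings `occur' collects for REGEXP in the current buffer.
TEMPLATE is the replacement string naming what to keep from each match.
Hypothetical helper; it reads the result back from the *Occur* buffer."
  (save-window-excursion
    ;; A string as the second argument puts `occur' in collect mode.
    (occur regexp template)
    ;; `occur' removes the buffer again when nothing matched.
    (let ((buf (get-buffer "*Occur*")))
      (when buf
        (with-current-buffer buf
          ;; One collected string per line; drop the empty tail.
          (split-string (buffer-string) "\n" t))))))

;; Usage: collect the contents of every quoted string.
(with-temp-buffer
  (insert "a = \"foo\"\nb = \"bar\"\n")
  (my-occur-collect "\"\\(.*\\)\"" "\\1"))
```
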

Jack Rusher
  • This is an excellent tip! You can visit the buffer programmatically, like any other, and extract the matches from it. Very useful. Thanks. – hraban Jun 30 '21 at 22:06
10

I have posted an Emacs Lisp answer to that question: https://codegolf.stackexchange.com/a/44319/18848

Using the same (while (search) (print)) structure you could modify it into a function to push matches in a buffer to a list and return it like this:

(defun matches-in-buffer (regexp &optional buffer)
  "Return a list of matches of REGEXP in BUFFER, or in the current buffer if not given.
The list is ordered from the last match to the first."
  (let (matches)
    (save-match-data
      ;; Switch buffers first so that `save-excursion' restores point in
      ;; the buffer that is actually searched.
      (with-current-buffer (or buffer (current-buffer))
        (save-excursion
          (save-restriction
            (widen)
            (goto-char (point-min))
            (while (search-forward-regexp regexp nil t 1)
              (push (match-string 0) matches)))))
      matches)))
Jordon Biondo
  • 1
    Nice answer, note you may want to replace `match-string` with `match-string-no-properties` so the syntax highlight isn't extracted. You may want to pass a `regexp-group-index` to use so you can choose which text is stored. As well as reversing the order of searching (current list is last-to-first). See this answer which includes a modified version https://emacs.stackexchange.com/a/38752/2418 – ideasman42 Feb 12 '18 at 00:53
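
The three changes suggested in the comment above (no text properties, a selectable capture group, first-to-last order) could look roughly like this. It is a sketch rather than the code from the linked answer; the name `my-matches-in-buffer` and the position of the GROUP argument are assumptions.

```elisp
(defun my-matches-in-buffer (regexp &optional group buffer)
  "Return matches of REGEXP in BUFFER (default: current buffer), first to last.
GROUP selects the capture group to collect; it defaults to 0, the whole match.
Hypothetical variant; the name and argument order are assumptions."
  (let ((group (or group 0))
        matches)
    (with-current-buffer (or buffer (current-buffer))
      (save-match-data
        (save-excursion
          (save-restriction
            (widen)
            (goto-char (point-min))
            (while (re-search-forward regexp nil t)
              ;; Drop text properties so font-lock faces are not copied.
              (push (match-string-no-properties group) matches))))))
    ;; `push' builds the list back to front, so reverse it once at the end.
    (nreverse matches)))

;; Usage: collect only the value side of key=value pairs.
(with-temp-buffer
  (insert "k1=v1 k2=v2")
  (my-matches-in-buffer "\\(\\w+\\)=\\(\\w+\\)" 2))
```
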
4

Using s.el this would've been shorter, but, unfortunately, it gives too many matches:

(defun all-urls-in-buffer ()
  (s-match-strings-all
   "\\(?:http://\\)?www\\(?:[./#+-]\\w*\\)+"
   (buffer-string)))

If this is ok (the regex for URLs isn't perfect anyway), this just might be shorter, and if not, then I don't think I could make it shorter than Alan Shutko's answer.

wvxvw
3

If I may be allowed a plug, take a look at my "m-buffer" library.

(m-buffer-match buffer "foo")

Returns a list of markers to matches to foo.

Phil Lord
3

Let me just mention why I think this is not implemented in the core: efficiency. There's no need to copy strings, build lists, pass them around, and garbage-collect them. Instead, keep the whole text in a buffer and operate on integer match bounds. That's how occur works, for instance: it finds one match at a time and inserts it into *Occur*. It doesn't find all the matches at once, collect them into a list, loop over the list to insert them into *Occur*, and then garbage-collect the list and its strings.
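
To illustrate the buffer-oriented style, here is a minimal sketch that visits each match through its integer bounds and never builds a list of substrings. `my-do-matches` is a hypothetical helper, not something from Emacs core or from occur's source.

```elisp
(defun my-do-matches (regexp fn)
  "Call FN with the start and end position of each match of REGEXP.
Searches the accessible part of the current buffer and returns the number
of matches.  Hypothetical helper; no list of matches is built."
  (let ((count 0))
    (save-match-data
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward regexp nil t)
          (setq count (1+ count))
          (let ((beg (match-beginning 0))
                (end (match-end 0)))
            ;; Shield the search loop from FN moving point.
            (save-excursion
              (funcall fn beg end))))))
    count))

;; Usage: total length of all numbers in the buffer, with no intermediate list.
(with-temp-buffer
  (insert "a1 b22 c333")
  (let ((total 0))
    (my-do-matches "[0-9]+"
                   (lambda (beg end) (setq total (+ total (- end beg)))))
    total))
```
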

Just like you wouldn't write (do (def x 1) (def x (+ 2 x))) in Clojure, you shouldn't by default try to make Elisp behave like a functional language. I'd love it if it were one, but we have to make do with what we've got at the moment.

abo-abo