
I frequently need to split strings while keeping the separator. Searching the Elisp manual, I cannot find a way to split a string at a separator without consuming the separator itself.

For example, split-string splits a given string into substrings based on the regular expression separators, but the substrings lose the separator:

(split-string "* one * two * three" "\\*")

returns:

("" " one " " two " " three")

I could use s-slice-at, from @magnars's s.el:

(s-slice-at "\\*" "* one * two * three")

("* one " "* two " "* three")

I could also try to rewrite split-string myself. Still, I was wondering if there is a built-in way to do it as part of Emacs core?

If not, could somebody please point to a good way to do that properly?

gsl
  • So you just want `s-slice-at` with a different name? – abo-abo Dec 25 '14 at 17:02
  • No, I am happy with that. I was wondering if one must add a library to one's toolchain just for that, or whether elisp already has an idiomatic way to do it (besides copying the relevant code from `s.el`)? – gsl Dec 25 '14 at 17:05
  • What's wrong with copying some code, or requiring some library? That's how all software is built. – abo-abo Dec 25 '14 at 17:07
  • Nothing wrong of course. Since I am an absolute beginner, I was wondering if there was indeed already a way to do it without copying code or requiring an extra library, as I would like to learn the basic idiomatic way first. – gsl Dec 25 '14 at 17:09
  • Alright, if you want to learn, rewrite `split-string` to do what you want. The code of `s-slice-at` isn't great anyway, since it uses recursion. – abo-abo Dec 25 '14 at 17:13
  • Thank you, I shall try to do that, although I would still be interested to see a proper, idiomatic way, especially if you say that the `s.el` way is not that good after all (and I definitely trust you, @abo-abo; @lispm on reddit was saying something of that sort about that package). – gsl Dec 25 '14 at 17:15

2 Answers


Here's some code for you. This is a slightly modified split-string. I've removed the trim option for simplicity and added a keep-sep option. The diff is basically 2 lines, so you could say that this code is idiomatic:

;; Defined under its own name so the built-in `split-string' (whose
;; fourth argument is TRIM) is not overridden.
(defun split-string-keep-sep (string &optional separators omit-nulls keep-sep)
  "Split STRING into substrings bounded by matches for SEPARATORS.
If OMIT-NULLS is non-nil, drop empty substrings.
If KEEP-SEP is non-nil, include each matched separator in the result."
  (let* ((keep-nulls (not (if separators omit-nulls t)))
         (rexp (or separators split-string-default-separators))
         (start 0)
         this-start this-end
         notfirst
         (list nil)
         (push-one
          (lambda ()
            (when (or keep-nulls (< this-start this-end))
              (let ((this (substring string this-start this-end)))
                (when (or keep-nulls (> (length this) 0))
                  (push this list)))))))
    (while (and (string-match
                 rexp string
                 (if (and notfirst
                          (= start (match-beginning 0))
                          (< start (length string)))
                     (1+ start) start))
                (< start (length string)))
      (setq notfirst t)
      (setq this-start start this-end (match-beginning 0)
            start (match-end 0))
      (funcall push-one)
      ;; The only addition: also keep the text matched by the separator.
      (when keep-sep
        (push (match-string 0 string) list)))
    (setq this-start start this-end (length string))
    (funcall push-one)
    (nreverse list)))

(split-string-keep-sep "* one * two * three" "\\*" t t)
;; -> ("*" " one " "*" " two " "*" " three")
abo-abo

Perhaps a shorter example, using the cl-lib library:

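(require 'cl-lib) ; provides `cl-loop' and `cl-search'

(defun chop (string separator)
  "Split STRING before each occurrence of the literal string SEPARATOR."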
  (cl-loop with seplen = (length separator)
           with len = (length string)
           with start = 0
           with next = seplen
           for end = (or (cl-search separator string :start2 next) len)
           for chunk = (substring string start end)
           collect chunk
           while (< end len)
           do (setf start end next (+ seplen end))))

(chop "* one * two * three" "*")
;; -> ("* one " "* two " "* three")

Note that here the delimiter is a literal string, not a regular expression.
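
If a regular expression is needed, the same idea can be sketched with `string-match` in a plain loop, without recursion. This is only a minimal sketch: `my-slice-at` is a made-up name for illustration, not something from Emacs or s.el, and it assumes the regexp never matches the empty string.

```elisp
(defun my-slice-at (regexp string)
  "Split STRING before each match of REGEXP, keeping the matched text.
Hypothetical helper for illustration only."
  (let ((start 0)   ; where the piece currently being built begins
        (pieces nil)
        pos)
    ;; Search from START + 1 so that a separator sitting at the very
    ;; beginning of the current piece is not matched a second time.
    (while (and (< start (length string))
                (setq pos (string-match regexp string (1+ start))))
      (push (substring string start pos) pieces)
      (setq start pos))
    ;; Whatever is left after the last match is the final piece.
    (push (substring string start) pieces)
    (nreverse pieces)))

(my-slice-at "\\*" "* one * two * three")
;; -> ("* one " "* two " "* three")
```

Each piece begins at a match, so the separator stays attached to the text that follows it. A string with no match comes back as a one-element list.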

wvxvw
  • Thank you, I like this version, since I could use it when programming in Common Lisp as well, but I would need to preserve the separator, not to discard it. The final result should be `("* one " "* two " "* three")` – gsl Dec 26 '14 at 20:06
  • @gsl yes, sorry, I somehow completely forgot about that :) – wvxvw Dec 26 '14 at 20:34
  • I know that some people do not like using the `cl` library, but I confess I am partial to it. I like using code where I can leverage the Common Lisp constructs. I know this may sound like anathema to some (please take it as a trivial comment by an absolute beginner like myself), but personally, I wish and hope that one day Elisp will slowly merge into Common Lisp. Having said that, I am not surprised to see that this solution is, to my naive beginner's eye, simpler and more readable than the solution based on pure elisp idiom (of course the other solution has a few more options). Thank you so much. – gsl Dec 27 '14 at 08:39