16

In Python I'd do the following to process a file line by line:

with open(infile) as f:
    for line in f:
        process(line)

Trying to look up how to do the same in elisp (with buffers instead of files), I found no obvious way.

(What I want to end up with is two ordered data structures of lines: one with all the lines matching a regex, the other with those that did not match.)

Drew
The Unfun Cat

4 Answers

29

There are various ways to do it. Kaushal's way can be made a good bit more efficient, with:

(goto-char (point-min))
(while (not (eobp))
  (let ((line (buffer-substring (point)
                                (progn (forward-line 1) (point)))))
    ...))

But in Emacs it is much more customary to work on the buffer rather than on strings. So rather than extract the string and then work on it, you'd just do:

(goto-char (point-min))
(while (not (eobp))
  ...
  (forward-line 1))
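
To tie this in-buffer loop back to the question's goal, here is a minimal sketch that collects matching and non-matching lines while walking the buffer. The helper name `my/partition-lines` is hypothetical (not part of Emacs), and it assumes REGEXP is meant to match within a single line.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my/partition-lines (regexp)
  "Return (MATCHING . NONMATCHING) lists of lines in the current buffer.
Lines are kept in buffer order and tested against REGEXP."
  (let (matching nonmatching)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; Point is at the beginning of a line here.
        (let ((line (buffer-substring-no-properties
                     (point) (line-end-position))))
          ;; Search in the buffer, but only up to the end of this line.
          (if (save-excursion
                (re-search-forward regexp (line-end-position) t))
              (push line matching)
            (push line nonmatching)))
        (forward-line 1)))
    ;; `push' builds the lists in reverse, so restore buffer order.
    (cons (nreverse matching) (nreverse nonmatching))))
```

Calling `(my/partition-lines "your-regexp")` in the buffer of interest should give the matching lines in the `car` and the rest in the `cdr`.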

Also, if you want to operate on a region rather than on the whole buffer, and if your "operate" includes modifying the buffer, it's common to iterate backwards (so that you don't get bitten by the fact that the "end" position of your region moves every time you modify the buffer):

(goto-char end)
(while (> (point) start)
  ...
  (forward-line -1))
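
As one possible concrete instance of the backwards walk, the sketch below deletes every line in a region that matches a regexp. The helper name `my/delete-matching-lines` is hypothetical (Emacs already provides `flush-lines` for real use), and the sketch assumes START and END both sit at the beginning of a line, with END just past the last line to process.

```elisp
;; Hypothetical helper for illustration; Emacs already ships `flush-lines'.
(defun my/delete-matching-lines (regexp start end)
  "Delete each line between START and END that contains a match for REGEXP.
Assumes START and END are both at the beginning of a line."
  (save-excursion
    (goto-char end)
    (while (> (point) start)
      ;; Step onto the previous line, then decide what to do with it.
      (forward-line -1)
      (when (save-excursion
              (re-search-forward regexp (line-end-position) t))
        ;; Deleting here only shifts text that was already visited,
        ;; so START stays valid and END is never consulted again.
        (delete-region (point)
                       (progn (forward-line 1) (point)))))))
```

Because the walk runs from END towards START, plain integer positions are enough here; a forward walk that deletes text would typically need a marker for the end position instead.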
Stefan
  • Thanks for those optimization tips! Always good to learn from you. – Kaushal Modi Jan 14 '16 at 15:29
  • About the last snippet, should it be this way: `(let ((start (point))) (goto-char (point-max)) (while (> (point) start) ... (forward-line -1)))`? – Kaushal Modi Jan 14 '16 at 16:21
  • No, the last snippet just assumes that `start` and `end` are existing variables which delimit the region on which we want to operate. – Stefan Jan 14 '16 at 19:41
  • @Stefan Doesn't `(progn (forward-line 1) (point))` move the point beyond the `eol` or `\n` into the next line? – vfclists Sep 14 '21 at 01:08
  • @vfclists: Indeed it does; that's what I use it for here. – Stefan Sep 15 '21 at 11:54
  • @Stefan Coming back roughly a year later, this code is adding `\n`s where I don't want them, so I'm back to Kaushal's code. How would you write it to avoid the `\n`s? – vfclists Aug 29 '22 at 15:29
7

I don't know of any idiomatic way but I came up with this:

(defun my/walk-line-by-line ()
  "Process each line in the buffer one by one."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let* ((lb (line-beginning-position))
             (le (line-end-position))
             (ln (buffer-substring-no-properties lb le)))
        (message ">> %s" ln) ; Replace this with any processing function you like
        (forward-line 1)))))
Kaushal Modi
5

I think the following is as idiomatic as it can get:

(dolist (line (split-string (buffer-string) "\n"))
  ;; ... process LINE here ...
  )

EDIT: Here is another solution that uses cl-loop (from cl-lib) in place of dolist, and which also classifies the lines according to whether or not they match your regular expression:

(require 'cl-lib)

(cl-loop for line in (split-string (buffer-string) "\n")
         if (string-match-p "your-regexp" line)
         collect line into matching
         else
         collect line into nonmatching
         finally return (cons matching nonmatching))

If you set a variable to the value of this form, say (setq x (cl-loop ...)), then the list of matching lines will be found in (car x), with the list of nonmatching lines being (cdr x).
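
For a self-contained variant of the snippet above, here is a hedged sketch wrapped in a function and using `cl-loop`, the `cl-lib` name for the loop macro. The helper name `my/split-and-partition` and the sample text are made up for illustration. It also passes a non-nil OMIT-NULLS argument to `split-string`, because a buffer ending in a newline otherwise yields a trailing empty string; note that this drops blank lines as well, which may or may not be what you want.

```elisp
(require 'cl-lib)

;; Hypothetical helper for illustration; not part of Emacs.
(defun my/split-and-partition (regexp)
  "Return (MATCHING . NONMATCHING) for the lines of the current buffer.
Lines come from splitting the buffer text and are tested against REGEXP."
  ;; The third argument (OMIT-NULLS) drops empty strings, including the
  ;; trailing one produced when the buffer ends with a newline.
  (cl-loop for line in (split-string (buffer-string) "\n" t)
           if (string-match-p regexp line)
           collect line into matching
           else
           collect line into nonmatching
           finally return (cons matching nonmatching)))

;; Usage with made-up sample text:
(with-temp-buffer
  (insert "apple\nbanana\navocado\n")
  (let ((result (my/split-and-partition "^a")))
    (list (car result) (cdr result))))
```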

Muihlinn
Ruy
  • While this works fine and is short to write, it's not "as idiomatic as it can get". Elisp's most idiomatic data structure is the growable buffer, which is where process output typically gets inserted. It is more efficient to process this output in-buffer, rather than allocating a big string with the buffer's contents, and then splitting this big string into lots of newly allocated smaller strings before finally processing them. The latter can produce a lot of garbage, so especially for large outputs the in-buffer approach tends to be more popular. – Basil Apr 16 '21 at 18:29
0

Just a note on Stefan's answer:

(goto-char (point-min))
(while (not (eobp))
  ...
  (forward-line 1))

This will only work if the last line of the file is empty. A simple fix for that would be:

(while (not (save-excursion (end-of-line) (eobp)))
Muihlinn