
The general way this seems to be done in Elisp is to first read the entire file using something like insert-file-contents-literally or find-file-noselect, split it on newlines with split-string, and then remove the unwanted elements:

(defun first-n (list &optional n)
  "Return list containing the first N elements of LIST.

If N is nil, return the entire list."
  (let ((n (or n (length list))))
    (butlast list (- (length list) n))))

(defun read-lines (file &optional n delimiter)
  "Return the first N lines of FILE as separate elements of a list.

If N is nil, return the entire file."
  (let ((delimiter (or delimiter "\n")))
    (first-n
     (split-string
      (with-temp-buffer
        (insert-file-contents file)
        (buffer-substring-no-properties
          (point-min)
          (point-max)))
      delimiter
      t)
     n)))

This works but with the obvious drawback of reading the entire file.

What is a more efficient way to handle this?

Many of the file functions in Elisp are geared more towards buffers than raw file processing. Looking at Common Lisp, it seems reading files is handled through streams, but I couldn't find with-open-file in the cl-lib library. It doesn't seem like Elisp has any stream capabilities either.

Other than using the BEG and END arguments for insert-file-contents-literally, I can't think of a way to perform this task more efficiently.
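
For illustration, that chunked BEG/END approach might look roughly like the sketch below. The name read-lines-chunked and the 4096-byte default chunk size are arbitrary choices, it reuses the first-n helper above, and it sticks with insert-file-contents-literally, so the text is not decoded:

(defun read-lines-chunked (file n &optional chunk-size)
  "Return the first N lines of FILE, reading CHUNK-SIZE bytes at a time."
  (let ((chunk-size (or chunk-size 4096))
        (offset 0)
        (eof nil))
    (with-temp-buffer
      ;; Keep appending chunks until the buffer contains N newlines or
      ;; the whole file has been read.  Because the contents are
      ;; inserted literally, characters and bytes coincide, so OFFSET
      ;; can be advanced by the insertion count.
      (while (and (not eof)
                  (not (progn (goto-char (point-min))
                              (search-forward "\n" nil t n))))
        (goto-char (point-max))
        (let ((inserted (cadr (insert-file-contents-literally
                               file nil offset (+ offset chunk-size)))))
          (setq offset (+ offset inserted))
          ;; A short read means we reached the end of the file.
          (when (< inserted chunk-size)
            (setq eof t))))
      ;; Split only what was actually read, then keep the first N lines.
      (first-n (split-string (buffer-string) "\n" t) n))))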

  • "Other than using the BEG and END arguments..." -- FWIW, that's what I'd suggest. It should be *relatively* simple to make your search loop for newlines read the next chunk of the file as and when necessary. – phils Mar 25 '20 at 06:21
  • n.b. That approach is also how [`vlf.el`](https://elpa.gnu.org/packages/vlf.html) works. – phils Mar 25 '20 at 06:27
  • `(first-n (split-string ...))` is really inefficient, as you don't need ALL the lines of the file, and `split-string` also looks slow. And in Emacs Lisp you're unlikely to process a large file such as 100 MB, so it's usually fine to insert the whole file at once. – xuchunyang Mar 25 '20 at 11:20

1 Answer


You're still going to need to read the entire file, but you don't have to process every line, since you only need the first N lines.

(require 'cl-lib)                       ; `cl-loop' comes from cl-lib

(defun your-read-lines (file n)
  "Return first N lines of FILE."
  (with-temp-buffer
    (insert-file-contents-literally file)
    ;; Walk forward one line at a time, collecting at most N lines.
    (cl-loop repeat n
             unless (eobp)
             collect (prog1 (buffer-substring-no-properties
                             (line-beginning-position)
                             (line-end-position))
                       (forward-line 1)))))
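
For example, with a hypothetical file name:

;; Hypothetical path, just for illustration; returns the first five
;; lines of the file as a list of strings.
(your-read-lines "/tmp/example.txt" 5)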

It would only be problematic if the file you're reading is larger than your available RAM; I guess it should be fine to tell Emacs to read a 1 GB file on a computer with 4 GB of RAM. That is, memory consumption is the concern, not speed.

xuchunyang