
I am writing a batch Emacs script and I would like it to read a NUL-delimited list of file names to process from stdin. (NUL-delimited lists of file names are what you get from the Unix command find ... -print0 among other things.)

I understand from the manual that, in batch mode, read-from-minibuffer reads from Emacs' stdin, and that one can control where it stops reading by supplying a keymap as an argument. Based on this, it seems like

printf 'foo\000bar\000baz\nblurf\r\n\000' |
   emacs -no-site-file -batch -eval '
      (let ((kmap (make-sparse-keymap)))
        (define-key kmap "\C-@" '\''exit-minibuffer)
        (message "%S" (read-from-minibuffer "" nil kmap)))
   '

should print "foo", but it does not. It prints "foo^@bar^@baz" where the two occurrences of ^@ represent literal NUL bytes. I also tried

printf 'foo\000bar\000baz\nblurf\r\n\000' |
   emacs -no-site-file -batch -eval '
      (message "%S" (read-from-minibuffer "" nil (make-keymap)))
   '

with the same result (here I expected to get all of the input as a single string).

What am I doing wrong?

zwol

2 Answers


As NickD analyzed, read-from-minibuffer reads one line at a time, treats either CR or LF as ending a line, and doesn't distinguish between them. And I can't find another way to read from standard input.

If your operating system has /dev/stdin (which in practice, nowadays, basically means not Windows), you can open that. The following snippet parses a list of null-terminated items from standard input.

printf 'éfoo\000bar\000baz\nblurf\r\n\000' |
emacs -no-site-file -batch -eval '
  (let ((parts (with-temp-buffer
                 (insert-file-literally "/dev/stdin")
                 (if (eobp)
                     nil
                   (split-string (buffer-substring-no-properties (point-min) (1- (point-max))) "\000")))))
    (print parts))'

The special case (eobp) is for the empty input: this way, an empty input results in an empty list, while any other input is assumed to end with a null byte which gets truncated.
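For convenience, the same parsing can be wrapped in a small function. This is only a sketch under the same /dev/stdin assumption, and the function name is made up for illustration:

;; Sketch: read NUL-delimited items from standard input via /dev/stdin.
;; The name `my-stdin-null-delimited-list' is invented for this example.
(defun my-stdin-null-delimited-list ()
  "Return the NUL-delimited items on standard input as a list of strings.
Reads /dev/stdin literally; returns nil for empty input, otherwise
assumes the input ends with a NUL byte and drops it."
  (with-temp-buffer
    (insert-file-literally "/dev/stdin")
    (unless (eobp)
      (split-string (buffer-substring-no-properties (point-min) (1- (point-max)))
                    "\000"))))

;; For example, in a batch script:
;; (dolist (file (my-stdin-null-delimited-list))
;;   (message "processing %s" file))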


Assuming I'm reading the code correctly, the keymap argument is ineffective when -batch is used. What happens is that read-from-minibuffer gets called and it, in turn, calls read_minibuf (line 1318 in my version of src/minibuf.c; this is probably somewhat out of date, but not too far off). read_minibuf (defined starting on line 545 of the same file) does a bit of initialization and then checks the noninteractive flag (line 621): if it is true (as it is in this case, since we are using -batch), it calls read_minibuf_noninteractive and just returns the result of that call. But read_minibuf_noninteractive does not care a whit about the keymap: it does not take it as an argument, it does not use it, it completely ignores it. All it cares about is low-level stuff: it gets characters using getchar() and looks for EOF, \n and \r. When it gets one of these, it returns whatever it has accumulated so far.

E.g. if I run the following command (a slight modification of yours so that I can store all the lisp code in a file):

printf 'foo\000bar\000baz\n\rblurf\000' | emacs --batch -l /tmp/foo2.el

where foo2.el contains the following code:

(setq s (read-from-minibuffer ""))
(print s)
(setq s2 (read-from-minibuffer ""))
(print s2)
(setq s3 (read-from-minibuffer ""))
(print s3)
(setq s4 (read-from-minibuffer ""))
(print s4)

I get the following output:

"foo^@bar^@baz"

""

"blurf^@"

Debugger entered--Lisp error: (end-of-file "Error reading from stdin")
  read-from-minibuffer("")
  (setq s4 (read-from-minibuffer ""))
  eval-buffer(#<buffer  *load*> nil "/tmp/foo2.el" nil t)  ; Reading at buffer position 235
  load-with-code-conversion("/tmp/foo2.el" "/tmp/foo2.el" nil t)
  load("/tmp/foo2.el" nil t)
  command-line-1(("-l" "/tmp/foo2.el"))
  command-line()
  normal-top-level()

So the first read returned the string "foo^@bar^@baz", including the NUL bytes but stopping at the \n. The second read returned an empty string (the string between the \n and the \r), the third read returned "blurf^@", including the NUL and stopping at EOF, and the fourth read got an error because it tried to read past the EOF.

So I think the strategy that you have to follow is:

  • do not allow \n and \r if you expect to read everything in one read.

  • forget about handling the NULs using the keymap.

  • do one read to get the whole input stream as a single string (including all the NULs) and then parse the string, splitting it at the NULs.

Something like this (note that I am using the s library which is a third-party library available from MELPA):

(load-file "/path/to/s.elc")
(setq s (read-from-minibuffer ""))
(setq l (s-split "\000" s))
(print l)
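If you would rather not depend on a MELPA package, the built-in split-string should work just as well here; a minimal variant of foo3.el (a sketch, not tested beyond this example):

(setq s (read-from-minibuffer ""))
(setq l (split-string s "\000"))  ; built-in, no s library needed
(print l)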

Assuming that this is in a file /tmp/foo3.el, invoking it with a slight modification of your command line to avoid the troublesome \n and \r, but allowing spaces and tabs, gives this:

$ printf 'foo\000bar\000baz\t blurf barf\000' | emacs --batch -l /tmp/foo3.el

Loading /path/to/s.elc...

("foo" "bar" "baz    blurf barf" "")

giving you a list of strings, obtained by splitting the original string at the NULs, which can be used for further processing.

If you have to have \n and/or \r in your input, you will not be able to get the whole input in one read: you will have to loop until EOF (probably by catching the end-of-file error and treating it as the end of the input), and you will not be able to tell the difference between \n and \r, since they are not part of the returned string and both cause the same behavior. As long as the difference is not important, you can concatenate all the strings you read into one string with newline separators and then split the resulting string on NULs as above.
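For completeness, here is a rough sketch of that loop; treat it as an outline rather than tested code (it assumes the end-of-file error is the only way a read can fail, and it rejoins the lines with \n since the original terminators are lost):

(let (lines)
  ;; Keep reading "lines" from stdin until read-from-minibuffer
  ;; signals end-of-file, which we treat as the end of the input.
  (condition-case nil
      (while t
        (push (read-from-minibuffer "") lines))
    (end-of-file nil))
  ;; Rejoin with newlines (the \n/\r terminators were stripped),
  ;; then split the whole thing on NULs.
  (let* ((whole (mapconcat #'identity (nreverse lines) "\n"))
         (names (split-string whole "\000")))
    (print names)))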

NickD
  • Thanks for doing the source dive. Losing the difference between \r and \n is a serious problem; the whole point of NUL-delimited filename lists is to handle filenames that could contain *any* byte sequence, including \r and \n. Can you think of any alternative? – zwol Oct 25 '21 at 12:58
  • Maybe escape them before passing them to emacs and unescape them after the NUL parsing and before whatever processing you do on them? – NickD Oct 25 '21 at 14:19
  • A better alternative (IMNSHO, but I understand if you detect a bit of hubris): I would just do a scan and rename such files before processing. And then threaten users (or programs) who use `\n` or `\r` in their filenames with eternal fire if they persist in their evil ways :-) – NickD Oct 25 '21 at 16:36