
This question was prompted by questions about ls's -1 option and by the recurring tendency of people to post questions and answers that process the output of ls.

Reusing the output of ls seems understandable: if you know how to sort a list of files with ls, you might want to use that output as input for something else.

If such Q&As don't mention that the file name list is assumed to consist of nicely behaving file names (no special characters like spaces and newlines), someone usually comments on the danger of the command sequence breaking when there are files with newlines, spaces, etc.

find, sort and other utilities solve the problem of communicating "difficult" file names to e.g. xargs by offering an option to separate the file names with the NUL character/byte, which is not a valid character in a file name (the only one besides /?) on Unix/Linux filesystems.

I looked at the man page for ls and the output of ls --help (which lists more options) and could not find that ls (from coreutils) has an option to specify NUL-separated output. It does have a -1 option, which can be interpreted as "output file names separated by newlines".

Q: Is there a technical or philosophical reason why ls does not have a --zero or -0 option that would "output file names separated by NUL"?

If you do something that only outputs the file names (and do not use e.g. -l), that could make sense:

ls -rt -0 | xargs -r0 …

I could be missing a reason why this would not work. Or is there an alternative for this example that I overlooked and that is not much more complicated and/or obscure?


Addendum:

Doing ls -lrt -0 probably does not make much sense, but neither does find . -ls -print0, so that is not a reason to withhold a -0/-z/--zero option.

slm
  • 369,824
Timo
  • 6,332

3 Answers


UPDATE (2014-02-02)

Thanks to our very own @Anthon's determination in following up on the lack of this feature, we have a slightly more formal reason why it is missing, which reiterates what I explained earlier:

Re: [PATCH] ls: adding --zero/-z option, including tests

From: Pádraig Brady
Subject: Re: [PATCH] ls: adding --zero/-z option, including tests
Date: Mon, 03 Feb 2014 15:27:31 +0000

Thanks a lot for the patch. If we were to do this then this is the interface we would use. However ls is really a tool for direct consumption by a human, and in that case further processing is less useful. For futher processing, find(1) is more suited. That is well described in the first answer at the link above.

So I'd be 70:30 against adding this.

My original answer


This is partly my personal opinion, but I believe leaving that switch out of ls was a design decision. Notice that the find command does have such a switch:

-print0
      True; print the full file name on the standard output, followed by a 
      null character (instead of the newline character that -print uses).  
      This allows file  names  that  contain  newlines or other types of white 
      space to be correctly interpreted by programs that process the find 
      output.  This option corresponds to the -0 option of xargs.

By leaving that switch out, the designers were implying that you should not be using ls output for anything other than human consumption. For downstream processing by other tools, you should be using find instead.

Ways to use find

If you're just looking for the alternative methods, you can find them here, under the title Doing it correctly: A quick summary. From that link, these are likely the three most common patterns:

  1. Simple find -exec; unwieldy if COMMAND is large, and creates 1 process/file:
    find . -exec COMMAND... {} \;
    
  2. Simple find -exec with +, faster if multiple files are okay for COMMAND:
    find . -exec COMMAND... {} \+
    
  3. Use find and xargs with \0 separators

    (nonstandard common extensions -print0 and -0. Works on GNU, *BSDs, busybox)

    find . -print0 | xargs -0 COMMAND
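
The patterns above do not sort, which is often why people reach for ls in the first place. As a hedged sketch (not taken from the linked summary), the oldest-first ordering of ls -rt can be approximated with NUL separators end to end. It assumes GNU find (-printf), GNU sort (-z) and a GNU cut recent enough to accept -z; the function name ls_rt0 is made up for illustration.

```shell
# Print the regular files directly inside a directory, oldest first,
# each name terminated by NUL instead of newline.
# Assumption: GNU find, GNU sort and GNU cut (with -z) are installed.
ls_rt0() {
  # %T@ is the modification time in seconds since the epoch,
  # %p is the path; each record ends in a NUL byte.
  find "${1:-.}" -maxdepth 1 -type f -printf '%T@ %p\0' |
    sort -z -n |          # numeric sort on the leading timestamp
    cut -z -d ' ' -f 2-   # drop the timestamp, keep the rest of the record
}
```

It could then be used much like the hypothetical ls -rt -0 from the question, for example: ls_rt0 . | xargs -r0 ls -ld --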
    

Further evidence?

I found this blog post from Joey Hess' blog titled: "ls: the missing options". One of the interesting comments in this post:

The only obvious lack now is a -z option, which should make output filenames be NULL terminated for consuption by other programs. I think this would be easy to write, but I've been extermely busy IRL (moving lots of furniture) and didn't get to it. Any takers to write it?

Searching further, I found this in the commit logs for one of the additional switches that Joey's blog post mentions, "new output format -j", so it would seem that the blog post was poking fun at the notion of ever adding a -z switch to ls.

As to the other options, multiple people agree that -e is nearly almost useful, although none of us can quite find a reason to use it. My bug report neglected to mention that ls -eR is very buggy. -j is clearly a joke.


slm
  • 369,824
  • Thank you. I am aware of the caveats. No question about ls output processing is complete without having that pointed out ;-) – Timo Feb 02 '14 at 09:12
  • @Timo - I know you do, I was doing more for the future readers of this Q. I see you on the site, that these would've come up in your searches by now 8-) – slm Feb 02 '14 at 09:14
  • I realised that, and good that you did. I should have included references to why not (at least not until -0 is implemented) in my question, in order not to lead people astray. – Timo Feb 02 '14 at 09:19
  • Of course, assuming there isn't anything really exotic like a '\n' in a filename, ls -1 | tr '\012' '\000' will list files separated by NULL characters. – samiam Feb 02 '14 at 09:27
  • @samiam - yeah that's the problem. There are all these "legal" characters one can use in filenames that are fine which cause parsing nightmares. I still don't understand why \n is a legal char. – slm Feb 02 '14 at 09:33
  • This article goes into the depths of filenaming problems: http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html – slm Feb 02 '14 at 09:40
  • BTW, is there any reason to escape the + in the find command? I see that from time to time and always wondered why people do it. – user2719058 Feb 02 '14 at 20:09
  • @user2719058 - escaping it is optional, so no. I did it here since the source material I brought it from had it that way. – slm Feb 02 '14 at 20:48
  • True, as nowadays UTF-8 is the norm, ls(1) doesn't have to live with the strictures of ASCII-only one-letter options. – vonbrand Feb 03 '14 at 01:14
  • A few years ago, I was under a lot of time pressure to make an entire CMS in 2 weeks (I pulled it off, too). The only way I could write usable shell scripts that quickly was to use a special C program to remove anything besides alphanumeric from filenames in directories I manipulated: http://samiam.org/software/sanename.c – samiam Feb 03 '14 at 06:27
  • @slrn: When there is no reason to escape +, why do people do it? I guess there is probably some reason, if maybe only historic - to make it work in some obscure shell that nobody uses these days, or something like that... – user2719058 Feb 03 '14 at 13:19
  • @user2719058 - if you look at the specification for find, it does not call for escaping either of these characters: http://pubs.opengroup.org/onlinepubs/009604499/utilities/find.html. So it must be that at some point find was used in a shell where these were special characters and so required escaping. ; is special to Bash and other shells, + is not. – slm Feb 03 '14 at 13:46
  • Gilles answer to this U&L Q&A might be helpful as well: http://unix.stackexchange.com/questions/8647/gnu-find-and-masking-the-for-some-shells-which – slm Feb 03 '14 at 13:53
  • @slrn: interesting link, although not covering +. Well, maybe people just tend to memorize, "oh, I have to be careful to escape the command terminator", which is usually a semicolon, so they apply it to the plus as well... – user2719058 Feb 03 '14 at 17:01
  • @user2719058 - yes it wasn't intended to directly apply to +, more to highlight the general issues with escaping characters previously that are no longer needed to be. – slm Feb 03 '14 at 17:02
  • The problem with find is it doesn't have a -- option. What are you supposed to do when you want to avoid file names being misinterpreted as options? ls is your only choice... – user541686 Oct 01 '17 at 20:29
  • How would you sort files without ls? See: https://superuser.com/questions/608887/how-can-i-make-find-find-files-in-reverse-chronological-order – Boris Brodski Nov 23 '17 at 09:30
  • It appears that GNU coreutils' ls will soon have a --zero option: https://fossies.org/linux/coreutils/ChangeLog – Jeff Schaller Oct 20 '21 at 12:38

As @slm's answer goes into the origins and possible reasons, I won't repeat that here. Such an option is not on the coreutils rejected feature list, but the patch below was rejected by Pádraig Brady after I sent it to the coreutils mailing list. From his reply it is clear that the reason is philosophical (ls output is for human consumption).

If you want to judge for yourself whether such an option is reasonable, do:

git clone git://git.sv.gnu.org/coreutils
cd coreutils
./bootstrap
./configure
make

then apply the following patch against commit b938b6e289ef78815935ffa705673a6a8b2ee98e, dated 2014-01-29:

From 6413d5e2a488ecadb8b988c802fe0a5e5cb7d8f4 Mon Sep 17 00:00:00 2001
From: Anthon van der Neut <address@hidden>
Date: Mon, 3 Feb 2014 15:33:50 +0100
Subject: [PATCH] ls: adding --zero/-z option, including tests

* src/ls.c has the necessary changes to allow -z/--zero option to be
  specified, resulting in a NUL seperated list of files. This
  allows the output of e.g. "ls -rtz" to be piped into other programs

* tests/ls/no-args.sh was extended to test the -z option

* test/ls/rt-zero.sh was added to test both the long and short option
  together with "-t"

This patch was inspired by numerous questions on unix.stackexchange.com
where the output of ls was piped into some other program, invariably
resulting in someone pointing out that is an unsafe practise because of
possible newlines and other characters in the filenames.
---
 src/ls.c            |   31 +++++++++++++++++++++++++------
 tests/ls/no-arg.sh  |    7 ++++++-
 tests/ls/rt-zero.sh |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 7 deletions(-)
 create mode 100755 tests/ls/rt-zero.sh

diff --git a/src/ls.c b/src/ls.c
index 5d87dd3..962e6bb 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -381,6 +381,7 @@ static int file_size_width;
    many_per_line for just names, many per line, sorted vertically.
    horizontal for just names, many per line, sorted horizontally.
    with_commas for just names, many per line, separated by commas.
+   with_zero for just names, one per line, separated by NUL.
 
    -l (and other options that imply -l), -1, -C, -x and -m control
    this parameter.  */
@@ -391,7 +392,8 @@ enum format
     one_per_line,              /* -1 */
     many_per_line,             /* -C */
     horizontal,                        /* -x */
-    with_commas                        /* -m */
+    with_commas,               /* -m */
+    with_zero,                 /* -z */
   };
 
 static enum format format;
@@ -842,6 +844,7 @@ static struct option const long_options[] =
   {"block-size", required_argument, NULL, BLOCK_SIZE_OPTION},
   {"context", no_argument, 0, 'Z'},
   {"author", no_argument, NULL, AUTHOR_OPTION},
+  {"zero", no_argument, NULL, 'z'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
   {NULL, 0, NULL, 0}
@@ -850,12 +853,12 @@ static struct option const long_options[] =
 static char const *const format_args[] =
 {
   "verbose", "long", "commas", "horizontal", "across",
-  "vertical", "single-column", NULL
+  "vertical", "single-column", "zero", NULL
 };
 static enum format const format_types[] =
 {
   long_format, long_format, with_commas, horizontal, horizontal,
-  many_per_line, one_per_line
+  many_per_line, one_per_line, with_zero
 };
 ARGMATCH_VERIFY (format_args, format_types);

@@ -1645,7 +1648,7 @@ decode_switches (int argc, char **argv)
     {
       int oi = -1;
       int c = getopt_long (argc, argv,
-                           "abcdfghiklmnopqrstuvw:xABCDFGHI:LNQRST:UXZ1",
+                           "abcdfghiklmnopqrstuvw:xzABCDFGHI:LNQRST:UXZ1",
                            long_options, &oi);
       if (c == -1)
         break;
@@ -1852,6 +1855,10 @@ decode_switches (int argc, char **argv)
             format = one_per_line;
           break;
 
+        case 'z':
+          format = with_zero;
+          break;
+
         case AUTHOR_OPTION:
           print_author = true;
           break;
@@ -2607,7 +2614,8 @@ print_dir (char const *name, char const *realname, bool command_line_arg)
                  ls uses constant memory while processing the entries of
                  this directory.  Useful when there are many (millions)
                  of entries in a directory.  */
-              if (format == one_per_line && sort_type == sort_none
+              if ((format == one_per_line || format == with_zero)
+                      && sort_type == sort_none
                       && !print_block_size && !recursive)
                 {
                   /* We must call sort_files in spite of
@@ -3598,6 +3606,14 @@ print_current_files (void)
         }
       break;
 
+    case with_zero:
+      for (i = 0; i < cwd_n_used; i++)
+        {
+          print_file_name_and_frills (sorted_file[i], 0);
+          putchar ('\0');
+        }
+      break;
+
     case many_per_line:
       print_many_per_line ();
       break;
@@ -4490,6 +4506,7 @@ print_many_per_line (void)
           indent (pos + name_length, pos + max_name_length);
           pos += max_name_length;
         }
+      putchar ('X'); // AvdN
       putchar ('\n');
     }
 }
@@ -4780,7 +4797,8 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
   -F, --classify             append indicator (one of */=>@|) to entries\n\
       --file-type            likewise, except do not append '*'\n\
       --format=WORD          across -x, commas -m, horizontal -x, long -l,\n\
-                               single-column -1, verbose -l, vertical -C\n\
+                               single-column -1, verbose -l, vertical -C,\n\
+                               zeros -z\n\
       --full-time            like -l --time-style=full-iso\n\
 "), stdout);
       fputs (_("\
@@ -4888,6 +4906,7 @@ Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.\n\
   -X                         sort alphabetically by entry extension\n\
   -Z, --context              print any security context of each file\n\
   -1                         list one file per line\n\
+  -z, --zero                 list files separated with NUL\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
diff --git a/tests/ls/no-arg.sh b/tests/ls/no-arg.sh
index e356a29..da28b96 100755
--- a/tests/ls/no-arg.sh
+++ b/tests/ls/no-arg.sh
@@ -30,11 +30,16 @@ out
 symlink
 EOF
 
-
 
 ls -1 > out || fail=1
 
 compare exp out || fail=1
+/bin/echo -en "dir\00exp\00out\00symlink\00" > exp || framework_failure_
+
+ls --zero > out || fail=1
+
+compare exp out || fail=1
+
 cat > exp <<\EOF
 .:
 dir
diff --git a/tests/ls/rt-zero.sh b/tests/ls/rt-zero.sh
new file mode 100755
index 0000000..cdbd311
--- /dev/null
+++ b/tests/ls/rt-zero.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+# Make sure name is used as secondary key when sorting on mtime or ctime.
+
+# Copyright (C) 1998-2014 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ ls touch
+
+date=1998-01-15
+
+touch -d "$date" c || framework_failure_
+touch -d "$date" a || framework_failure_
+touch -d "$date" b || framework_failure_
+
+
+ls -zt a b c > out || fail=1
+/bin/echo -en "a\00b\00c\00" > exp
+compare exp out || fail=1
+
+rm -rf out exp
+ls -rt --zero a b c > out || fail=1
+/bin/echo -en "c\00b\00a\00" > exp
+compare exp out || fail=1
+
+Exit $fail
--
1.7.9.5

After another make you can test it with:

  src/ls -rtz | xargs -0 -n1 src/ls -ld

So the patch does work and I can't see a reason why it would not, but that is no proof there is no technical reason to leave out the option. ls -R0 might not make much sense, but neither does ls -Rm which ls can do out of the box.
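
For readers who do not want to build a patched ls: the comment thread under the first answer points at the coreutils ChangeLog for a --zero long option in later GNU releases. The following is only a sketch under that assumption. It probes whether the installed ls accepts --zero rather than assuming it does, and the function names ls_has_zero and ls_rt_zero are made up for illustration. Note that it uses the long option only; it does not assume a short -z.

```shell
# Succeed if the installed ls accepts a --zero option.
# Assumption: a rejected option makes ls exit non-zero, as GNU ls does.
ls_has_zero() {
  ls --zero -d . >/dev/null 2>&1
}

# List the entries of a directory oldest-first, NUL-terminated,
# or fail with a message if this ls has no --zero.
ls_rt_zero() {
  ls_has_zero || { echo "ls --zero is not supported here" >&2; return 1; }
  ls -rt --zero -- "${1:-.}"
}
```

Where the probe succeeds, ls_rt_zero . | xargs -r0 ls -ld -- mirrors the test command above without the patched binary.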

Anthon
  • 79,293

If you have GNU xargs (from findutils), you can run:

$ ls … |xargs -d "\n" …

Otherwise, after you confirm that your version of xargs supports -0 (most do, but the POSIX spec for xargs doesn't even mention it), you can run:

$ ls … |tr "\n" "\000" |xargs -0 …

My ~/.aliases file, which I have sourced by my ~/.bashrc and my ~/.zshrc and which exists on all systems I use (thus needing broad compatibility), contains this:

# Like `find ... -print0 |xargs -0` for programs that don't have a -print0
xargsn() {  # defer definition until first use
  if xargs --help 2>&1 |grep -Fqw "d, --delimiter"; then  # GNU
    xargsn() { xargs -d "\n" "$@"; }
  else
    # warning, xargs -0 isn't POSIX, but GNU, BSD/OSX, and Busybox support it
    xargsn() { tr "\n" "\000" |xargs -0 "$@"; }
  fi
  xargsn "$@"
}
if command -v compdef >/dev/null 2>&1; then compdef xargsn=xargs; fi # for zsh

This postpones the call to xargs --help until the first time I run xargsn in order to speed up my (rather lengthy) aliases file, meaning it checks for GNU xargs on its first invocation and then saves that from then on. After setting the definition (which is pretty fast, ~0.02 seconds), it then completes that first request as a one-time recursive call.

That last line tells Z-shell to complete xargsn as if it were xargs (since it is).

This xargsn function works for any command so long as you're not iterating over items which may contain line breaks. Warning: file names on many filesystems are technically allowed to contain line breaks. Be sure you don't allow anything catastrophic to happen if you have such a filename. (See also the "don't parse ls" arguments.)
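
To make that warning concrete, here is a small hedged demonstration. It creates one file whose name contains a newline in a scratch directory and counts how many NUL-terminated records each approach produces. The helper names count_records and demo_newline_split are made up for illustration, and -mindepth is a GNU/BSD find extension rather than POSIX.

```shell
# Count NUL-terminated records arriving on standard input.
count_records() {
  tr -cd '\000' | wc -c | tr -d ' '
}

# Compare newline-to-NUL conversion of ls output with find -print0
# for a directory holding exactly one file with a newline in its name.
demo_newline_split() {
  d=$(mktemp -d) || return 1
  touch "$d/$(printf 'a\nb')"
  via_ls=$(ls "$d" | tr '\n' '\000' | count_records)
  via_find=$(find "$d" -mindepth 1 -print0 | count_records)
  rm -rf "$d"
  echo "ls+tr=$via_ls find=$via_find"
}
```

When ls writes the name unmodified to a pipe, the single file shows up as two records after the tr conversion, while find -print0 reports one.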

Adam Katz
  • 3,965
  • The whole point of using nul to delimit data is that the data may include, for example, embedded newlines. Your code seems to convert newlines to nul characters for the sake of having nul characters, rather than just using the newlines as delimiters. Also, the zsh shell has a loadable zargs utility. – Kusalananda Mar 10 '20 at 18:35
  • @Kusalananda – Yes, my code does indeed convert newlines to nulls in order to ease parsing things with xargs. You're right: it's not a complete solution (as I noted) but it is a 99% solution. zargs is intriguing, though I'm not sure it's immune to the command line length issues that tend to push users towards xargs in the first place (the better globbing ability may yield a smaller command line though). – Adam Katz Mar 10 '20 at 18:44