
Given a directory containing:

  • note 1.txt, last modified yesterday
  • note 2.txt, last modified the day before yesterday
  • note 3.txt, last modified today

What is the best way to fetch the array ("note 3" "note 1" "note 2"), i.e. the basenames without extensions, sorted by modification time, newest first?
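(For reproducibility, a directory like the one described can be set up with touch -t; the timestamps below are arbitrary stand-ins for the relative dates above:)

```shell
# Recreate the example directory (timestamps are arbitrary stand-ins
# for "the day before yesterday", "yesterday", and "today").
mkdir -p notes && cd notes
touch -t 202007230900 'note 2.txt'   # oldest
touch -t 202007240900 'note 1.txt'
touch -t 202007250900 'note 3.txt'   # newest
ls -t    # note 3.txt, then note 1.txt, then note 2.txt
```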

To define "best," I'm more concerned about robustness (in the context of Zsh on macOS) than I am about efficiency and portability.

The intended use case is a directory of hundreds or thousands of plain text files, but—at the risk of muddling the question—this is a specific case of a more general question I have, of what best practices are in performing string manipulations on filepaths printed by commands like ls, find, and mdfind.


I've been using a macro which invokes this command to achieve the above:

ls -t | sed -e 's/.[^.]*$//'

It's never failed, but:

  • Greg's Wiki strongly recommends against parsing the output of ls. (Parsing ls; Practices, under "5. Don't Ever Do These").
  • Is invoking sed inefficient where parameter expansion would do?

Using find (safely delimiting filepaths with NUL characters rather than newlines), and parameter expansion to extract the basenames, this produces an unsorted list:

find . -type f -print0 | while IFS= read -d '' -r l ; do print "${${l%.*}##*/}" ; done

But sorting by modification date would seem to require invoking stat and sort, because macOS's find lacks the -printf flag which might otherwise serve well.

Finally, using Zsh's glob qualifiers:

for f in *(om) ; do print "${f%.*}" ; done

Though not portable, this last method seems most robust and efficient to me. Is this correct, and is there any reason I shouldn't use a modified version of the find command above when I'm actually performing a search rather than simply listing files in a directory?

  • TBH, when in need of such advanced order manipulation, I'd rather write a short Python script. The standard Python libraries already contain all necessary functionalities, and you can easily add more features. – pepoluan Jul 25 '20 at 16:17

3 Answers


In zsh,

list=(*(Nom:r))

is definitely the most robust. Use

print -rC1 -- *(Nom:r)

to print them one per line, or

print -rNC1 -- *(Nom:r)

as NUL-delimited records to be able to do anything with that output since NUL is the only character not allowed in a file path.
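As a quick sanity check, here is a sketch using a throwaway directory (the file names and timestamps are illustrative), run under zsh:

```shell
# Sketch (zsh): build a scratch directory, then collect basenames
# sorted by mtime, newest first.  N: nullglob, om: newest first,
# :r: strip the extension.
cd "$(mktemp -d)" || exit
touch -t 202007230900 'note 2.txt'
touch -t 202007240900 'note 1.txt'
touch -t 202007250900 'note 3.txt'
list=(*(Nom:r))
print -rC1 -- $list    # note 3, note 1, note 2
```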

Change to *(N-om:r) if you want the modification time to be considered after symlink resolution (mtime of the target instead of the symlink like with ls -Lt).

:r (for root name) is the history modifier (from csh) to remove the extension. Beware that it turns .bashrc into the empty string which would only be a concern here if you enabled the dotglob option.
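A minimal illustration of that edge case (file names are illustrative):

```shell
# Sketch (zsh): :r leaves nothing of a dotfile whose only dot is the
# leading one, but behaves as expected on a regular name.
f='.bashrc';    print -r -- "[${f:r}]"   # prints []
f='note 1.txt'; print -r -- "${f:r}"     # prints note 1
```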

Change to **/*(N-om:t:r) to do it recursively (:t for the tail (basename), that is, to remove the directory components).

Doing it reliably for arbitrary file names with ls is going to be very painful.

One approach could be to run ls -td -- ./* (assuming the list of file names fits within the arg list limit) and parse that output, relying on the fact that each file name starts with ./, and generate either a NUL-delimited list or a shell-quoted list to pass to the shell. But doing that portably is also very painful unless you resort to perl or python.

But if you can rely on perl or python being there, you would be able to have them generate and sort the list of files and output it NUL-delimited (though possibly not that easily portably if you want to support sub-second precision).
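For instance, a sketch of the perl route (a hypothetical implementation, not anything prescribed above): sort the regular files in the current directory by mtime, strip the extension, and emit the names NUL-delimited, newest first:

```shell
# Sketch (assumes perl is available): NUL-delimited, extension-stripped
# basenames of regular files in the current directory, newest first.
perl -e '
    opendir my $d, "." or die "opendir: $!";
    my @f = grep { -f $_ } readdir $d;
    for (sort { (stat $b)[9] <=> (stat $a)[9] } @f) {
        s/\.[^.]*$//;
        print "$_\0";
    }
'
```

Note that (stat)[9] only has whole-second resolution, so, as mentioned above, sub-second ordering is lost.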

ls -t | sed -e 's/.[^.]*$//'

would not work properly for file names that contain newline characters (IIRC some versions of macOS did ship with such file names in /etc by default). It could also fail for file names that contain sequences of bytes not forming valid characters, as . or [^.] could fail to match them. That may not apply to macOS, though, and could be fixed by setting the locale to C/POSIX for sed.

The . should be escaped (s/\.[^.]*$//), as unescaped it is the regexp operator that matches any character; otherwise, it turns dot-less file names like foobar into empty strings.
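A quick demonstration of the difference (the input strings are illustrative):

```shell
# Sketch: with the unescaped dot, a dot-less name is wiped out
# entirely; escaped, only a real extension is stripped.
printf '%s\n' foobar       | sed -e 's/.[^.]*$//'    # empty line
printf '%s\n' foobar       | sed -e 's/\.[^.]*$//'   # foobar
printf '%s\n' 'note 1.txt' | sed -e 's/\.[^.]*$//'   # note 1
```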

Note that to print a string raw, it's:

print -r -- "$string"

print "$string" would fail for values of $string that start with -, even introducing a command injection vulnerability (try for instance with string='-va[$(uname>&2)1]', here using a harmless uname command), and it would mangle values that contain \ characters.
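A harmless way to see the backslash mangling (the string is illustrative):

```shell
# Sketch (zsh): print interprets backslash escapes (and option-like
# leading arguments); print -r -- does neither.
s='a\nb'
print "$s"         # two lines: a, then b
print -r -- "$s"   # one line: a\nb
```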

Your:

find . -type f -print0 | while IFS= read -d '' -r l ; do print "${${l%.*}##*/}" ; done

also has an issue: it strips the .* suffix before removing the directory components. So for instance ./foo.d/bar would become foo instead of bar, and ./foo would become the empty string.
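Swapping the two expansions fixes that; a sketch (also using print -r --):

```shell
# Sketch (zsh): strip the directory components first, then the
# extension, so ./foo.d/bar yields bar rather than foo.
find . -type f -print0 |
  while IFS= read -r -d '' l ; do
    print -r -- "${${l##*/}%.*}"
  done
```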

About safe ways to process the find output in various shells, see Why is looping over find's output bad practice?


IMNSHO robustness and shell scripts are incompatible concepts (IFS is just a hack, sorry). I think there are only two ways to do what you want in a robust manner: either write a program in some sane language (Python, C, whatever) or use tools built specifically for robustness.

With csv-nix-tools (*) you can achieve this with:

csv-ls -c name,mtime_sec,mtime_nsec | 
csv-sort -c mtime_sec,mtime_nsec | 
csv-cut -c name |
csv-add-split -c name -e . -n base,ext -r | 
csv-cut -c base |
csv-header --remove

Rather self-explanatory.

If you want to just see the basenames of files, that would be enough, but usually, you want to do something useful with the data you just got. That's where sink tools are useful. Currently, there are 3: csv-exec (executes a command for each row), csv-show (formats data in human-readable form), and csv-plot (generates 2D or 3D graph using gnuplot).

There are still some rough edges here and there, but these tools are good enough to start playing with them.

(*) https://github.com/mslusarz/csv-nix-tools

  • Sounds like a great idea of a project. Has it been ported to the OP's macOS or any OS other than GNU/Linux ones? – Stéphane Chazelas Jul 27 '20 at 06:55
  • It looks like all the operators at this point work at the byte level instead of the character level, which limits them when operating on non-ASCII data. Have you got any plan to support at least UTF-8 encoded textual data? – Stéphane Chazelas Jul 27 '20 at 06:57
  • It seems that toolset effectively reimplements a subset of the Unix basic utilities (or at least their functionality) to work with csv records instead of lines. Note that the OP is using the zsh shell which also implements some of that functionality in its parameter expansion operators, and can cope with both text and binary data (it looks like your csv-nix-tools can't cope with NUL characters) which allows one to write more robust code. – Stéphane Chazelas Jul 27 '20 at 07:06
  • No, it has not been ported to other OSes. Feel free to make a pull request with portability (or any other) patches. – Marcin Ślusarz Jul 27 '20 at 21:00
  • I haven't tested it extensively, but these tools should work fine with utf-8, because utf-8 is defined in such a way that non-ascii bytes do not collide with ascii bytes. If you can find cases where they don't work, please open an issue and I'll happily fix such bugs. I'm adding TODO entry to add utf-8 tests. – Marcin Ślusarz Jul 27 '20 at 21:00
  • Yeah, I realized that NUL bytes don't work just recently. I'm adding this to TODO. WRT "reimplementing Unix utils to work with CSV": that's the whole point - to allow processing structured data, without (for example) the silliness of accidentally matching data from other columns than intended. – Marcin Ślusarz Jul 27 '20 at 21:01
  • I meant things like printf 'name:string\n"Stéphane"\n' | csv-add-rev -c name -n rev returning "Stéphane","enahp��tS" instead of "Stéphane","enahpétS" or printf 'name:string\n"Stéphane"\n' | csv-grep -c name -E 'St.phane' not matching for instance, i.e. it's working at byte level, not character level. – Stéphane Chazelas Jul 28 '20 at 04:08
  • Even in locales using single-byte characters like en_GB.iso885915, it doesn't seem to honour the locale. In that locale printf 'name:string\n"Stéphane"\n' | csv-grep -c name -E 'St.phane' matches but printf 'name:string\n"Stéphane"\n' | csv-grep -c name -E 'St[[:alpha:]]phane' doesn't. – Stéphane Chazelas Jul 28 '20 at 04:12
  • Very useful examples. I'll fix them. Thank you. – Marcin Ślusarz Jul 28 '20 at 17:33
  • csv-grep and csv-add-rev are already fixed. I'm pretty sure other tools have similar problems. I'll definitely try to find them myself, but if you find more, please let me know (either here or as an issue on GitHub). – Marcin Ślusarz Jul 30 '20 at 00:50

An alternate approach I was surprised not to see already covered, which will work on any shell adopting quite widespread ksh extensions (including both bash and zsh), on a system with GNU tools:

while IFS= read -r -d ' ' time && IFS= read -r -d '' filename; do
  printf 'Filename %q, with epoch time %s\n' "$filename" "$time"
done < <(find . -mindepth 1 -maxdepth 1 -printf '%T@ %P\0' | sort -gz)

Explaining how it works:

  • The find format string %T@ %P\0 prints, for each file, a decimal timestamp (optionally with subsecond precision), a space, the basename of that file, and then a NUL.
  • In sort -gz, -g is a generalized sort that correctly handles floating-point numeric values; and -z expects NULs rather than newlines as delimiters.
  • In IFS= read -r -d ' ' time && IFS= read -r -d '' filename, we terminate the read of the time at the first space; whereas we terminate the read of the filename at the first NUL.
  • In printing the results with format string %q, we convert even nonprintable characters (tabs, newlines, carriage returns, etc) in filenames into readable text.
Charles Duffy
  • While read -d indeed comes from ksh93, read -d '' to read NUL-delimited records doesn't work there. Only bash and zsh. Process substitution is also only AT&T ksh/bash/zsh, but you need recent versions of ksh93 to be able to redirect from them. So that's not really widespread, only a small minority of POSIX-like shells. – Stéphane Chazelas Jul 27 '20 at 06:31
  • You're using GNU find and its -printf which the OP said they didn't have access to (and already linked to solutions using it). (and GNU sort). – Stéphane Chazelas Jul 27 '20 at 06:32
  • Yes, I call out those dependencies very explicitly in the first paragraph. Not everyone using this answer will be the OP. – Charles Duffy Jul 27 '20 at 14:31
  • Yes, but those find -printf | sort -z approaches are already covered in a number of Q&As here. I see this question as being more about doing it without GNU tools when you have access to zsh. – Stéphane Chazelas Jul 27 '20 at 14:43