If you have GNU find (the one that supports -printf), the following
find /filesystem/mount/point -xdev -printf '%T@\t%p\0' > timestamps
is going to be the fastest. find is highly optimised to traverse directory trees, and it does the lstat() system calls itself to retrieve the timestamps. It also calls lstat() on paths relative to the directory where it finds them, which means less work for the kernel than if lstat() were called on the full path.
With %T@, which prints the timestamp as decimal epoch time, all it has to do is convert the numbers (seconds and nanoseconds) from binary to decimal, which is a lot less effort than %T+, which needs to compute the calendar time in the user's timezone.
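To make the record format concrete, here is a minimal sketch, assuming GNU find and GNU coreutils, that dumps the same "<epoch mtime><TAB><path><NUL>" records for a scratch directory containing one file whose mtime is set explicitly. The helper name dump_mtimes is hypothetical, and tr is only there to make the NUL-delimited records readable on a terminal.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU find + GNU coreutils): dump "<mtime>\t<path>\0" records.
# dump_mtimes is a hypothetical helper name, not part of any tool.
dump_mtimes() {
  # -xdev: don't descend into other file systems
  # %T@:  mtime in seconds since the epoch, with a fractional part
  find "$1" -xdev -printf '%T@\t%p\0'
}

# Scratch tree with one file whose mtime we set explicitly.
demo_dir=$(mktemp -d)
trap 'rm -rf -- "$demo_dir"' EXIT
touch -d @1700000000 "$demo_dir/file"

# NUL-delimited records are safe for any file name; converting NULs to
# newlines here is only for display.
dump_mtimes "$demo_dir" | tr '\0' '\n'
```

The record for "file" starts with 1700000000 followed by the fractional part that GNU find appends.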
There are many different and incompatible implementations of a stat command, but none of them find files; they just do some stat()/lstat()/statx()/statfs() or equivalent to retrieve metadata from the files whose paths are given as arguments, so you need something else to find the files and pass their full paths to stat.
Because on most systems commands can only take a limited number of arguments, you'll likely need to call the stat utility several times, each in its own process, each having to be loaded, initialise, process its arguments, etc.
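As an illustration of that batching, here is a sketch using GNU find's own -exec ... {} + form: each stat process receives as many paths as fit within the system's argument-size limit, so there are far fewer processes than one per file, but still more than one for a large tree. It assumes GNU find and GNU stat (for --printf and the %.9Y precision syntax); stat_mtimes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU find + GNU stat). stat_mtimes is a hypothetical name.
# "-exec ... {} +" packs as many paths as allowed into each stat invocation.
# GNU stat uses lstat() unless given -L, like find's -printf.
stat_mtimes() {
  find "$1" -xdev -exec stat --printf '%.9Y\t%n\0' -- {} +
}

# Upper bound, in bytes, on the combined size of the arguments and
# environment of a new process; this is what forces several invocations.
getconf ARG_MAX
```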
One exception is the stat builtin of zsh, which actually predates GNU and BSD stat (though not GNU find's -printf). zsh can find the files with its recursive globs, so it can do the whole process without having to run another command, but it is never going to be as efficient as find.
Note that date -r (also a GNU non-standard extension) does a stat() or equivalent, not lstat(). So for symlinks, it reports the timestamp of the target (or fails if the link can't be resolved), not that of the symlink itself. Among the various stat implementations, some use stat() and some use lstat() by default, but all can be told to switch between the two.
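The difference is easy to observe on a symlink whose own mtime differs from its target's. A sketch, assuming GNU coreutils (stat -c, stat -L, touch -h and date -r) and a file system that lets you set a symlink's own timestamps, as Linux does with touch -h:

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU coreutils on Linux): stat() vs lstat() on a symlink.
tmp=$(mktemp -d)
trap 'rm -rf -- "$tmp"' EXIT

touch -d @1600000000 "$tmp/target"       # mtime of the target
ln -s target "$tmp/link"
touch -h -d @1700000000 "$tmp/link"      # mtime of the symlink itself

stat -c %Y "$tmp/link"      # 1700000000: lstat(), the symlink's own mtime
stat -L -c %Y "$tmp/link"   # 1600000000: stat(), follows the link
date -r "$tmp/link" +%s     # 1600000000: date -r follows the link too
```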
To optimise it further, you could implement it in C and do the directory traversal by hand, without some of the extra safeguards that find implements. On recent versions of Linux, using statx(), which can be told to retrieve less information, might help.
If you have locate/mlocate/plocate, using its cached list of files would save you having to crawl the file system and might help speed up the process (at the risk of giving you stale information).
Since version 4.9, GNU find can be passed the list of files to process on stdin with -files0-from -, so you can do:
LC_ALL=C locate -0 '/filesystem/mount/point/*' |
find -files0-from - -prune -printf '%T@\t%p\0' > timestamps
That would be more efficient than using something like | xargs -r0 stat --printf '%.9Y\t%n\0' -- (here assuming GNU stat and that none of the input file paths is -), which would still run several invocations of stat.
You can use the same approach if you have a list of file paths stored as NUL-delimited records in a file. If it's in another format, you'd need to convert it first. For instance, for a text file containing one path per line (which means it can't hold file paths that contain newline characters), you'd do tr '\n' '\0' < list.txt | find...
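Here is a minimal sketch of that conversion step, assuming GNU tr, xargs and stat. Since -files0-from needs findutils 4.9 or newer, this sketch feeds the records to the xargs + GNU stat pipeline mentioned above instead, which works with older find versions but runs several stat processes. mtimes_from_list is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU tr, xargs, stat). mtimes_from_list is hypothetical.
# Input: a text file with one path per line (paths can't contain newlines).
# Caveat from the text above: a path that is exactly "-" would make GNU stat
# report on its stdin instead of a file.
mtimes_from_list() {
  # -0: NUL-delimited input; -r: don't run stat at all if the list is empty
  tr '\n' '\0' < "$1" | xargs -r0 stat --printf '%.9Y\t%n\0' --
}
```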
In my tests, it's still less efficient than letting find find the files by itself, possibly because find ends up calling lstat() on full paths, which means the kernel has to do the full look-up for every file.
Also note that it won't be able to cope with file paths longer than PATH_MAX (usually around 4KiB on Linux; see the output of getconf PATH_MAX /mount/point).
In any case, for performance, the last thing you want to do is run an external utility such as GNU date or GNU stat for each file, as in a shell loop. If for some reason you needed to process files and their timestamps in a loop in a shell such as bash that doesn't have a stat builtin, you'd do something like:
while IFS=/ read -u3 -rd '' timestamp filepath; do
printf '%s\t%s\n' "$timestamp" "$filepath" # replace with whatever you do with "$timestamp" and "$filepath"
done 3< <(find /filesystem/mount/point -xdev -printf '%T@/%p\0')
We use / as the separator because it's the only character that is guaranteed not to occur at the end of a file path. The one exception is the directory that you pass to find. For instance, in the output of find / -xdev -printf '%T@/%p\0', the first record (and only the first) would end in /: it would contain <timestamp>//, and read would store the empty string instead of / in $filepath. You could work around that by using zsh instead of bash (zsh treats $IFS as a true internal field separator rather than a delimiter) or by using ${filepath:-/} when referencing the file path.
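Here is a sketch of the bash loop above with the ${filepath:-/} workaround applied, fed with simulated records so the root-directory edge case can be seen without walking the whole file system. It reads from stdin rather than fd 3 for brevity; print_mtimes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch (bash): parse "<timestamp>/<path>\0" records, as produced by
# find ... -printf '%T@/%p\0'. print_mtimes is a hypothetical name.
print_mtimes() {
  local timestamp filepath
  while IFS=/ read -rd '' timestamp filepath; do
    # For the record of "/" itself, read leaves $filepath empty.
    printf '%s\t%s\n' "$timestamp" "${filepath:-/}"
  done
}

# Simulated records: the root directory, then an ordinary file.
printf '%s\0' '1700000000.0000000000//' '1600000000.0000000000//etc/passwd' |
  print_mtimes
```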
Note that read itself is quite inefficient, as it needs to read the input one byte at a time. See Why is using a shell loop to process text considered bad practice? for more details on that. You'd likely be better off using a proper programming language if performance is a concern.
The shells I know of with builtin support for retrieving the modification time of a file (thereby avoiding the prohibitive cost of running a separate utility for each file) are tcsh, zsh, ksh93 and busybox sh.
tcsh is not really usable for scripting.
For ksh93, you need it to have been built with the date or ls builtins included, which is rarely the case. As for busybox, while its sh applet can invoke its stat applet without re-executing itself, it still does so in a child process, and forking a process is quite expensive. Busybox stat (whose API is similar to GNU stat's) also doesn't support subsecond precision¹. Finally, neither busybox sh nor ksh93 can process NUL-delimited records.
With zsh, with the list file containing the file paths NUL-delimited:
zmodload zsh/stat || exit
for filepath (${(0)"$(<list)"})
stat -LF %s.%9. -A timestamp +mtime -- $filepath &&
print -r -- "$timestamp $filepath" # replace with whatever you do with $filepath and $timestamp
For a list that contains one (newline-free) file path per line, replace (0) with (f).
With ksh93, with its builtin ls and a list with one file path per line:
builtin ls || exit
while IFS= read -ru3 filepath; do
timestamp=${ ls -dZ '%(mtime:%s.%N)s' -- "$filepath"; } &&
printf '%s\t%s\n' "$filepath" "$timestamp" # replace with whatever you do with "$filepath" and "$timestamp"
done 3< list
You can also use builtin date; date -f %s.%N -m -- "$filepath" there, but beware that it does a stat() (as if passing -L to ls), not an lstat().
¹ Its date applet can be configured at build time to support nanosecond precision, though that's not enabled in its default build.