What is costly is doing system calls on the files (both the system calls themselves and the resulting I/O). Predicates like `-type` or `-mtime` require an `lstat(2)` system call on the file. `-name`, `-path` and `-regex` don't (though of course `find` will have done system calls on the directories that contain those files in order to read their contents).
Usually, `find` does an `lstat()` anyway (because it needs to know whether a file is a directory in order to descend into it, unless that information is provided by `readdir()`), but there are cases where it can do without it. For instance, if the link count of a directory is less than 3, then on some filesystems `find` knows it has no subdirectories, and some `find` implementations optimise by not doing `lstat()`s in there.
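That heuristic works because, on traditional Unix filesystems, a directory's link count is 2 (its own name plus its `.` entry) plus one per subdirectory (each subdirectory's `..` entry). A quick sketch, assuming GNU `stat` and a filesystem that follows that convention (btrfs, for instance, always reports 1):

```shell
d=$(mktemp -d)
before=$(stat -c %h "$d")    # typically 2: the directory's name + its "." entry
mkdir "$d/sub1" "$d/sub2"
after=$(stat -c %h "$d")     # typically 4: each subdir's ".." adds one link
echo "$before -> $after"
rm -rf "$d"
```

A link count of exactly 2 is what tells such a `find` implementation the directory has no subdirectories to descend into.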
`-xtype` will cause a `stat(2)`; `-printf ...` and `-ls` may cause a `stat()`, `lstat()` or `readlink()`; `-lname` causes an `lstat()` and a `readlink()`.
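For example, `-lname` matches against the text of a symlink's target, which is why it needs both the `lstat()` (to see the file is a symlink) and the `readlink()` (to read the target). A small illustration (the temporary directory and link names are made up for the demo):

```shell
d=$(mktemp -d)
ln -s /etc/passwd "$d/pw"    # symlink whose target text is "/etc/passwd"
touch "$d/regular"
# -lname needs lstat() to see "pw" is a symlink, then readlink() for its target:
find "$d" -lname '/etc/*'    # matches only the symlink
rm -rf "$d"
```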
That's why you may want to put the `-name`/`-path`/`-regex`... predicates first: if they can rule out a file, they can avoid one or more syscalls. Now, a `-regex` may be more expensive than a `-name`, but I'm not sure you'd gain much by swapping those two.
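Both orderings select exactly the same files, so the reordering is purely a performance matter; a sketch (the temporary directory contents are made up for the demo):

```shell
d=$(mktemp -d)
mkdir "$d/zdir"
touch "$d/zfile" "$d/other"
# Cheap name check first: the lstat() can be skipped entirely for "other"
# (in implementations that don't reorder the predicates themselves):
find "$d" -mindepth 1 -name 'z*' -type d
# lstat() on every entry first, name checked afterwards -- same result:
find "$d" -mindepth 1 -type d -name 'z*'
rm -rf "$d"
```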
Also note that some `find` implementations, like GNU `find`, reorder the checks by default when possible. See `info find 'Optimisation Options'` on a GNU system (also available on gnu.org for the latest version of GNU `findutils`).
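With GNU `find` you can even inspect those decisions: the GNU-specific `-D opt` debug option (optionally combined with an `-O` optimisation level) prints the expression tree before and after optimisation. The exact output format varies between `findutils` versions:

```shell
# GNU-specific: show how find rewrites the expression tree before running it.
# -D and -O must come before the starting-point arguments.
find -D opt -O3 /tmp -maxdepth 0 -type d -name 'z*' 2>&1 | head -n 20
```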
Typically, if you ran your tests on a GNU system, both commands would do the same thing, because `find` would have moved the `-name` forward anyway. So, for `-type d -name ...` vs `-name ... -type d` to make a difference, you need a `find` implementation that doesn't optimise by reordering those predicates, but that does optimise by not doing an `lstat()` on every file.
Where there will be a (huge) difference regardless of the implementation is between:

```
find . -name 'x*' -exec test -d {} \; -print
```

and:

```
find . -exec test -d {} \; -name 'x*' -print
```
`find` can't reorder the `-exec`, as doing so could introduce functional differences (`find` can't know whether the command being executed is only a test or does something else). And of course `-exec ... {} \;` is several orders of magnitude more expensive than any other predicate, since it means forking a process, executing a command in it (which itself runs many system calls), and waiting for it and its exit code.
```
$ time find /usr/lib -exec test -d {} \; -name 'z*' -print > /dev/null
1.03s user 12.52s system 21% cpu 1:03.43 total
$ time find /usr/lib -name 'z*' -exec test -d {} \; -print > /dev/null
0.09s user 0.14s system 62% cpu 0.367 total
```

(The first one calls `test` for every file in `/usr/lib` (56685); the second one only for the files whose name starts with `z` (147).)
Note that `-exec test -d {} \;` is not the same as `-type d`. It's the portable equivalent of the GNU-specific `-xtype d`.
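The difference shows up with symlinks to directories: `-type d` looks at the file itself (via `lstat()`), while `test -d` (like GNU's `-xtype d`) follows the symlink. A quick demonstration:

```shell
d=$(mktemp -d)
mkdir "$d/realdir"
ln -s realdir "$d/dirlink"
# -type d: only the actual directory (lstat() reports dirlink as a symlink):
find "$d" -mindepth 1 -type d
# test -d follows symlinks, so the symlink to a directory matches as well:
find "$d" -mindepth 1 -exec test -d {} \; -print
rm -rf "$d"
```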