find . -exec cmd {} +
find . -print0 | xargs -0 cmd
Both are meant to be reliable ways to run a command on the files found by find.
Which is preferred? Which is more portable, reliable, efficient, versatile and why?
There's no clear winner. My recommendation would be to use:
find . -exec cmd {} +
Wherever it's enough, as it's more portable, uses fewer resources and has fewer issues, and one of:
xargs -r0 -other-options -a <(find ... -print0 | ...) cmd
find . -print0 | ... | xargs -0 -other-options cmd
When you need additional features of xargs
or need to post-process the output of
find
with other tools and you know you're on a system that supports those
non-standard options and the limitations don't apply and/or can be ignored.
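As a minimal sketch of the two forms side by side, assuming a find/xargs pair that supports -print0/-0 (GNU, busybox, the BSDs). The scratch directory and file names are made up for illustration, and sh -c 'echo "$#"' merely stands in for cmd by reporting how many arguments it received:

```shell
# Scratch tree with awkward names (a space, a newline); purely illustrative.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space" "$tmp/with
newline"

# Standard form: find batches the paths and runs the command itself.
count_exec=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} +)

# Non-standard form: NUL-delimited list handed to xargs over a pipe.
count_xargs=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "exec: $count_exec, xargs: $count_xargs"
rm -rf "$tmp"
```

Both forms should pass each path as exactly one argument, whatever characters it contains.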
find
has had -exec cmd {} ';'
, the variant which runs one invocation of
cmd
per file and also acts as a condition predicate, since its reimplementation with the current interface in
Unix V5 in the mid-70s, but the -exec cmd {} +
form which passes
several files to cmd
, as many as possible, came much later. It was written by
David Korn and first released in System V Release 4 in 1988 (See lynx news://news.gmane.io/gmane.comp.standards.posix.austin.general/2192
) though
not documented until SVR4.2 (1992).
It was only added to the 2001 edition of the POSIX standard and some
implementations of find
only added it much later (4.2.12, in 2005 for GNU
find
, 2002 for FreeBSD, 2006 for NetBSD, 2015 for busybox).
xargs
itself is from PWB Unix from the late 70s. It had (and still has) a very
poor interface with weird and unnecessary features and limitations,
understanding a unique form of quoting (though to be fair close to that
understood by PWB Unix shell which hasn't survived). Though it was meant to
work on the output of find
, it could not do so reliably.
A -0
option was added to GNU xargs
alongside a new -print0
option to GNU
find
in 1990. It's pretty safe to assume the GNU find
authors were not
aware of SysV's -exec {} +
when they added that. Some
-0
/--null
/-z
/--zero
options have been gradually added to other GNU
utilities after that to handle that NUL-delimited interchange format that can carry arbitrary file
paths and more generally arbitrary C strings or command line arguments.
A -d
option to xargs
to allow any single-byte record delimiter, making -0
redundant as it's then just the same as -d '\0'
was added much later to GNU xargs (in 4.2.26, released in late 2005) but, to my knowledge, is to this day still only supported by GNU xargs
.
Without those -0
or -d
(and -r
, see below), xargs
is hardly usable
(reliably).
-print0
/-0
have been added since to a few other implementations, even on
some commercial SysV-derived Unixes such as Solaris 11. -print0 is also supported by
the find
builtin of the bosh
shell.
They are not standard but might become so (along with the -d ''
option to
the read
utility) in the next version of the POSIX
standard.
-exec cmd {} +
is standard and now fairly portable. Its support is still
optional in busybox so you may come across Linux-based embedded systems where
it's not available. The -ok cmd {} +
variant to prompt the user before executing cmd
is neither standard nor portable (nor would it be convenient, as the command lines could end up being huge).
-print0
/xargs -0
is not standard, but it's now commonly found on BSDs and
in the find
/xargs
implementations commonly found on Linux-based systems
including GNU's, busybox's and toybox's. It's still not supported on AIX or HP-UX.
Outside of GNU systems, it's also still rare
to find other implementations of the other standard utilities (sort
, sed
, cut
, awk
etc.) that support
NUL-delimited records.
find
's -print0
can be implemented standardly with -exec printf '%s\0' {} +
, but there's no standard equivalent for xargs -0
or sort
/sed
/grep
...
-z, and more generally, NULs can't be processed by POSIX text utilities (nor
file paths in general as they are not guaranteed to be text).
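The standard spelling of -print0 mentioned above can be compared against the real thing with a sketch like this one. The scratch tree and the output file names (std.out, ext.out) are hypothetical, and the comparison obviously needs a find that supports -print0:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/tree"
touch "$tmp/tree/a b" "$tmp/tree/c"

# Standard spelling: printf emits each path followed by a NUL byte.
find "$tmp/tree" -exec printf '%s\0' {} + > "$tmp/std.out"

# Non-standard spelling, for comparison.
find "$tmp/tree" -print0 > "$tmp/ext.out"

if cmp -s "$tmp/std.out" "$tmp/ext.out"; then same=yes; else same=no; fi

# Count the NUL terminators: one per path (the directory plus two files).
records=$(( $(tr -cd '\000' < "$tmp/std.out" | wc -c) ))

echo "identical: $same, records: $records"
rm -rf "$tmp"
```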
Except on some BSDs, find . -print0 | xargs -0 cmd
will still run cmd
once
without arguments if no file is found which is generally not wanted. The GNU
implementation of xargs
added a -r
option to avoid that but it's not as
portable as -0
.
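A small sketch of that empty-input behaviour, using an empty scratch directory and echo ran as a stand-in for cmd. The -r case assumes an xargs that accepts that option (GNU, busybox), and the result without -r is deliberately left unasserted since it depends on the implementation:

```shell
tmp=$(mktemp -d)   # empty scratch directory: no regular file to find

# -exec ... {} + never runs the command when nothing matched.
out_exec=$(find "$tmp" -type f -exec echo ran {} +)

# With -r, xargs likewise skips the command on empty input.
out_r=$(find "$tmp" -type f -print0 | xargs -r0 echo ran)

# Without -r the result is implementation-dependent.
out_plain=$(find "$tmp" -type f -print0 | xargs -0 echo ran)

printf 'exec=[%s] -r=[%s] plain=[%s]\n' "$out_exec" "$out_r" "$out_plain"
rmdir "$tmp"
```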
In find . -exec cmd {} +
, cmd
inherits find
's stdin, so cmd
is still able
to interact with the user if that command is started from a terminal for instance.
While in find . -print0 | xargs -r0 cmd
, depending on the xargs
implementation, cmd
's stdin will be either /dev/null (like with GNU xargs
)
or worse will inherit xargs
' stdin, which here is the pipe from find
, so if
it ever reads from its stdin, it will wreak havoc. With the GNU implementation
of xargs
, that can be worked around using -a
and process substitution:
xargs -r0a <(find . -print0) cmd # Korn syntax
xargs -r0a <{find . -print0} cmd # rc syntax
xargs -r0a /dev/fd/3 3<(find . -print0) cmd # yash syntax
xargs -r0a (find . -print0|psub) cmd # fish syntax (not parallel though)
But that's a lot less portable.
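The stdin inheritance of -exec described above can be observed with a sketch like the following, where a pipe plays the role of the terminal and cat stands in for a cmd that reads its stdin (the scratch directory and the from-stdin text are made up):

```shell
tmp=$(mktemp -d)
touch "$tmp/file"

# The command started by -exec reads whatever find's stdin is connected to;
# here that is a pipe carrying one line.
got=$(printf 'from-stdin\n' |
  find "$tmp" -type f -exec sh -c 'cat' sh {} +)

echo "$got"
rm -rf "$tmp"
```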
On the reliability front, in find . -print0 | xargs -0 cmd
, if find
crashes or is killed early (for instance because it has reached a resource limit), that can have dramatic consequences. That's because find
writes its output in blocks which are not guaranteed to end on a NUL delimiter (for instance, with find /var/tmp -name '*.tmp'
, a block might end with /var
), and xargs
will still make an argument to cmd
for a non-delimited record. So xargs could, for instance, call cmd
(like rm -rf
) with /var
as argument in our example if find
was killed after it had output a block ending in /var
.
That problem doesn't affect -exec cmd {} +
.
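The truncated-record hazard can be reproduced harmlessly by feeding xargs a hand-made stream whose last record lacks its NUL terminator. This is only a simulation of a find killed mid-write, with printf standing in for a destructive cmd:

```shell
# Simulated output of a find that died mid-write: the last record "/var"
# has no terminating NUL.
last=$(printf '/var/tmp/a.tmp\0/var' | xargs -0 printf '<%s>\n' | tail -n 1)
echo "$last"
```

With the xargs implementations I'm aware of, the partial /var is passed on as an argument in its own right.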
With find . -exec cmd {} +
, the exit status reflects both find
and cmd
failures, while with find | xargs
, in most shells you only get the exit status of xargs
so you could miss the fact that not all files could be found. Many shells have a pipefail
option to alleviate that, though; see also zsh's $pipestatus
or bash's $PIPESTATUS
, which end up giving more flexibility.
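A sketch of the exit status difference, using a deliberately missing start point (the hypothetical $tmp/missing) to make find fail and true as a cmd that always succeeds. The pipeline is run in a fresh sh so that no pipefail-like option set by the caller interferes:

```shell
tmp=$(mktemp -d)
touch "$tmp/file"

# find fails on the missing start point; with -exec that failure is the
# exit status we get to see.
st_exec=0
find "$tmp" "$tmp/missing" -exec true {} + 2>/dev/null || st_exec=$?

# In a plain pipeline only the status of xargs, the last command, is reported.
st_pipe=0
sh -c 'find "$1" "$1/missing" -print0 2>/dev/null | xargs -0 true' sh "$tmp" ||
  st_pipe=$?

echo "exec: $st_exec, pipeline: $st_pipe"
rm -rf "$tmp"
```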
The (non-standard) -execdir cmd -- {} +
(note the --
needed with find
implementations that don't add a ./
prefix to the filenames) variant which works around some of the security issues with -exec cmd {} +
or xargs
cannot be done with xargs
.
On the performance front, find | xargs
implies more work (at least one additional process, and the shoving of that data through a pipe), and, because some of it ends up being done in parallel (find
and xargs
running concurrently) could end up adding contention as both find
and cmd
compete for I/O access, so will likely use significantly more resources overall. Thanks to that parallelism, in some situations, it may however end up carrying out the task quicker as find
can carry on searching for more files whilst cmd
is busy doing some CPU intensive tasks with the previous batch (at least until the pipe and find
's internal output buffer are both full).
With find . -exec cmd {} +
, it's easier for cmd
to abort the whole search. For instance, with:
find . -exec sh -c 'if some-condition; then kill -s PIPE "$PPID"; exit 1; fi' sh {} +
With find . -print0 | xargs -0 cmd
, cmd
can do an exit 255
to abort xargs
, but find
won't exit afterwards until it tries to write the next block to the pipe.
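The exit 255 handling on the xargs side can be sketched as follows, with three hand-made records in place of find's output and -n 1 so that each record gets its own invocation:

```shell
# Each record is handled by its own sh; the first one exits with 255,
# which makes xargs stop instead of going on with "b" and "c".
st=0
out=$(printf 'a\0b\0c\0' |
  xargs -0 -n 1 sh -c 'echo "$1"; exit 255' sh 2>/dev/null) || st=$?

echo "output: $out (xargs exit status: $st)"
```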
The main argument for -print0 and xargs -0 is that, in general, the approach is more generic, flexible and versatile.
find
's -print0
output is a post-processable representation of a list of files that can be used by anything, not just xargs -0
. For instance, you can do:
find . -print0 |
grep -z foo |
sort -z
And still get a post-processable, filtered and sorted list of file paths.
And similarly xargs -0
can be used on those NUL-delimited lists whether they come from the output of find
or anything else, whether that represents file paths or anything else.
In that regard, find . -exec cmd {} +
is only for a narrow special use case (even if it's one of the most common ones).
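Here is a runnable variant of the filter-and-sort pipeline above, ending in xargs -0. It assumes grep and sort implementations with -z (GNU, busybox); the file names are invented, and the pattern is anchored to the last path component so that the random scratch directory name cannot match by accident:

```shell
tmp=$(mktemp -d)
touch "$tmp/foo b" "$tmp/a foo" "$tmp/bar"

# Filter and sort the NUL-delimited list, then print the base names.
out=$(find "$tmp" -type f -print0 |
  grep -z 'foo[^/]*$' |
  LC_ALL=C sort -z |
  xargs -0 sh -c 'for f do printf "%s\n" "${f##*/}"; done' sh)

echo "$out"
rm -rf "$tmp"
```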
With xargs
, you can use the -n
or -s
option to limit the number of arguments to pass to cmd
. With GNU xargs
, see also the -P
option to run several instances of cmd
in parallel, or the xargs -0 -J {} mv {} /dest/
of some BSDs to allow extra arguments after the list of files.
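For instance, a sketch of -n limiting the batch size, with five hand-made records and sh -c 'echo "$#"' reporting how many arguments each invocation received:

```shell
# Five NUL-delimited records, at most two per invocation of the command.
out=$(printf '%s\0' 1 2 3 4 5 | xargs -0 -n 2 sh -c 'echo "$#"' sh)
echo "$out"
```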
You can save the output of find . -print0
to a file and process it later (for instance, only if find
was fully successful) with xargs -0 cmd < file
, preventing the output of cmd
from interfering with the find
result, including with things like (with GNU xargs
):
xargs -r0a =(find . -print0) cmd # zsh
xargs -r0a (find . -print0|psub -f) cmd # fish
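A more portable sketch of the same idea, using an explicit temporary file (the names tmp and list are arbitrary) and only running the command if find reported success:

```shell
tmp=$(mktemp -d)
list=$(mktemp)
touch "$tmp/a" "$tmp/b"

# Only process the saved list if find completed successfully.
count=0
if find "$tmp" -type f -print0 > "$list"; then
  count=$(xargs -0 sh -c 'echo "$#"' sh < "$list")
fi

echo "$count"
rm -rf "$tmp" "$list"
```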
There's no equivalent of xargs
's exit 255
special handling with -exec cmd {} +
(though see above about kill "$PPID"
).
With find | xargs
, you can more easily run find
and xargs cmd
in different locales or, more generally, different environments (including variables, limits, umask...).
For instance, one often needs to run find
in the C locale to work around problems with non-text filenames, but often still wants cmd
to run in the user's locale.
LC_ALL=C find . -exec cmd {} +
Runs both find
and cmd
in the C locale. And
LC_ALL=C find . -exec env -u LC_ALL cmd {} +
is, for one, non-standard, and may also fail to restore the original locale for cmd
if LC_ALL
was defined beforehand.
LC_ALL=C find . -print0 | xargs -r0 cmd
Changes the locale to C for find
only.
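The difference can be made visible with a stand-in cmd that just reports the LC_ALL it sees. This is a sketch in which the show variable and the scratch directory are hypothetical, and LC_ALL is unset first so that the starting point is known:

```shell
tmp=$(mktemp -d)
touch "$tmp/file"
show='echo "${LC_ALL-unset}"'   # stand-in for cmd: reports its LC_ALL

# Each pipeline runs in a subshell where LC_ALL starts out unset.
seen_exec=$(unset LC_ALL
  LC_ALL=C find "$tmp" -type f -exec sh -c "$show" sh {} +)
seen_xargs=$(unset LC_ALL
  LC_ALL=C find "$tmp" -type f -print0 | xargs -0 sh -c "$show" sh)

echo "exec: $seen_exec, xargs: $seen_xargs"
rm -rf "$tmp"
```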
As a special case:
find . -exec sudo cmd {} +
Often fails to stay under the limit on the size of args+env, as sudo
happens to set a SUDO_COMMAND
environment variable which ends up duplicating the list of arguments.
find . -print0 | sudo xargs -r0 cmd
Doesn't have the problem as $SUDO_COMMAND
in that case only contains xargs -r0 cmd
(don't use find . -print0 | xargs -r0 sudo cmd
).
See also:
sudo find . -print0 | xargs -r0 cmd
Where the list of files is found by root
, but cmd
runs as the original user. Or
find . -print0 | (USERNAME=some-user; xargs -r0 cmd)
In shells like zsh, which have builtin support for changing the (e)uid and (e)gids.
find . -print0 | xargs -r0 -- "${cmd[@]}"
Works whatever the $cmd
array contains, while
find . -exec "${cmd[@]}" {} +
Fails if any of the elements of the $cmd
array is ;
or if it contains {}
and +
as consecutive elements.
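A POSIX sh sketch of that last point, where the positional parameters ("$@") play the role of the "${cmd[@]}" array and one of the command's own arguments is a lone ; (the scratch directory is again made up):

```shell
tmp=$(mktemp -d)
touch "$tmp/file"

# "$@" stands in for the cmd array; one element is a lone ";".
set -- printf '%s\n' ';'

# xargs takes the command words as they are and appends the file paths.
out_xargs=$(find "$tmp" -type f -print0 | xargs -0 -- "$@")

# find instead reads that ";" as the end of -exec and rejects what follows.
st_find=0
find "$tmp" -type f -exec "$@" {} + >/dev/null 2>&1 || st_find=$?

printf '%s\n' "$out_xargs"
echo "find exit status: $st_find"
rm -rf "$tmp"
```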