  • find . -exec cmd {} +
  • find . -print0 | xargs -0 cmd

Both are meant to be reliable ways to run a command on the files found by find.

Which is preferred? Which is more portable, reliable, efficient, versatile and why?



TL;DR

There's no clear winner. My recommendation would be to use:

find . -exec cmd {} +

Wherever it's enough, as it's more portable, uses fewer resources and has fewer issues, and one of:

xargs -r0 -other-options -a <(find ... -print0 | ...) cmd
find . -print0 | ... | xargs -0 -other-options cmd

When you need additional features of xargs, or need to post-process the output of find with other tools, and you know you're on a system that supports those non-standard options and that the limitations described below don't apply or can be ignored.

History

find has had -exec cmd {} ';' (the variant which runs one invocation of cmd per file and also acts as a condition predicate) since its reimplementation with the current interface in Unix V5 in the mid-70s. The -exec cmd {} + form, which passes as many files as possible to each invocation of cmd, came much later. It was written by David Korn and first released in System V Release 4 in 1988 (see lynx news://news.gmane.io/gmane.comp.standards.posix.austin.general/2192), though it wasn't documented until SVR4.2 (1992).

It was only added to the POSIX standard in its 2001 edition, and some implementations of find only added it later, sometimes much later (GNU find in 4.2.12 in 2005, FreeBSD in 2002, NetBSD in 2006, busybox in 2015).

xargs itself is from PWB Unix, from the late 70s. It had (and still has) a very poor interface with weird and unnecessary features and limitations, including its own unique form of quoting (though, to be fair, close to that understood by the PWB Unix shell, which hasn't survived). Though it was meant to work on the output of find, it could not do so reliably: it splits its input on blanks and newlines and treats quotes and backslashes specially, while file paths may contain any of those characters.
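A minimal sketch of that unreliability, assuming a POSIX sh and a throwaway directory created with mktemp (the file name is made up for the demo). Plain xargs turns one file name into two arguments, while -exec passes it intact:

```shell
#!/bin/sh
# Scratch directory (made up for this demo) with one file whose name has a space.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'my file'

# Plain xargs splits its input on blanks and newlines (and treats quotes
# and backslashes specially), so "./my file" becomes two arguments.
split=$(find . -type f | xargs printf '[%s]\n')

# -exec passes each path to the command as one argument, untouched.
intact=$(find . -type f -exec printf '[%s]\n' {} +)

printf 'xargs: %s\n-exec: %s\n' "$split" "$intact"
cd / && rm -rf "$dir"
```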

A -0 option was added to GNU xargs, alongside a new -print0 option to GNU find, in 1990. It's pretty safe to assume the GNU find authors were not aware of SysV's -exec {} + when they added them. Similar -0/--null/-z/--zero options were gradually added to other GNU utilities after that, to handle this NUL-delimited interchange format, which can carry arbitrary file paths and, more generally, arbitrary C strings or command-line arguments (NUL being the one byte that can't occur in any of those).

Much later (in 4.2.26, released in late 2005), GNU xargs gained a -d option allowing any single-byte record delimiter, which makes -0 redundant as it's then the same as -d '\0'. To this day and to my knowledge, -d is still only supported by GNU xargs.
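For instance, a sketch assuming GNU xargs (-d is a GNU extension). With -d '\n', each line becomes one argument and quotes, backslashes and blanks inside it are taken literally. That is handy for line-based lists, though it still can't cope with file names that contain newlines:

```shell
#!/bin/sh
# GNU xargs only: -d '\n' makes each line one record, and quotes,
# backslashes and blanks within a line are taken literally.
out=$(printf "it's a b\nc\n" | xargs -d '\n' printf '[%s]\n')
printf '%s\n' "$out"
```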

Without -0 or -d (and -r, see below), xargs can hardly be used reliably.

-print0/-0 have since been added to a few other implementations, including some commercial SysV-derived Unixes such as Solaris 11. -print0 is also supported by the find builtin of the bosh shell.

They are not standard but might become so (along with the -d '' option to the read utility) in the next version of the POSIX standard.

-exec cmd {} + over xargs -0 cmd

-exec cmd {} + is standard and now fairly portable. Its support is still optional in busybox, so you may come across Linux-based embedded systems where it's not available. The -ok cmd {} + variant, which prompts the user before executing cmd, is neither standard nor portable (nor would it be convenient, as the command lines could end up being huge).

-print0/xargs -0 is not standard, but it's now commonly found on BSDs and in the find/xargs implementations typically used on Linux-based systems, including GNU's, busybox's and toybox's. It's still not supported on AIX or HP-UX.

Outside of GNU systems, it's also still rare to find implementations of the other standard utilities (sort, sed, cut, awk, etc.) that support NUL-delimited records.

find's -print0 can be emulated with standard features using -exec printf '%s\0' {} +, but there's no standard equivalent for xargs -0 or for the -z option of sort/sed/grep..., and more generally, NULs can't be processed by POSIX text utilities (nor can file paths in general, as they are not guaranteed to be text).
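A quick way to check that equivalence, as a sketch assuming a POSIX sh with mktemp and cmp (the tree and its file names are made up, including one with a newline in its name). Both commands should produce byte-for-byte identical NUL-delimited lists:

```shell
#!/bin/sh
# Scratch tree with awkward names (a space, a newline); lists kept outside it.
dir=$(mktemp -d) || exit 1
list1=$(mktemp) && list2=$(mktemp) || exit 1
cd "$dir" || exit 1
mkdir sub
touch 'a b' "$(printf 'new\nline')" sub/c

find . -print0 > "$list1"                     # non-standard -print0
find . -exec printf '%s\0' {} + > "$list2"    # standard emulation

# Compare the two NUL-delimited lists byte for byte.
if cmp -s "$list1" "$list2"; then same=yes; else same=no; fi
printf 'identical: %s\n' "$same"
cd / && rm -rf "$dir" "$list1" "$list2"
```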

Except on some BSDs, find . -print0 | xargs -0 cmd will still run cmd once without arguments if no file is found, which is generally not wanted. The GNU implementation of xargs added a -r option to avoid that, but it's not as portable as -0.
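A small sketch of the difference on empty input (what find produces when nothing matches). The expected results assume GNU xargs; on the BSDs mentioned above, the first command would not run either:

```shell
#!/bin/sh
# Empty input, as when find finds nothing.
# Without -r, GNU xargs still runs the command once, with no file argument.
without_r=$(printf '' | xargs -0 echo called)
# With -r, the command is not run at all.
with_r=$(printf '' | xargs -r0 echo called)
printf 'without -r: [%s]\nwith -r: [%s]\n' "$without_r" "$with_r"
```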

In find . -exec cmd {} +, cmd inherits find's stdin, so cmd is still able to interact with the user if find was started from a terminal, for instance.

By contrast, in find . -print0 | xargs -r0 cmd, depending on the xargs implementation, cmd's stdin will be either /dev/null (as with GNU xargs) or, worse, xargs' own stdin, which here is the pipe from find. In that case, if cmd ever reads from its stdin, it will consume part of the file list and wreak havoc. With the GNU implementation of xargs, that can be worked around by using -a with process substitution, which leaves stdin alone:

xargs -r0a <(find . -print0) cmd            # Korn syntax
xargs -r0a <{find . -print0} cmd            # rc syntax
xargs -r0a /dev/fd/3 3<(find . -print0) cmd # yash syntax
xargs -r0a (find . -print0|psub) cmd        # fish syntax (not parallel though)

But that's a lot less portable.
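To see the stdin difference, here is a sketch assuming a POSIX sh, where "find . -prune" is just a way to get a single path ("."). The command reads one line from its stdin:

```shell
#!/bin/sh
# With -exec, cmd inherits find's stdin (here, the output of echo).
via_exec=$(echo hello | find . -prune -exec sh -c 'read -r l; echo "got: $l"' sh {} +)

# With xargs, cmd's stdin is /dev/null (GNU) or the pipe from find
# (already drained by xargs), so "hello" never reaches cmd.
via_xargs=$(echo hello | {
  find . -prune -print0 | xargs -r0 sh -c 'read -r l; echo "got: $l"' sh
})
printf '%s\n%s\n' "$via_exec" "$via_xargs"
```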

On the reliability front, in find . -print0 | xargs -0 cmd, if find crashes or is killed early (for instance because it has reached a resource limit), that can have dramatic consequences. That's because find writes its output in blocks which are not guaranteed to end on a NUL delimiter (for instance, with find /var/tmp -name '*.tmp', a block might end with /var), and xargs will still make an argument to cmd out of a final, non-delimited record. So, in our example, xargs could call cmd (say, rm -rf) with /var as an argument if find was killed right after outputting a block ending in /var.

That problem doesn't affect -exec cmd {} +.
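The truncated-block scenario can be simulated safely by feeding xargs -0 a list whose last record has no terminating NUL. This is a sketch: printf stands in for a destructive command such as rm -rf, and the paths are only the example ones from above:

```shell
#!/bin/sh
# Simulate find killed mid-output: the last record, "/var", lacks its
# terminating NUL. printf stands in for a destructive cmd such as rm -rf.
args=$(printf '/var/tmp/a.tmp\0/var' | xargs -0 printf '[%s]\n')
printf '%s\n' "$args"
```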

With find . -exec cmd {} +, the exit status reflects both find and cmd failures. With find | xargs, in most shells you only get the exit status of xargs, so you could miss the fact that not all files could be found. Many shells have a pipefail option (set -o pipefail) to alleviate that, though; see also zsh's $pipestatus or bash's $PIPESTATUS arrays, which give more flexibility.
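A sketch of the exit status difference, assuming a POSIX sh and a made-up path that doesn't exist. The pipefail part only runs if the shell supports that option:

```shell
#!/bin/sh
missing=/nonexistent-dir-for-demo   # made-up path, assumed not to exist

# -exec: find's own failure is reflected in its exit status.
exec_status=0
find "$missing" -exec true {} + 2>/dev/null || exec_status=$?

# Pipeline: by default only xargs' status (0 here) is reported.
pipe_status=0
find "$missing" -print0 2>/dev/null | xargs -r0 true || pipe_status=$?

printf 'exec: %s, find|xargs: %s\n' "$exec_status" "$pipe_status"

# Where the shell supports it, pipefail also reports find's failure.
# Run in subshells so the option doesn't leak into the rest of the script.
if (set -o pipefail) 2>/dev/null; then
  pf_status=0
  (set -o pipefail; find "$missing" -print0 2>/dev/null | xargs -r0 true) ||
    pf_status=$?
  printf 'with pipefail: %s\n' "$pf_status"
fi
```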

The non-standard -execdir cmd -- {} + variant (note the --, needed with find implementations that don't add a ./ prefix to the file names) cannot be done with xargs. That variant runs cmd from within the directory containing the files, which works around some of the security issues (race conditions during path resolution) of -exec cmd {} + and xargs.

On the performance front, find | xargs implies more work (at least one additional process, and shoving the data through a pipe). Because some of that work is done in parallel (find and xargs running concurrently), it can also add contention as find and cmd compete for I/O access, so it will likely use significantly more resources overall. Thanks to that parallelism, however, it may in some situations carry out the task more quickly, as find can carry on searching for more files while cmd is busy doing some CPU-intensive work on the previous batch (at least until the pipe and find's internal output buffer are both full).

With find . -exec cmd {} +, it's easier for cmd to abort the whole search. For instance, with:

find . -exec sh -c 'if some-condition; then kill -s PIPE "$PPID"; exit 1; fi' sh {} +

With find . -print0 | xargs -0 cmd, cmd can do an exit 255 to abort xargs, but find won't exit until it next tries to write a block to the pipe (and is killed by SIGPIPE, as nothing reads the pipe any more).
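A sketch of xargs' exit 255 handling, assuming a POSIX sh. The condition "the argument is b" is made up for the demo, and -n 1 makes each argument its own invocation so the abort is visible. The exact non-zero status of xargs varies between implementations (GNU uses 124):

```shell
#!/bin/sh
# Each invocation gets one argument (-n 1); the made-up condition
# "argument is b" makes the command exit 255, which stops xargs.
status=0
out=$(printf '%s\0' a b c |
  xargs -0 -n 1 sh -c 'echo "$1"; [ "$1" = b ] && exit 255; exit 0' sh 2>/dev/null) ||
  status=$?
printf 'output:\n%s\nxargs status: %s\n' "$out" "$status"
```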

xargs -0 cmd over -exec cmd {} +

The main argument for -print0/xargs -0 is that, in the general sense, it is more generic, flexible and versatile.

find's -print0 output is a post-processable representation of a list of files that can be used by anything, not just xargs -0. For instance, you can do:

find . -print0 |
  grep -z foo |
  sort -z

And still get a post-processable, filtered and sorted list of file paths.
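A runnable version of that pipeline, as a sketch assuming GNU grep and sort (for -z) and a made-up scratch tree, with xargs -r0 at the end consuming the filtered, sorted list:

```shell
#!/bin/sh
# Scratch tree with made-up names; GNU grep/sort assumed for -z.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir sub
touch 'foo b' 'sub/foo a' bar

# Filter and sort the NUL-delimited list, then hand it to xargs -0.
result=$(find . -type f -print0 |
  grep -z foo |
  sort -z |
  xargs -r0 printf '[%s]\n')
printf '%s\n' "$result"
cd / && rm -rf "$dir"
```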

And similarly xargs -0 can be used on those NUL-delimited lists whether they come from the output of find or anything else, whether that represents file paths or anything else.

In that regard, find . -exec cmd {} + is only for a narrow special use case (even if it's one of the most common ones).

With xargs, you can use the -n or -s options to limit, respectively, the number of arguments or the total size of the command line passed to each invocation of cmd. With GNU xargs, see also the -P option to run several instances of cmd in parallel, or the xargs -0 -J {} mv {} /dest/ of some BSDs to allow extra arguments after the list of files.
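For instance, a sketch of -n with a made-up list of five items. The -P variant is only mentioned in a comment, as its output order isn't deterministic:

```shell
#!/bin/sh
# -n 2: at most two arguments per invocation of the command.
batches=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo)
printf '%s\n' "$batches"
# With GNU xargs, adding -P 4 would run up to four such invocations
# in parallel (their output order would then not be guaranteed).
```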

You can save the output of find . -print0 to a file and process it later (for instance, only if find was fully successful) with xargs -0 cmd < file, so that whatever cmd does can't interfere with the find result. With GNU xargs, that can also be done without managing the file yourself, with things like:

xargs -r0a =(find . -print0) cmd         # zsh
xargs -r0a (find . -print0|psub -f) cmd  # fish
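The portable form with an explicit file can be sketched as follows, assuming a POSIX sh with mktemp. The scratch tree is made up, and printf stands in for cmd. The list file is kept outside the tree being searched:

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
list=$(mktemp) || exit 1      # kept outside the tree being searched
cd "$dir" || exit 1
touch 'a b' c

# Run the command only if find fully succeeded; the list is complete
# before cmd starts, so cmd can't affect what find sees.
result=
if find . -type f -print0 > "$list"; then
  result=$(xargs -r0 printf '[%s]\n' < "$list")
fi
printf '%s\n' "$result"
cd / && rm -rf "$dir" "$list"
```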

There's no equivalent of xargs's exit 255 special handling with -exec cmd {} + (though see above about kill "$PPID").

With find | xargs, you can more easily run find and xargs cmd in different locales or, more generally, in different environments (environment variables, resource limits, umask, etc.).

For instance, one often needs to run find in the C locale to work around problems with non-text file names, but still wants cmd to run in the user's locale.

LC_ALL=C find . -exec cmd {} +

Runs both find and cmd in the C locale. And

LC_ALL=C find . -exec env -u LC_ALL cmd {} +

is, first, non-standard, and it may also not restore the original locale for cmd if LC_ALL was defined beforehand.

LC_ALL=C find . -print0 | xargs -r0 cmd

Changes the locale to C for find only.
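The same scoping can be sketched with any variable. Here DEMO_VAR is a hypothetical stand-in for LC_ALL, chosen so the demo doesn't depend on which locales are installed. With -exec, cmd sees the assignment made for find; with a pipe to xargs, it doesn't:

```shell
#!/bin/sh
# DEMO_VAR is a hypothetical variable standing in for LC_ALL.
unset DEMO_VAR
show='echo "${DEMO_VAR-unset}"'

# The assignment applies to find and is inherited by cmd via -exec.
via_exec=$(DEMO_VAR=C find . -prune -exec sh -c "$show" sh {} +)

# The assignment applies to find only; xargs and cmd don't see it.
via_xargs=$(DEMO_VAR=C find . -prune -print0 | xargs -r0 sh -c "$show" sh)

printf 'exec: %s\nxargs: %s\n' "$via_exec" "$via_xargs"
```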

As a special case:

find . -exec sudo cmd {} +

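Often fails with an "Argument list too long" error. find sizes each batch to fit the limit on the combined size of arguments and environment, but sudo then sets a SUDO_COMMAND environment variable containing the command line, which duplicates the list of arguments.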

find . -print0 | sudo xargs -r0 cmd

Doesn't have the problem, as $SUDO_COMMAND in that case only contains xargs -r0 cmd (don't use find . -print0 | xargs -r0 sudo cmd, which has the same problem).

See also:

sudo find . -print0 | xargs -r0 cmd

Where the list of files is found by root, but cmd runs as the original user. Or

find . -print0 | (USERNAME=some-user; xargs -r0 cmd)

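In shells like zsh, which have builtin support for changing (e)uid and (e)gids: there, assigning USERNAME with sufficient privileges changes the user and group IDs, here only in the subshell running xargs.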

find . -print0 | xargs -r0 -- "${cmd[@]}"

Works whatever the $cmd array contains, while

find . -exec "${cmd[@]}" {} +

Fails if any element of the $cmd array is ; or if it contains consecutive {} and + elements.
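A sketch of that failure in POSIX sh, which has no arrays, so "$@" stands in for "${cmd[@]}" here. The command whose last argument is ";" is made up for the demo:

```shell
#!/bin/sh
# "$@" stands in for the "${cmd[@]}" array; its last element is ";".
set -- printf '[%s]\n' ';'

# xargs passes the elements through untouched.
via_xargs=$(find . -prune -print0 | xargs -r0 -- "$@")

# find sees "-exec printf [%s]\n ;" followed by stray "{} +" arguments.
if via_exec=$(find . -prune -exec "$@" {} + 2>/dev/null); then
  exec_ok=yes
else
  exec_ok=no
fi
printf 'xargs: %s\n-exec works: %s\n' "$via_xargs" "$exec_ok"
```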

    The important stuff is a little way down after the history section. The history section mainly complains about incompatibility of older systems (it reads like don't use more than 8.3 filenames, because some people. Back in 1991 the first thing I did on a new system was install the GNU tools, so should not be a problem.) I nearly did not make it to the good bit. And, the good bit is good. It contains much that should be appreciated. And continues in this way all the way to the end. – ctrl-alt-delor Jan 07 '23 at 18:59