1

In a command like this:

find /data ! -type d -exec rm -f {} +

the + is for batch execution of rm -f. find should batch as many arguments as possible. But how does it know the limit?

Kusalananda
  • 333,661

2 Answers

3

The limit to find’s ability to batch arguments, when invoking a command specified by -exec with +, is typically determined by the kernel: it’s the maximum size of the arguments given to the exec family of functions. POSIX defines two ways to discover a related value: the maximum combined size of the arguments and environment given to an exec call.

The first one of these is a constant, which therefore ends up “baked in” to executables when they are built; it’s the ARG_MAX constant in limits.h:

Maximum length of argument to the exec functions including environment data.

The second one of these is available at runtime: it involves using the sysconf function, specifically with the _SC_ARG_MAX argument.
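
For example, a minimal C sketch (purely illustrative) can print both values on a given system: the compile-time constant, if limits.h defines one, and the figure sysconf reports at runtime:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
#ifdef ARG_MAX
    /* Compile-time value, baked into the binary when it was built. */
    printf("ARG_MAX (limits.h):   %ld\n", (long)ARG_MAX);
#else
    printf("ARG_MAX is not defined at compile time on this system\n");
#endif

    /* Runtime value; -1 with errno left at 0 means "no determinate limit". */
    errno = 0;
    long runtime = sysconf(_SC_ARG_MAX);
    if (runtime == -1 && errno == 0)
        printf("_SC_ARG_MAX: no determinate limit\n");
    else
        printf("sysconf(_SC_ARG_MAX): %ld\n", runtime);
    return 0;
}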

The limit set by ARG_MAX (which applies to both approaches described above, since both provide access to the “{ARG_MAX} variable”) is specified by POSIX, with regard to -exec:

The size of any set of two or more pathnames shall be limited such that execution of the utility does not cause the system's {ARG_MAX} limit to be exceeded.

The same is true of xargs:

The xargs utility shall limit the command line length such that when the command line is invoked, the combined argument and environment lists (see the exec family of functions in the System Interfaces volume of POSIX.1-2017) shall not exceed {ARG_MAX}-2048 bytes.

Various implementations apply these limits in various ways, sometimes applying smaller values than the above constants would indicate. For example, OpenBSD find checks sysconf to determine the maximum command-line length, but also arbitrarily limits the number of arguments to 5000; see the source code for details (thanks to mosvy for the reference). GNU find checks sysconf, falling back if necessary to ARG_MAX or a find-specified limit; in addition it reserves the 2048-byte headroom specified for xargs (GNU find and xargs share their implementation here).
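
As a rough illustration of how an implementation can turn those numbers into a usable budget (a sketch only, not the actual findutils or BSD code; the function name arg_budget is made up): take the runtime limit, subtract the space already occupied by the environment, then subtract the 2048-byte headroom POSIX reserves for xargs:

#include <limits.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Bytes available for the command name plus batched arguments (sketch). */
static long arg_budget(void)
{
    long limit = sysconf(_SC_ARG_MAX);
    if (limit == -1)
        limit = _POSIX_ARG_MAX;     /* POSIX minimum (4096), conservative fallback */

    long env_size = 0;
    for (char **e = environ; *e != NULL; e++)
        env_size += (long)strlen(*e) + 1 + (long)sizeof(char *); /* string + NUL + envp slot */

    return limit - env_size - 2048; /* keep the headroom specified for xargs */
}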

Specific kernels can also add their own twists. “What defines the maximum size for a command single argument?” discusses this for Linux. Solaris apparently requires different limits to be taken into account depending on whether the spawned process (not the find or xargs process, but the future child process) is 32- or 64-bit, because of varying stack requirements; see libfind for details (thanks to schily for the pointer). The Hurd doesn’t limit arguments at all.
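
To tie this back to the question: a simplified sketch of the batching itself (invented for illustration, not taken from any real find; add_to_batch, MAX_BATCH and the flush callback are made-up names) keeps a running total of string lengths plus per-pointer overhead, and runs one exec for the accumulated batch as soon as the next pathname would no longer fit:

#include <string.h>

#define MAX_BATCH 5000                 /* arbitrary cap on the argument count, as in OpenBSD find */

struct batch {
    char *argv[MAX_BATCH];
    int   count;
    long  bytes;                       /* strings + per-pointer overhead accumulated so far */
    long  budget;                      /* e.g. the arg_budget() value from the sketch above */
};

/* Add one pathname; if it no longer fits, exec the pending batch first. */
static void add_to_batch(struct batch *b, char *path,
                         void (*flush)(char **argv, int count))
{
    long cost = (long)strlen(path) + 1 + (long)sizeof(char *);

    if (b->count == MAX_BATCH || b->bytes + cost > b->budget) {
        flush(b->argv, b->count);      /* one exec for everything collected so far */
        b->count = 0;
        b->bytes = 0;
    }
    b->argv[b->count++] = path;
    b->bytes += cost;
}

Real implementations also handle the corner cases this sketch ignores, such as a single pathname that on its own exceeds the budget.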

Stephen Kitt
  • 434,908
  • Does xargs use the same? –  Nov 11 '18 at 22:18
  • Yes, xargs uses the same information. – Stephen Kitt Nov 11 '18 at 22:19
  • 2
    But then, what does find do with that value? How does it know how many arguments to pass? What is _SC_ARG_MAX a maximum of? Is it the same on every system? Does it maximise that arg length or does it leave extra room? Do all implementations use sysconf(), do they all consider the environment in their calculation? – Stéphane Chazelas Nov 11 '18 at 22:26
  • Both xargs and find will also clamp the argument size to something reasonable. See here about gnu xargs; but other implementations do that clamping too. –  Nov 11 '18 at 22:55
  • On OpenBSD for instance, the number of arguments will be limited to 5000. –  Nov 11 '18 at 23:02
  • 1
    @mosvy It would have been better if your first sentence did not start in a way that makes people believe that there is only one xargs and find implementation. Without looking at every implementation, I cannot tell whether all implementations check for all possibilities. libfind e.g. checks whether the program being called is 32 or 64 bit and takes the different limits into account. Do other implementations do the same? – schily Nov 12 '18 at 09:29
  • @schily ok change 'will' to 'may' in my comment; but if eg. solaris' xargs and find aren't doing any clamping, it would be better to state it directly. fwiw afaik gnu xargs and find do not care if the program being called is 32 or 64 bit. –  Nov 12 '18 at 10:04
  • @mosvy The funny thing is that Solaris xargs only uses 255 arguments at max. So this will never hit the ARG_MAX limit. See what I am going to add to my answer soon. – schily Nov 12 '18 at 10:23
  • @mosvy ready with my new text. You see, things are much more complex than you might have thought before. – schily Nov 12 '18 at 10:52
  • @schily so, a) solaris's find and xargs do the clamping just like the gnu and bsd implementations, b) libfind's find is incompatible with other find implementations, because its -exec is not able to run executable scripts without shebangs. Please do not presume about what I may or may not have thought -- I'm not sure about that myself ;-) –  Nov 12 '18 at 11:25
  • Did you verify that gnu find has such a low limit? It is simple to change libfind to support those rare simple shell scripts (so far I have never seen a related problem), since it does not use execvp() but rather fexecve(). I would just need to leave room for a "sh" argument in my argument array. – schily Nov 12 '18 at 11:40
  • @mosvy the next libfind version (available to the public in a few days) will include support for simple shell scripts (without #!) and it will do this using the full ARG_MAX size. – schily Nov 12 '18 at 15:37
-1

I recently mentioned the general rules here:

Argument list too long error with makefile

A working implementation of these rules is in my own libfind: https://sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/src/lib/libfind/find.c#l2020

The main problem here is that libfind needs to know the current environment size and whether the program being called is a 32-bit or a 64-bit program, since there are different limits.
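
A rough sketch of that idea (not libfind's actual code; space_left and child_ptr_size are invented names): the per-pointer overhead of the argument and environment vectors depends on the word size of the child that will be exec'ed, so the room left under ARG_MAX differs for a 32-bit and a 64-bit target:

#include <string.h>
#include <unistd.h>

extern char **environ;

/* Bytes still available for argument strings and their argv[] slots,
 * given the pointer size of the program that will be exec'ed
 * (4 for a 32-bit child, 8 for a 64-bit child). */
static long space_left(long child_ptr_size)
{
    long limit = sysconf(_SC_ARG_MAX);
    long used = 0;

    for (char **e = environ; *e != NULL; e++)
        used += (long)strlen(*e) + 1 + child_ptr_size; /* string + NUL + envp slot */
    used += child_ptr_size;                            /* terminating NULL of envp[] */

    return limit - used;
}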

libfind makes this 32/64-bit distinction because I frequently hit the limit when calling find -name '*.c' -exec count -t {} + to get the source line count for larger projects, with libfind being used from a 64-bit shell while calling the 32-bit count program.

The Solaris find implementation does not need to make this distinction, since Solaris does not ship a 64-bit find; using the 32-bit limit therefore works in any case, even if it does not use the maximum possible argument list size.

BTW: for find it is unlikely that the unneeded additional limit on a single argument on Linux (128k) applies. For make this is a real problem, since the whole shell command line is passed as a single argument. On the other hand, make does not check in advance, as it does not include code to split long commands.

P.S.: I just discovered a funny limit on Solaris: both xargs and find from Solaris call their programs via execvp() from libc, and in case the program to call is a script without #!, the execvp() implementation calls the shell for the script and reorders the arguments using a fixed-size array. Since that array only has 255 entries, both xargs and find limit their arguments to 255 in case the command is such a simple shell script. If the program is such a script and the arglist contains more than 255 arguments, execvp() would return E2BIG.

The problem here is that you cannot use malloc() inside execvp(), since execvp() may have been called from a process that was created via vfork(). If execvp() called malloc(), this would result in dead allocated memory in the parent. Calling alloca(), on the other hand, always succeeds but may lead to a SIGSEGV in case the local stack size is exceeded.
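
The mechanism described above looks roughly like this (a sketch of the pattern, not the Solaris libc source; SHELL_ARGS and exec_with_shell_fallback are invented names): when the direct exec fails with ENOEXEC, the arguments are re-packed into a fixed-size array with the shell prepended, and the size of that array becomes the effective limit on the argument count:

#include <errno.h>
#include <unistd.h>

#define SHELL_ARGS 255                 /* fixed-size array, as in the Solaris case described above */

static int exec_with_shell_fallback(const char *file, char *const argv[])
{
    execv(file, argv);
    if (errno != ENOEXEC)              /* real failure, nothing to fall back to */
        return -1;

    /* Script without #!: rebuild argv as { "sh", file, argv[1], ..., NULL }
     * in a fixed array -- malloc() is off-limits here because the caller may
     * be a vfork() child. */
    char *shargv[SHELL_ARGS + 2];
    int i = 0;
    shargv[i++] = "sh";
    shargv[i++] = (char *)file;
    for (int j = 1; argv[j] != NULL; j++) {
        if (i > SHELL_ARGS) {          /* would overflow the fixed array */
            errno = E2BIG;
            return -1;
        }
        shargv[i++] = argv[j];
    }
    shargv[i] = NULL;
    execv("/bin/sh", shargv);
    return -1;
}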

terdon
  • 242,166
schily
  • 19,173