In a command like this:
find /data ! -type d -exec rm -f {} +
the + is for batch execution of rm -f. find should batch as many arguments as possible. But how does it know the limit?
The limit to find’s ability to batch arguments, when invoking a command specified by -exec with +, is typically determined by the kernel: it’s the maximum size of the arguments given to the exec family of functions. POSIX defines two ways to discover a value related to this, the maximum size of arguments and environment given to an exec call.
The first one of these is a constant, which therefore ends up “baked in” to executables when they are built; it’s the ARG_MAX constant in limits.h:
Maximum length of argument to the exec functions including environment data.
The second one of these is available at runtime: it involves using the sysconf function, specifically with the _SC_ARG_MAX argument.
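As a rough illustration of the runtime route, the sketch below queries the same value through Python's os.sysconf wrapper instead of calling sysconf from C. The helper name runtime_arg_max is made up for this example, and it assumes a Unix-like system where the SC_ARG_MAX name is known to Python.

```python
import os


def runtime_arg_max():
    """Return the runtime {ARG_MAX} value in bytes, or None if unavailable.

    Hypothetical helper for illustration; it wraps os.sysconf, which in turn
    calls the C sysconf() function with _SC_ARG_MAX.
    """
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (non-Unix) or the name is not supported here.
        return None
    # sysconf reports -1 when the limit is indeterminate.
    return value if value > 0 else None


if __name__ == "__main__":
    print(runtime_arg_max())
```

Unlike the compile-time constant, this value reflects the system the program is actually running on, which is presumably why implementations tend to prefer it when it is available.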
The limit set by ARG_MAX (which applies to both approaches described above, since both provide access to the “{ARG_MAX} variable”) is specified by POSIX, with regard to -exec:
The size of any set of two or more pathnames shall be limited such that execution of the utility does not cause the system's {ARG_MAX} limit to be exceeded.
The same is true of xargs:
The xargs utility shall limit the command line length such that when the command line is invoked, the combined argument and environment lists (see the exec family of functions in the System Interfaces volume of POSIX.1-2017) shall not exceed {ARG_MAX}-2048 bytes.
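To make the arithmetic in that rule concrete, here is a minimal sketch of how the space left for arguments could be estimated: take {ARG_MAX}, subtract the 2048-byte headroom, and subtract the size of the environment. The function names are hypothetical, and the accounting is an assumption: it counts each NAME=value string plus its terminating NUL byte, whereas real implementations may also count the pointer arrays.

```python
import os

XARGS_HEADROOM = 2048  # the headroom POSIX specifies for xargs


def environment_size(env):
    """Bytes taken by the environment, counting 'NAME=value' plus a NUL each.

    Assumption: pointer arrays are ignored in this simplified accounting.
    """
    return sum(
        len(os.fsencode(name)) + 1 + len(os.fsencode(value)) + 1
        for name, value in env.items()
    )


def argument_budget(arg_max, env):
    """Bytes left for command-line arguments under the xargs rule."""
    return arg_max - XARGS_HEADROOM - environment_size(env)


if __name__ == "__main__":
    # "A=b" plus its NUL terminator is 4 bytes, so 4096 - 2048 - 4 remain.
    print(argument_budget(4096, {"A": "b"}))
```

Passing the environment in explicitly keeps the calculation easy to check by hand; a real caller would pass os.environ and the value obtained from sysconf.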
Various implementations apply these limits in various ways, sometimes applying smaller values than the above constants would indicate. For example, OpenBSD find checks sysconf to determine the maximum command-line length, but also arbitrarily limits the number of arguments to 5000; see the source code for details (thanks to mosvy for the reference). GNU find checks sysconf, and falls back if necessary to ARG_MAX, or a find-specified limit; in addition it adds the 2048-byte headroom specified for xargs (GNU find and xargs share their implementation here).
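The general shape of such batching can be sketched as follows: accumulate pathnames until either a byte budget or an argument-count cap would be exceeded, then start a new batch. This is not the code of any of the implementations mentioned above; batch_arguments and its parameters are hypothetical, the default cap of 5000 only echoes the OpenBSD figure, and each pathname is assumed to cost its encoded length plus one NUL byte.

```python
import os


def batch_arguments(paths, byte_budget, max_args=5000):
    """Yield lists of paths that each fit the byte budget and the count cap.

    Simplification: a single path larger than the budget is still emitted
    alone; a real exec call with such an argument would fail.
    """
    batch, used = [], 0
    for path in paths:
        cost = len(os.fsencode(path)) + 1  # string plus terminating NUL
        if batch and (used + cost > byte_budget or len(batch) >= max_args):
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += cost
    if batch:
        yield batch


if __name__ == "__main__":
    # Each two-character name costs 3 bytes, so two fit into a 6-byte budget.
    for group in batch_arguments(["aa", "bb", "cc"], byte_budget=6):
        print(group)
```

Each yielded list corresponds to one invocation of the command given to -exec ... +; a smaller budget or cap just means more invocations, not different results.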
Specific kernels can also add their own twists. “What defines the maximum size for a command single argument?” discusses this for Linux. Solaris apparently requires different limits to be taken into account depending on whether the spawned process (not the find or xargs process, but the future child process) is 32- or 64-bit, because of varying stack requirements; see libfind for details (thanks to schily for the pointer). The Hurd doesn’t limit arguments at all.
I recently mentioned the general rules here:
Argument list too long error with makefile
A working implementation of this rule is in my own libfind: https://sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/src/lib/libfind/find.c#l2020
The main problem here is that libfind needs to know the current environment size and whether the program being called is a 32-bit or a 64-bit program, since the limits differ.
libfind makes this 32/64-bit distinction because I previously hit the limit frequently when calling find -name '*.c' -exec count -t {} + to get the source line count for larger projects, when libfind was used from a 64-bit shell while calling the 32-bit count program.
The Solaris find implementation does not need to make this distinction since Solaris does not ship a 64-bit find, and thus using the 32-bit limit would at least work in any case, even if it does not use the maximum possible argument list size.
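One way to tell a 32-bit program from a 64-bit one is to look at the EI_CLASS byte of its ELF header. The sketch below shows that idea only; it is an assumption for illustration and not necessarily how libfind performs the check. The function names and the limit_32 and limit_64 parameters are hypothetical, and the caller is expected to supply the actual limits.

```python
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1  # EI_CLASS value for 32-bit objects
ELFCLASS64 = 2  # EI_CLASS value for 64-bit objects


def elf_word_size(path):
    """Return 32 or 64 for an ELF file, or None for anything else."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None  # for example a script, or a non-ELF binary
    return {ELFCLASS32: 32, ELFCLASS64: 64}.get(header[4])


def limit_for(path, limit_32, limit_64):
    """Pick the argument-size limit matching the program to be executed.

    Falls back to the smaller limit when the word size cannot be determined.
    """
    size = elf_word_size(path)
    if size == 64:
        return limit_64
    if size == 32:
        return limit_32
    return min(limit_32, limit_64)
```

Falling back to the smaller of the two limits mirrors the point made above: a conservative limit always works, it just may not use the largest possible argument list.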
BTW: for find it is unlikely that the unneeded additional limit for a single argument on Linux (128k) applies. For make this is a real problem, since the whole shell command line is passed as a single argument. On the other hand, make does not check in advance, as make does not include code to split long commands.
P.S.:
I just discovered a funny limit on Solaris: both xargs and find from Solaris call their programs via execvp() from libc, and if the program to call is a script without #!, the execvp() implementation calls the shell for the script and reorders the arguments using a fixed-size array. Since that array only has 255 entries, both xargs and find limit their arguments to 255 in case the command is such a simple shell script. If the program is such a script and the argument list contains more than 255 arguments, execvp() would fail with E2BIG.
The problem here is: you cannot use malloc() inside execvp(), since execvp() may have been called from a process that has been created via vfork(). If execvp() called malloc(), this would result in dead allocated memory in the parent. Calling alloca(), on the other hand, always succeeds but may lead to a SIGSEGV if the local stack size is exceeded.
… xargs use the same? – Nov 11 '18 at 22:18
… xargs uses the same information. – Stephen Kitt Nov 11 '18 at 22:19
… find do with that value? How does it know how many arguments to pass? What is _SC_ARG_MAX a maximum of? Is it the same on every system? Does it maximise that arg length or does it leave extra room? Do all implementations use sysconf(), do they all consider the environment in their calculation? – Stéphane Chazelas Nov 11 '18 at 22:26
… xargs and find will also clamp the argument size to something reasonable. See here about gnu xargs; but other implementation do that clamping too. – Nov 11 '18 at 22:55
… args and find implementation. Without looking at every implementation, I cannot tell whether all implementations check for all possibilities. libfind e.g. checks whether the program being called is 32 or 64 bit and takes the different limits into account. Do other implementations the same? – schily Nov 12 '18 at 09:29
… xargs only uses 255 arguments at max. So this will never hit the ARG_MAX limit. See what I am going to add to my answer soon. – schily Nov 12 '18 at 10:23
… -exec is not able to run executable scripts without shebangs. Please do not presume about what I may or may not thought -- I'm not sure about that myself ;-) – Nov 12 '18 at 11:25
… libfind to support those rare simple shell scripts (I so far did never see a related problem), since it does not use execvp() but rather fexecve(). I would just need to leave room for a "sh" argument in my argument array. – schily Nov 12 '18 at 11:40
… libfind version (available to the public in a few days) will include support for simple shell scripts (without #!) and it will do this using the full ARG_MAX size. – schily Nov 12 '18 at 15:37