Understand xargs with a minimal example
Before looking into why xargs is useful, let's first make sure that we understand what xargs
does with some minimal examples.
When you do either of:
printf '1 2 3 4' | xargs rm
printf '1\n2\n3\n4' | xargs rm
xargs parses the input string coming from stdin and separates arguments by whitespace, somewhat like Bash, though the details are a bit different. In particular, spaces and newlines are treated differently if you use xargs -L instead of -n: https://stackoverflow.com/questions/6527004/why-does-xargs-l-yield-the-right-format-while-xargs-n-doesnt/6527308#6527308
Because we are not using -L, however, both of the above calls are equivalent, and xargs would parse out four arguments: 1, 2, 3 and 4.
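As a harmless way to check this without deleting anything, here is a small sketch that swaps rm for printf, so that each parsed argument is printed in brackets on its own line. The printf stand-in and the variable names are only for illustration and are not part of the original example.

```shell
# Stand-in for rm: printf repeats its format once per argument,
# so every argument that xargs parsed shows up on its own line.
spaces=$(printf '1 2 3 4' | xargs printf '[%s]\n')
newlines=$(printf '1\n2\n3\n4' | xargs printf '[%s]\n')

echo "$spaces"

# Both inputs should have been parsed into the same four arguments.
if [ "$spaces" = "$newlines" ]; then
  echo "same arguments"
fi
```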
Then, xargs takes the arguments it parsed out and feeds them to the program we are calling. In our case, that is the executable /usr/bin/rm.
By default, xargs does not commit to a fixed number of arguments per call: unless we pass some flags, it could pass more than one at a time. So the above xargs calls could be equivalent to any of:
rm 1 2 3 4
or:
rm 1 2
rm 3 4
or:
rm 1
rm 2
rm 3
rm 4
and we generally don't know which of the above happened because, for rm, the end result would be the same: files 1, 2, 3 and 4 would be removed. So we don't care much which one xargs is doing, and we just let it do its thing.
It could make a difference for other programs, however, e.g. /usr/bin/echo, where a newline is added for every call.
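One way to see which grouping actually happened is to call a tiny shell snippet that reports how many arguments it received. The sh -c wrapper below is a helper introduced only for this sketch; the trailing sh fills $0 so that the parsed arguments start at $1. For an input this small, common xargs implementations would be expected to run the command just once.

```shell
# Each invocation prints its argument count followed by the arguments,
# so the number of output lines equals the number of invocations.
out=$(printf '1 2 3 4' | xargs sh -c 'echo "$# args: $*"' sh)
echo "$out"
```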
Control how many arguments are passed at a time
We can control how many arguments xargs passes at once with certain flags.
The simplest one is -n, which limits the maximum number of arguments to be passed at a time.
Then, we can observe what is going on by using /usr/bin/echo instead of /usr/bin/rm, because echo, unlike rm, treats echo 1 2 differently from echo 1; echo 2, as it adds a newline for each call.
With this in mind, if we run:
printf '1 2 3 4' | xargs -n2 echo
it supplies 2 arguments at a time to echo and is equivalent to:
echo 1 2
echo 3 4
which produces:
1 2
3 4
And if we instead run:
printf '1 2 3 4' | xargs -n1 echo
it supplies 1 argument at a time to echo and is equivalent to:
echo 1
echo 2
echo 3
echo 4
which produces:
1
2
3
4
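Since -n is a maximum rather than an exact count, the last call can receive fewer arguments when the input does not divide evenly. A minimal sketch using the same input with -n3 (the variable name is only for illustration):

```shell
# Four arguments in batches of at most three:
# the first echo gets "1 2 3", the second gets the leftover "4".
out=$(printf '1 2 3 4' | xargs -n3 echo)
echo "$out"
```

This would be expected to print 1 2 3 on the first line and 4 on the second.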
Another way is to use -L instead of -n. -L is like -n, but it counts input lines rather than arguments, so batches are only split at newlines, not spaces: https://stackoverflow.com/questions/6527004/why-does-xargs-l-yield-the-right-format-while-xargs-n-doesnt/6527308#6527308
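To make the difference concrete, here is a small sketch that feeds the same two-line input through -L1 and through -n1. The input and variable names are made up for illustration.

```shell
# -L1: one echo call per input line; the words on a line are still
# passed as separate arguments to that call.
by_line=$(printf '1 2\n3 4\n' | xargs -L1 echo)

# -n1: one echo call per argument, regardless of line boundaries.
by_arg=$(printf '1 2\n3 4\n' | xargs -n1 echo)

echo "with -L1:"
echo "$by_line"
echo "with -n1:"
echo "$by_arg"
```

With -L1 the output should keep the two input lines as they were, while -n1 should print each number on its own line.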
And another common way to control the number of arguments is -I, which implies -L1, e.g.:
printf '1\n2\n3\n4\n' | xargs -I% echo a % b
is equivalent to:
echo a 1 b
echo a 2 b
echo a 3 b
echo a 4 b
and so produces:
a 1 b
a 2 b
a 3 b
a 4 b
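One consequence worth seeing is that with -I each whole input line is substituted for the placeholder, so a line that contains spaces stays together as a single piece. A minimal sketch, where the angle brackets are only there to make the boundaries visible:

```shell
# Each input line replaces % as a whole, spaces included,
# so echo is called once per line with a single argument.
out=$(printf 'a b\nc d\n' | xargs -I% echo '<%>')
echo "$out"
```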
Alternative approaches and why xargs is superior
Now that we understand what xargs does, let's consider the alternatives and why xargs is better.
Suppose we have a file:
notes.txt
1
2
3
4
Instead of:
xargs rm < notes.txt
we might want to use:
rm $(cat notes.txt)
which expands to:
rm 1 2 3 4
However, this is problematic because there is a maximum size for the command line arguments of a Linux program, so it could fail if there were too many arguments in notes.txt.
xargs knows about this and automatically splits the arguments into batches to avoid passing too many at a time.
And there is no maximum size for streams like stdin, so this approach works for inputs of arbitrary size. The reason is that streams can be read little by little with the read() system call, while CLI arguments must be loaded all at once into virtual memory, so there is no need for a hard maximum on stream sizes.
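The batching can be observed with a rough sketch like the one below. It assumes that seq is available (it is not part of POSIX) and uses echo as a harmless stand-in for rm. The exact number of invocations depends on the xargs implementation and the system limits, so none is claimed here.

```shell
# System limit, in bytes, on the combined size of the arguments and
# environment passed to a new process.
getconf ARG_MAX

# Feed 100000 arguments through xargs. Every echo call prints one line,
# so the line count tells us how many batches xargs decided to make.
calls=$(seq 1 100000 | xargs echo | wc -l | tr -d ' ')
echo "echo was invoked $calls times"
```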
Another simple approach you could try would be:
while IFS="" read -r p || [ -n "$p" ]
do
rm "$p"
done < notes.txt
from: https://stackoverflow.com/questions/1521462/looping-through-the-content-of-a-file-in-bash but this requires a lot of typing, and could be slower because:
- it calls the /usr/bin/rm executable once for every argument, rather than fewer times with a bunch of arguments
- more time is spent on the bash while loop, as opposed to the C-coded xargs
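The first point can be illustrated by counting invocations instead of timing them. This sketch works on a temporary copy of notes.txt and replaces rm with a counter, so nothing is deleted except the temporary directory itself. The variable names and the sh -c helper are assumptions made for the example.

```shell
tmp=$(mktemp -d)
printf '1\n2\n3\n4\n' > "$tmp/notes.txt"

# Loop approach: the body runs once per line, so a real rm would be
# executed once per file.
loop_calls=0
while IFS="" read -r p || [ -n "$p" ]
do
  loop_calls=$((loop_calls + 1))
done < "$tmp/notes.txt"

# xargs approach: every invocation of the helper prints one line,
# so counting lines counts invocations.
xargs_calls=$(xargs sh -c 'echo "$#"' sh < "$tmp/notes.txt" | wc -l | tr -d ' ')

echo "loop: $loop_calls calls, xargs: $xargs_calls calls"
rm -r "$tmp"
```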
To make xargs even more interesting, the GNU version has a -P option for parallel operation!
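A rough sketch of what that looks like, assuming an xargs that supports -P (it is not part of POSIX). Each job sleeps for one second before printing its argument, so with four jobs running at once the whole pipeline should take about one second rather than four. The output order of parallel jobs is not guaranteed, so it is sorted here before being displayed.

```shell
start=$(date +%s)

# -n1 gives one argument per job, -P4 allows up to four jobs at once.
out=$(printf '1 2 3 4' | xargs -n1 -P4 sh -c 'sleep 1; echo "$1"' sh | sort | paste -sd' ' -)

end=$(date +%s)
echo "$out"
echo "elapsed: $((end - start))s"
```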
Related: https://superuser.com/questions/600253/why-is-xargs-necessary