When is xargs needed?

Question

The xargs command always confuses me. Is there a general rule for it?

Consider the two examples below:

$ \ls | grep Cases | less

prints the files that match 'Cases', but changing the command to touch will require xargs:

$ \ls | grep Cases | touch
touch: missing file operand
Try `touch --help' for more information.

$ \ls | grep Cases | xargs touch

score 152 · Accepted Answer · edited Aug 09 '12 at 19:33


The difference is in what data the target program is accepting.

If you just use a pipe, the target program receives the data on STDIN (the standard input stream) as a raw pile of data that it can read through one line at a time. However, some programs don't accept their input on standard input; they expect it to be spelled out in the arguments to the command. For example, touch takes a file name as a parameter on the command line, like so: touch file1.txt.

If you have a program that outputs filenames on standard output and you want to use them as arguments to touch, you have to use xargs, which reads the STDIN stream, splits it into words (on blanks and newlines by default) and passes those words as arguments to the command.

These two things are equivalent:

# touch file1.txt
# echo file1.txt | xargs touch
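To see the difference concretely, here is a minimal sketch to run in a scratch directory (the mktemp -d directory and the name file1.txt are only for illustration). touch ignores whatever arrives on stdin and fails for lack of an operand, while xargs turns the same text into an argument:

```shell
# Work in a scratch directory so no real files are affected.
dir=$(mktemp -d)
cd "$dir" || exit 1

# touch never reads stdin, so with no file operand it just fails.
if echo file1.txt | touch 2>/dev/null; then
  echo "unexpected: touch succeeded without arguments"
else
  echo "touch alone: missing file operand"
fi

# xargs reads the word from stdin and runs: touch file1.txt
echo file1.txt | xargs touch
ls
```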

Don't use xargs unless you know exactly what it's doing and why it's needed. It's quite often the case that there is a better way to do the job than using xargs to force the conversion. The conversion process is also fraught with potential pitfalls like escaping and word expansion etc.

edited by Philippe Blayo · answered Nov 19 '11 at 16:40 by Caleb

The warning feels a little strong to me. Of the two common options to get a stream onto a command line (xargs and $(...)), xargs is far safer than command substitution. And I cannot recall ever coming across a legitimate filename with a newline in it. Aren't the escaping and word expansion pitfalls issues with command substitution, not xargs? – camh Nov 19 '11 at 22:31

@camh: They're potential pitfalls with both. In the shell, you have to worry about filenames getting split on spaces, tabs, and newlines. In xargs, you only have to worry about newlines. In xargs, if your output is formatted properly, you can split words/filenames on the NUL character instead (xargs -0), which is useful in conjunction with find -print0. – Ken Bloom Nov 20 '11 at 01:22

Does xargs call the program via the shell with space-separated args, or does it actually construct the argument list internally (e.g. for use with execv/execvp)? – detly Nov 20 '11 at 09:55

It constructs it internally and uses execvp, so it's safe. Also, GNU xargs (as used on Linux and a few others) lets you specify newline as your delimiter with -d '\n', although BSD xargs (OS X et al.) does not appear to support this option. – fluffy Nov 21 '11 at 00:20
@camh That attitude of 'And I cannot recall ever coming across a legitimate filename with a newline in it' is incredibly dangerous, because there is such a thing as bogus filenames and they can be a serious risk. You could even inadvertently wipe out /etc/passwd, for example. – Pryftan Oct 31 '22 at 08:56
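A small sketch of what the -d '\n' option mentioned in the comments changes, assuming GNU xargs (-d is a GNU extension, so this won't run with BSD/macOS xargs). The quotes around '\n' matter: without them the shell turns \n into a plain n before xargs ever sees it.

```shell
# Default: blanks and newlines both separate arguments -> 3 calls to echo.
default_out=$(printf 'a b\nc\n' | xargs -n 1 echo)

# GNU -d '\n': only newlines separate arguments -> 2 calls, "a b" stays whole.
newline_out=$(printf 'a b\nc\n' | xargs -d '\n' -n 1 echo)

printf '%s\n' "$default_out"   # a, b and c on separate lines
printf '%s\n' "$newline_out"   # "a b", then c
```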

amphetamachine · score 72 · Answer 2 · 2017-04-03T20:32:33.337

To expand on the answers already provided, xargs can do one cool thing that is becoming increasingly important in today's multicore and distributed computing landscape: it can parallel process jobs.

For example:

$ find . -type f -name '*.wav' -print0 | xargs -0 -P 3 -n 1 flac -V8

will encode *.wav => *.flac, using three processes at once (-P 3).
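The effect of -P is easy to check without any audio files. In this sketch sleep 1 stands in for flac: six one-second jobs with at most three running at once should take about two seconds rather than six. -P is supported by GNU and BSD xargs, but it is not part of POSIX.

```shell
# Six jobs that each sleep one second; -P 3 allows three at a time,
# -n 1 gives each job a single argument ($1 inside the sh -c script).
start=$(date +%s)
printf '%s\n' 1 2 3 4 5 6 |
  xargs -P 3 -n 1 sh -c 'sleep 1; echo "job $1 done"' sh
end=$(date +%s)

# Completion order may vary between runs.
echo "elapsed: $((end - start))s"
```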

edited Apr 03 '17 at 20:32

answered Nov 19 '11 at 21:02


Wow. I should have known this a week ago when I was doing exactly the same thing (except using OGG) with 50GiB of WAVs. :) – Alois Mahdal May 13 '12 at 21:38
Why not use the -exec parameter that find has? – Evgeny Zislis Sep 11 '12 at 09:46

@Evgeny The -exec parameter won't parallel-process jobs. – amphetamachine Sep 18 '12 at 12:56

Good to note that the -0 argument to xargs makes it treat the NUL character as the input item delimiter, and find -print0 outputs NUL-delimited items. This is great practice for filenames that may contain spaces, quotes, or other special characters. – Dan Dascalescu Jan 06 '19 at 09:02
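A sketch of why this matters, using throwaway files from mktemp -d (assumed here to be a path with no blanks in it). The tiny sh -c 'echo "$#"' script just reports how many arguments it received:

```shell
dir=$(mktemp -d)
# Three files: a plain name, one with a space, one with a newline.
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/$(printf 'new\nline.txt')"

# Prints the number of arguments the script was given.
count_args='echo "$#"'

# Default xargs splits on blanks and newlines, so names get broken up.
default_count=$(find "$dir" -type f | xargs sh -c "$count_args" sh)

# NUL-delimited input keeps every name intact: one argument per file.
nul_count=$(find "$dir" -type f -print0 | xargs -0 sh -c "$count_args" sh)

echo "default: $default_count, with -print0/-0: $nul_count"
```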

Sverre Rabbelier · Answer 3 · 2011-11-19T21:00:07.500

xargs is particularly useful when you have a list of filepaths on stdin and want to do something with them. For example:

$ git ls-files "*.tex" | xargs -n 1 sed -i "s/color/colour/g"

Let's examine this step by step:

$ git ls-files "*.tex"
tex/ch1/intro.tex
tex/ch1/motivation.tex
....

In other words, our input is a list of paths that we want to do something to.

To find out what xargs does with these paths, a nice trick is to add echo before your command, like so:

$ git ls-files "*.tex" | xargs -n 1 echo sed -i "s/color/colour/g"
sed -i s/color/colour/g tex/ch1/intro.tex
sed -i s/color/colour/g tex/ch1/motivation.tex
....

The -n 1 argument makes xargs run a separate command for each input item (here, each path). The sed -i "s/color/colour/g" command replaces all occurrences of color with colour in the specified file.
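Besides the echo trick, xargs has a -t option (it is in POSIX) that writes each command line to stderr just before running it. In this sketch the file names are placeholders and the command is true, so nothing gets modified:

```shell
# Capture the trace (stderr) while the real command, true, does nothing.
trace=$(printf '%s\n' intro.tex motivation.tex | xargs -t -n 1 true 2>&1)

# One traced line per invocation, because of -n 1.
printf '%s\n' "$trace"
```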

Note that this only works if your paths contain no spaces (or quotes or backslashes). If they do, you should produce NUL-terminated paths (git ls-files -z) and pass the -0 flag to xargs. An example usage would be:

$ git ls-files -z "*.tex" | xargs -0 -n 1 sed -i "s/color/colour/g"

This does the same as what we described above, but also works if one of the paths has a space in it.

This works with any command that produces filenames as output, such as find or locate. If you happen to use it in a git repository with a lot of files, though, it might be more efficient to use git grep -l instead of git ls-files, like so:

$ git grep -l "color" "*.tex" | xargs -n 1 sed -i "s/color/colour/g"

The git grep -l "color" "*.tex" command will give a list of "*.tex" files containing the phrase "color".

True, but if you've learned this you should also learn Why is looping over find's output bad practice? — Wildcard, Nov 16 '16 at 05:32

score 9 · Answer 4 · answered Nov 20 '11 at 00:45

Your first example illustrates the difference quite well.

\ls | grep Cases | less lets you browse the list of file names produced by ls and grep. It doesn't matter that they happen to be file names, they're just some text.

\ls | grep Cases | xargs less lets you browse the files whose names are produced by the first part of the command. xargs takes a list of file names as input and a command on its command line, and runs the command with the file names on its command line.
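A non-interactive sketch of the same contrast, with cat standing in for less and a made-up file CasesA.txt in a scratch directory:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'contents of the file\n' > CasesA.txt

# Piping the names: the next command receives the text "CasesA.txt".
names=$(\ls | grep Cases)

# Via xargs: the next command receives CasesA.txt as an argument and opens it.
contents=$(\ls | grep Cases | xargs cat)

echo "names:    $names"
echo "contents: $contents"
```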

When considering using xargs, keep in mind that it expects input formatted in a strange way: whitespace-delimited, with \, ' and " used for quoting (in an unusual way, because \ isn't special inside quotes). Only use xargs if your file names don't contain whitespace or \'".
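A few inputs that show these quoting rules, and how -0 switches them off (the strings are arbitrary examples):

```shell
# Double quotes group words: two arguments, "a b" and "c".
quoted=$(printf '%s\n' '"a b" c' | xargs -n 1 echo)
printf '%s\n' "$quoted"

# A lone apostrophe is an unmatched quote, so xargs reports an error.
if printf '%s\n' "it's here" | xargs echo 2>/dev/null; then
  apostrophe=ok
else
  apostrophe=failed
fi
echo "apostrophe input: $apostrophe"

# With -0, quotes are not special, so the text passes through untouched.
plain=$(printf '%s\0' "it's here" | xargs -0 echo)
echo "$plain"
```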

@Gilles: xargs has the -0, --null option to get around the spaces issue (it's highly likely I learnt that from you :), so I assume that you are referring to a no-options xargs call, but I'm puzzled by your reference to the quotes. Do you have a link or an example regarding that? (ps. | xargs less is a handy "trick" +1.. thanks..) — Peter.O, Dec 01 '11 at 20:59

score 5 · Answer 5 · answered Nov 23 '11 at 15:36

In your example you don't need to use xargs at all since find will do exactly and safely what you want to do.

Exactly what you want using find is:

find . -maxdepth 1 -name '*Cases*' -exec touch {} +

In this example -maxdepth 1 means only search in the current directory and don't descend into any subdirectories; by default find looks in all subdirectories (which is often what you want) unless you constrain it with -maxdepth. The {} is replaced by the name of the file, and the + is one of two end-of-command markers, the other being ;. The difference between them is that ; runs the command once per file, whereas + runs it with as many files at once as will fit on a command line. Note, however, that your shell will interpret an unquoted ; itself, so you need to escape it with either \; or ';'. Yes, find has a number of little annoyances like this, but its power more than makes up for it.
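A quick way to see the difference between ; and +, using three empty files in a scratch directory and echo as the command: each echo call prints one line, so counting lines counts invocations.

```shell
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# ';' runs echo once per file: three lines of output.
semi=$(find "$dir" -type f -exec echo {} \; | wc -l | tr -d ' ')

# '+' passes all three names to a single echo: one line of output.
plus=$(find "$dir" -type f -exec echo {} + | wc -l | tr -d ' ')

echo "';' invocations: $semi, '+' invocations: $plus"
```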

Both find and xargs are tricky to learn at first. To help you learn xargs, try using the -p or --interactive option, which shows you the command it is about to execute and asks whether you want to run it.

Similarly, with find you can use -ok in place of -exec to be asked whether you want to run the command.

There are times, though, when find won't be able to do everything you want, and that is where xargs comes in. With -exec ... + only one {} is allowed, and it must come right before the +; with -exec ... \; GNU find replaces {} wherever it appears, but some other versions of find only replace a {} that stands alone as an argument. So instead of relying on find -type f -exec cp {} {}.bak \; you could do it like so: find -type f -print0 | xargs -0 -I X cp X X.bak
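Another workaround, which is not from this answer, is to let find start a tiny shell and pass the file name to it as $1; the shell can then use the name as many times as it likes. A sketch, with hypothetical file names in a scratch directory:

```shell
dir=$(mktemp -d)
touch "$dir/one.txt" "$dir/two words.txt"

# find passes each name as $1 to a tiny shell script, which can then
# use it as often as needed. -name '*.txt' keeps the new .bak files
# from being picked up again during the traversal.
find "$dir" -type f -name '*.txt' -exec sh -c 'cp "$1" "$1.bak"' sh {} \;

ls "$dir"
```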

You can learn more in the Run Commands section of the GNU Findutils manual.

Also, I said that find safely does what you want because, when you are dealing with files, you are going to encounter spaces and other characters that will cause problems with xargs unless you use the -0 or --null option along with something that generates input items terminated by a null character instead of whitespace.

Yes, exactly. Why is looping over find's output bad practice? — Wildcard, Nov 16 '16 at 05:33
@Wildcard filenames with spaces or chars such as ' or " can be problematic, whereas find will handle those cases without a problem. — aculich, Dec 04 '16 at 18:33
Yes, I know. See my answer to the linked question. I probably should have rephrased that question to a statement in the above comment, or added the phrase "See the question..." in front of it. :D — Wildcard, Dec 04 '16 at 21:23

Ciro Santilli OurBigBook.com · Answer 6 · 2022-12-28T07:42:14.187

Understand xargs with a minimal example

Before looking into why xargs is useful, let's first make sure that we understand what xargs does with some minimal examples.

When you do either of:

printf '1 2 3 4' | xargs rm
printf '1\n2\n3\n4' | xargs rm

xargs parses the input string coming from stdin, and separates arguments by whitespace, somewhat like Bash, though the details are a bit different. In particular, spaces and newlines are treated differently if you use xargs -L instead of -n: https://stackoverflow.com/questions/6527004/why-does-xargs-l-yield-the-right-format-while-xargs-n-doesnt/6527308#6527308

Because we are not using -L however, both of the above calls are equivalent, and xargs would parse out four arguments: 1, 2, 3 and 4.
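You can check the parsing without deleting anything by swapping rm for a one-line shell script that prints how many arguments it received ($#):

```shell
# sh -c 'echo "$#"' prints how many arguments xargs handed it.
spaces=$(printf '1 2 3 4' | xargs sh -c 'echo "$#"' sh)
newlines=$(printf '1\n2\n3\n4' | xargs sh -c 'echo "$#"' sh)
echo "spaces: $spaces, newlines: $newlines"   # both 4
```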

Then xargs takes the arguments it parsed out and passes them to the program we asked it to run. In our case, that is the executable /usr/bin/rm.

By default, xargs does not promise how many arguments it will pass per invocation; unless we pass flags that limit it, it could be more than one. So the above xargs calls could be equivalent to:

rm 1 2 3 4

or:

rm 1 2
rm 3 4

or:

rm 1
rm 2
rm 3
rm 4

and we generally don't know which of these happened. For rm it doesn't matter: the end result is the same (files 1, 2, 3 and 4 are removed), so we just let xargs do its thing.

It could make a difference for other programs, however, e.g. /usr/bin/echo, which prints a newline at the end of every call.

Control how many arguments are passed at a time

We can control how many arguments xargs passes to the command at once with certain flags.

The simplest one is -n, which limits the maximum number of arguments to be passed at a time.

Then we can observe what is going on by using /usr/bin/echo instead of /usr/bin/rm because, unlike rm, echo behaves differently for echo 1 2 than for echo 1; echo 2: it prints a newline for each call.

With this in mind, if we run:

printf '1 2 3 4' | xargs -n2 echo

it supplies 2 arguments at a time to echo and is equivalent to:

echo 1 2
echo 3 4

which produces:

1 2
3 4

And if we instead run:

printf '1 2 3 4' | xargs -n1 echo

it supplies 1 argument at a time to echo and is equivalent to:

echo 1
echo 2
echo 3
echo 4

which produces:

1
2
3
4

Another way is to use -L instead of -n. -L is like -n, but it limits the number of input lines (rather than arguments) per command; words on the same line are still split on blanks: https://stackoverflow.com/questions/6527004/why-does-xargs-l-yield-the-right-format-while-xargs-n-doesnt/6527308#6527308
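A sketch of the difference, with two input lines of two words each:

```shell
# -L 1: one command per input line -> echo 1 2; echo 3 4
by_line=$(printf '1 2\n3 4\n' | xargs -L 1 echo)

# -n 1: one command per argument -> echo 1; echo 2; echo 3; echo 4
by_arg=$(printf '1 2\n3 4\n' | xargs -n 1 echo)

printf '%s\n' "$by_line"
printf '%s\n' "$by_arg"
```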

And another common way to control the number of arguments is -I which implies -L1, e.g.:

printf '1\n2\n3\n4\n' | xargs -I% echo a % b

is equivalent to:

echo a 1 b
echo a 2 b
echo a 3 b
echo a 4 b

and so produces:

a 1 b
a 2 b
a 3 b
a 4 b
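Because -I works line by line, blanks inside a line stay part of the replacement. A small sketch with arbitrary input:

```shell
# Each whole line replaces %, blanks included.
out=$(printf 'a b\nc\n' | xargs -I % echo "[%]")
printf '%s\n' "$out"   # [a b] then [c]
```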

Alternative approaches and why xargs is superior

Now that we understand what xargs does, let's consider the alternatives and why xargs is better.

Suppose we have a file, notes.txt, that lists the names 1, 2, 3 and 4, one per line.

Instead of:

xargs rm < notes.txt

we might want to use:

rm $(cat notes.txt)

which expands to:

rm 1 2 3 4

However, this is problematic because there is a maximum size for the command-line arguments of a Linux program (ARG_MAX), so it could fail if notes.txt contained too many arguments.

xargs knows about this, and automatically splits arguments intelligently to avoid having too many at a time.

There is no maximum size for streams like stdin, so this approach works for arbitrarily large inputs. The reason is that a stream can be read little by little with the read() system call, whereas command-line arguments must be loaded all at once into the new program's memory, which is why they have a hard limit and streams don't.
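A sketch to see the limit and the splitting in action: getconf ARG_MAX prints the limit in bytes, and feeding a few megabytes of numbers from seq through xargs echo produces several output lines (one per echo call) while losing no numbers. The exact number of calls depends on your system.

```shell
# Maximum bytes of arguments plus environment for a new program.
getconf ARG_MAX

# About 3 MB of arguments: xargs splits them across several echo calls,
# so we get more than one output line but still every number.
lines=$(seq 1 500000 | xargs echo | wc -l | tr -d ' ')
words=$(seq 1 500000 | xargs echo | wc -w | tr -d ' ')
echo "echo was run $lines times, printing $words numbers"
```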

Another simple approach you could try would be:

while IFS="" read -r p || [ -n "$p" ]
do
  rm "$p"
done < notes.txt

from: https://stackoverflow.com/questions/1521462/looping-through-the-content-of-a-file-in-bash. However, this requires a lot of typing and could be slower because:

it calls the /usr/bin/rm executable once for every argument, rather than fewer times with a bunch of arguments
more time is spent on the bash while loop, as opposed to the C-coded xargs code
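The first point is easy to measure. In this sketch a one-line sh -c script stands in for rm and prints one line per invocation, so counting lines counts how many processes were started (200 inputs is an arbitrary choice):

```shell
# xargs packs all 200 short arguments into as few invocations as it can.
xargs_calls=$(seq 1 200 | xargs sh -c 'echo call' sh | wc -l | tr -d ' ')

# The read loop starts one process per input line.
loop_calls=$(seq 1 200 | while IFS="" read -r p; do
  sh -c 'echo call' sh "$p"
done | wc -l | tr -d ' ')

echo "xargs started sh $xargs_calls time(s); the loop started it $loop_calls times"
```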

To make xargs even more interesting, the GNU version has a -P option for parallel operation!

Related: https://superuser.com/questions/600253/why-is-xargs-necessary

score 1 · Answer 7 · answered Nov 19 '11 at 23:05


xargs (along with find, sort, du, uniq, perl and a few others) accepts a command-line switch to say "STDIN has a list of files, separated by a NUL (0x00) byte". This makes it easy to handle filenames with spaces and other funny characters in them. Filenames don't contain NULs.


waltinator


I think you mean "filenames can't contain nulls." – amphetamachine Nov 23 '11 at 05:50
