I'm confused about the difference or advantage (if any) of running a set of tasks in a .sh script using GNU parallel

E.g. Ole Tange's answer:

parallel ./pngout -s0 {} R{} ::: *.png

rather than, say, looping through them and putting each one in the background with &.

E.g. frostschutz's answer:

# copied from the link for illustration
for stuff in things
do
( something
  with
  stuff ) &
done
wait # for all the something with stuff

In short, are they merely syntactically different, or practically different too? And if practically different, when should I use each?

Jeff Schaller

1 Answer

Putting multiple jobs in the background is a good way of using the multiple cores of a single machine. parallel, however, also allows you to spread jobs across multiple servers on your network. From man parallel:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables.

Even when running on a single computer, parallel gives you far greater control over how your jobs are parallelized. Take this example from the manpage:

   To convert *.wav to *.mp3 using LAME running one process per CPU core
   run:

   parallel lame {} -o {.}.mp3 ::: *.wav

OK, you could do the same with

   for i in *.wav; do lame "$i" -o "${i%.wav}.mp3" & done

However, that is longer and more cumbersome and, more importantly, it launches as many jobs as there are .wav files, all at once. If you run this on a few thousand files, it is likely to bring a normal laptop to its knees. parallel, on the other hand, launches one job per CPU core by default and keeps everything nice and tidy.
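To make the contrast concrete, here is a minimal sketch of what throttling the plain loop by hand might look like. run_batched is a hypothetical helper, not something from the manpage, and getconf _NPROCESSORS_ONLN is assumed to be available for counting cores. It works in batches, so one slow job holds up its whole batch, whereas parallel starts a new job as soon as any slot frees up:

```shell
#!/bin/sh
# Hypothetical helper: run "<cmd> <arg>" for every argument,
# with at most <max_jobs> of them in the background at once.
# Usage: run_batched <max_jobs> <cmd> <arg>...
run_batched() {
    max_jobs=$1
    cmd=$2
    shift 2
    running=0
    for arg in "$@"; do
        "$cmd" "$arg" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait        # block until the whole batch has finished
            running=0
        fi
    done
    wait                # wait for the last, possibly partial, batch
}

# Number of online CPU cores; falls back to 1 if getconf cannot tell us
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
```

For the lame case this could be called as run_batched "$cores" to_mp3 *.wav, where to_mp3 is a hypothetical shell function that runs lame "$1" -o "${1%.wav}.mp3". Note that the bare wait also waits for any other background jobs the calling shell has started.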

Basically, parallel lets you fine-tune how your jobs are run and how much of the available resources they use. If you really want to see the power of this tool, go through its manual or, at the very least, the examples it offers.
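As a small taste of that fine-tuning, assuming GNU parallel is installed: -j caps the number of simultaneous jobs and -k keeps the output in input order. In this sketch echo stands in for a real command such as lame, and the file names are made up, so nothing is actually converted:

```shell
#!/bin/sh
# Sketch only: assumes GNU parallel is on PATH.
# -j 2 : run at most two jobs at a time, however many cores there are
# -k   : print results in input order, even if a later job finishes first
# {}   : the input item; {.} : the input item with its extension removed
ordered=$(parallel -j 2 -k echo {} {.}.mp3 ::: a.wav b.wav c.wav)
printf '%s\n' "$ordered"
# prints:
#   a.wav a.mp3
#   b.wav b.mp3
#   c.wav c.mp3
```

Replacing -j 2 with -j 50% would use half of the available cores instead of a fixed number.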

Simple backgrounding really has nowhere near the level of sophistication to be compared to parallel. As for how parallel differs from xargs, the GNU crowd give a nice breakdown here. Some of the more salient points are:

  • xargs deals badly with special characters (such as space, ' and ").
  • xargs can run a given number of jobs in parallel, but has no support for running number-of-cpu-cores jobs in parallel.
  • xargs has no support for grouping the output, therefore output may run together, e.g. the first half of a line is from one process and the last half of the line is from another process.
  • xargs has no support for keeping the order of the output, therefore if running jobs in parallel using xargs the output of the second job cannot be postponed till the first job is done.
  • xargs has no support for running jobs on remote computers.
  • xargs has no support for context replace, so you will have to create the arguments.
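For comparison, here is a sketch of roughly the closest xargs equivalent, assuming an xargs that supports -0 and -P (GNU and BSD xargs do, but neither option is in POSIX). It shows what the list above means in practice: the input has to be NUL-delimited to survive special characters, the core count has to be supplied by hand, and the output name has to be built in a small sh -c wrapper. The file names are made up and echo stands in for the real command:

```shell
#!/bin/sh
# Number of online CPU cores; xargs will not work this out for us
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# printf '%s\0' emits each name NUL-terminated, so spaces and quotes survive.
# sh -c builds the .mp3 name by hand, since xargs has no {.} equivalent.
# sort only makes the result deterministic: with -P, jobs finish in any order.
out=$(printf '%s\0' 'one.wav' "it's two.wav" |
    xargs -0 -n 1 -P "$cores" sh -c 'echo "$1 -> ${1%.wav}.mp3"' sh |
    sort)
printf '%s\n' "$out"
```

With real files, find . -name '*.wav' -print0 would be the usual source of the NUL-delimited list.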
Ole Tange
terdon
  • That's a good answer, thx. It sort of confirms what I guessed. I hate the parallel syntax, yet another new brand of keyboard-faceroll to memorise. But I guess the auto balancing across cores/jobs is worth it...? – Stephen Henderson Dec 12 '13 at 08:02
  • Have a look at sem which is part of the GNU Parallel package. That might suit your syntax requirements better. – Ole Tange Dec 12 '13 at 10:53
  • @OleTange thx, good call – Stephen Henderson Dec 12 '13 at 11:37
  • "xargs has no support for context replace, so you will have to create the arguments." What does this mean? Isn't it xargs -I %? – raine Feb 18 '16 at 11:00
  • It's true that parallel is more powerful than xargs, but that comparison is rather biased. For example, xargs supports null-terminated strings as input to avoid problems with spaces and quotes, and can also use -d to emulate parallel (even mentioned in the comparison!). xargs -I is sufficient context replacement for most simple cases, and I usually know the number of cores on the machine. I never experienced a problem with ungrouped output. – Sam Brightman Aug 26 '16 at 10:10
  • What is ::: ? – mrgloom Nov 19 '19 at 11:07
  • @mrgloom it's the syntax understood by parallel. See man parallel. Also see sem, as Ole Tange mentioned above, that's a simpler syntax. – terdon Nov 19 '19 at 11:16
  • sem, an alias for parallel --semaphore, is the best thing I've learned today! – jena Nov 10 '23 at 15:28