267

I have been trying to parallelize the following script, specifically each of the three for-loop iterations, using GNU Parallel, but haven't been able to. The four commands inside the for loop run in series, and each iteration takes around 10 minutes.

#!/bin/bash

kar='KAR5'
runList='run2 run3 run4'
mkdir normFunc
for run in $runList
do 
  fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear

  rm -f *.mat
done
Jeff Schaller

10 Answers

394

Sample task

task(){
   sleep 0.5; echo "$1";
}

Sequential runs

for thing in a b c d e f g; do 
   task "$thing"
done

Parallel runs

for thing in a b c d e f g; do 
  task "$thing" &
done

Parallel runs in N-process batches

N=4
(
# launch tasks in batches of N: whenever i wraps back around to 0,
# wait for the previous batch of background jobs to finish
# (the very first wait is a no-op, since nothing is running yet)
for thing in a b c d e f g; do 
   ((i=i%N)); ((i++==0)) && wait
   task "$thing" & 
done
)

It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes run at the same time. But it requires more code.

N processes with a FIFO-based semaphore:

# initialize a semaphore with a given number of tokens
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

# run the given command asynchronously and pop/push tokens
run_with_lock(){
    local x
    # this read waits until there is something to read
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
     ( "$@"; )
    # push the return code of the command to the semaphore
    printf '%.3d' $? >&3
    )&
}

N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task $thing
done 

Explanation:

We use file descriptor 3 as a semaphore by pushing (= printf) and popping (= read) tokens ('000'). By pushing the return code of the executed tasks, we can abort if something went wrong.
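
As comments below point out, the script can reach the end of the loop while the last tasks are still running; a plain wait after the loop (a small addition to the code above, not part of the original answer) blocks until every backgrounded task has exited:

N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task "$thing"
done
wait   # added: block until all backgrounded tasks have finished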

Seriously
Petr Skocik
    The line with wait in it basically lets all processes run, until it hits the nth process, then waits for all of the others to finish running, is that right? – naught101 Nov 26 '15 at 23:03
  • If i is zero, call wait. Increment i after the zero test. – Petr Skocik Nov 26 '15 at 23:08
  • 3
    @naught101 Yes. wait w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with -nt/-ot checks successfully for a while now) – Petr Skocik Mar 10 '18 at 20:02
  • what does "$1" mean here? – Ka Wa Yip Apr 08 '18 at 00:01
  • @kyle 1st argument – Petr Skocik Apr 10 '18 at 16:56
  • What happens in the semaphore version when the task function has a non-zero return code? Does it break the ability of it to launch a new child when that semaphore is read back out of the pipe and cause the entire script to exit? – BeowulfNode42 Dec 17 '18 at 05:51
  • 1
    @BeowulfNode42 You don't have to exit. The task's return status won't harm the consistency of the semaphore as long the status (or something with that bytelength) is written back to the fifo after the task's process exits/crashes. – Petr Skocik Dec 17 '18 at 10:27
  • 1
    FYI the mkfifo pipe-$$ command needs appropriate write access to the current directory. So I prefer to specify the full path such as /tmp/pipe-$$ as it most likely has write access available for the current user rather than relying on whatever the current directory is. Yes replace all 3 occurrences of pipe-$$. – BeowulfNode42 Jul 29 '19 at 03:13
  • 3
    Note, if you have set -e the fourth solution wouldn't work for you, you'd need to change it to ((++i==1)) – lol Aug 20 '20 at 12:14
  • It seems that (when using N=3) in both of the last two examples the prompt returns before the last task (for "g") is completed. In the next to last example, an extra wait right after the loop solves this, but for the last example the solution doesn't seem to be so trivial (so I don't know it :-) ). – n1ghtm4n4g3r Sep 07 '20 at 14:08
  • I needed variable expansion: (((++i % $NR_PROCESSES) == 0)); – foudfou Nov 13 '20 at 09:10
  • 3
    For "N processes with a FIFO-based semaphore" remember to add a "wait" after the for/done loop to prevent the script going further while the last task is being executed – Isaías Dec 13 '20 at 23:21
  • 5
    An even more elegant solution is suggested at https://unix.stackexchange.com/a/436713/192211 which is to use wait -n to get away with pure bash without batching. – EFraim Aug 06 '21 at 20:45
  • doesn't seem to work for my use-case https://stackoverflow.com/questions/70191996/how-do-i-run-loops-simultaneously-in-gitlab-ci – uberrebu Dec 08 '21 at 21:46
  • what may be the reasons if parallelization with & crash X11? – kaiya Feb 01 '22 at 14:03
  • Yes yes yes to all this, but I think the FIFO can be done simpler: https://unix.stackexchange.com/questions/618042/fifo-based-semaphore-explanation/692744#692744 – Pavel Komarov Mar 03 '22 at 00:11
  • this one really blew my mind: ((i=i%N)); ((i++==0)) && wait. Why is it waiting for background jobs? – Stéphane Mar 10 '22 at 15:25
  • @Stéphane Yeah, I wouldn't recommend the waiting version. Use the semaphore version instead. For best performance you want N processes running at a time where N is your number of CPU cores. The waiting version can result in some cores being temporarily underutilized, unlike the semaphore version, which is more finegrained. – Petr Skocik Mar 10 '22 at 22:56
  • Does "Parallel runs" need wait at the end? – pmor Nov 01 '22 at 14:11
  • The bash -e exiting problem hit me for ((i=i%N)); ((i++==0)) && wait I found this works instead ((i=(i+1)%N)) || wait. It seems ((...)) does a logical-not of the numeric value result (which means it returns bash "true" or "success" if the value is not zero).

    Also this works instead to keep N tasks running in parallel from the answer below; [[ $(jobs -r -p | wc -l) -lt $N ]] || wait -n

    – Donovan Baarda Nov 12 '22 at 07:25
165

Why don't you just fork (aka. background) them?

foo () {
    local run=$1
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

for run in $runList; do foo "$run" & done

In case that's not clear, the significant part is here:

for run in $runList; do foo "$run" & done
                                   ^

The ampersand causes the function to be executed in a forked shell in the background. That's parallel.
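
As one of the comments notes, you'll usually want a wait at the end so the script does not exit while the background jobs are still running; a minimal sketch:

for run in $runList; do foo "$run" & done
wait   # do not exit until every backgrounded foo has finished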

jordanm
goldilocks
  • 8
    That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!). – Ravnoor S Gill Dec 05 '13 at 21:24
  • 11
    In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler? – Ravnoor S Gill Dec 05 '13 at 21:27
  • 6
    It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number of worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them. – goldilocks Dec 05 '13 at 21:50
  • 47
    You also might want to add a wait command at the end so the master script does not exit until all of the background jobs do. – psusi Nov 19 '15 at 00:22
  • 2
    I would also find it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done with nice, but then I don't know if it'd ever finish.. – naught101 Nov 26 '15 at 23:00
  • 2
    Ahh, PSkocik's answer below has a really simple solution. – naught101 Nov 26 '15 at 23:07
  • 1
    What does local run=$1 mean here? – Ka Wa Yip Apr 08 '18 at 00:03
  • 1
    @kyle The first argument to the foo(), i.e., $run. https://www.gnu.org/software/bash/manual/html_node/Positional-Parameters.html#Positional-Parameters – goldilocks Apr 08 '18 at 12:03
103
for stuff in things
do
( something
  with
  stuff ) &
done
wait # for all the something with stuff

Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat looks a bit prone to conflicts if it runs in parallel...
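
As a comment below confirms, the fix is to make the cleanup per-run so parallel iterations don't delete each other's files. A sketch under that assumption (the glob matches only the .norm1/.norm2/.norm matrices created for this run):

for run in $runList
do
  ( # ... the four fsl5.0 commands for this run ...
    rm -f "$run".*.mat   # remove only this run's matrices
  ) &
done
wait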

frostschutz
  • 2
    This runs perfectly as well. You are right I would have to change rm *.mat to something like rm $run".mat" to get it to work without one process interfering with the other. Thank you. – Ravnoor S Gill Dec 05 '13 at 21:38
  • 19
    +1 for wait, which I forgot. – goldilocks Dec 06 '13 at 12:13
  • 10
    If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right? – David Doria Mar 20 '15 at 15:17
  • @DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough. – frostschutz Mar 20 '15 at 16:41
  • @goldilocks Yes, wait is like a "thread join". – Geremia May 21 '20 at 00:39
  • @Geremia I'd say a thread join is like a wait ;) – goldilocks May 21 '20 at 12:35
  • Can anyone explain to me why there need a wait after done ? – Tack_Tau Mar 18 '23 at 12:58
  • @Tack_Tau if you don't care when processes finish (if at all), you might not need to wait for them. but then it's easy to lose track of background processes. using wait makes it more obvious that there are still tasks running. – frostschutz Mar 18 '23 at 13:15
65

Parallel execution with at most N concurrent processes

Just a vanilla bash script - no external libs/apps needed.

#!/bin/bash

N=4

for i in {a..z}; do
  (
    # .. do your stuff here
    echo "starting task $i.."
    sleep $(( (RANDOM % 3) + 1))
  ) &

  # allow to execute up to $N jobs in parallel
  if [[ $(jobs -r -p | wc -l) -ge $N ]]; then
      # now there are $N jobs already running, so wait here for any job
      # to be finished so there is a place to start next one.
      wait -n
  fi

done

# no more jobs to be started but wait for pending jobs
# (all need to be finished)
wait

echo "all done"

Another example of processing a list of files in parallel:

#!/bin/bash

N=4

find ./my_pictures/ -name "*.jpg" | (
  while read filepath; do
    jpegoptim "${filepath}" &
    if [[ $(jobs -r -p | wc -l) -ge $N ]]; then wait -n; fi
  done
  wait
)

  • That worked like a charm for me, but I am still not sure how. Usually other answers will involve some sort of semaphore. How is jobs achieving the same result? Or is it slightly different? – Reuel Ribeiro Feb 28 '24 at 13:27
  • 1
    The solution is kind of similar - the semaphore in this case is checking the number of jobs jobs -r -p | wc -l together with wait -n which stops execution until one of the jobs is done. – Tomasz Hławiczka Feb 29 '24 at 14:27
45
for stuff in things
do
sem -j+0 "something; \
  with; \
  stuff"
done
sem --wait

This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).

sem --wait waits until all the iterations in the for loop have finished before executing the subsequent lines of code.

Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
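
Using the sample task from the top answer, a concrete run might look like this (a sketch, not part of the original answer):

for thing in a b c d e f g; do
  sem -j+0 "sleep 0.5; echo $thing"   # queue one job per CPU core
done
sem --wait                            # block until every queued job has finished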

lev
37

One really easy way that I often use:

cat "args" | xargs -P $NUM_PARALLEL command

This will run command with arguments taken from the "args" file, in parallel, running at most $NUM_PARALLEL invocations at the same time (add -n 1 or -L 1 if each invocation should receive exactly one argument or line).

You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
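
For example, a sketch that compresses every file listed in files.txt, at most four at a time (files.txt and gzip are placeholders here, not part of the original answer):

xargs -P 4 -I {} gzip {} < files.txt   # -I {} runs one gzip per input line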

  • 2
    This can work well, if you have e.g. a list of file names to process. But it really isn't a for loop if you are pedantic. Nevertheless, I find the solution elegant. – AdamKalisz Apr 17 '20 at 07:42
  • I used this in a while loop I had, for deleting many things via gcloud commands, and it was perfect :) – djsmiley2kStaysInside Dec 22 '21 at 16:52
10

It seems the fsl jobs depend on each other, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.

Make a bash function running a single run and run that function in parallel:

#!/bin/bash

myfunc() {
    run=$1
    kar='KAR5'
    mkdir normFunc
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

export -f myfunc
parallel myfunc ::: run2 run3 run4
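
If you need to cap the number of simultaneous runs (for example on a shared machine), GNU parallel's -j option does that; a sketch:

parallel -j 2 myfunc ::: run2 run3 run4   # at most 2 runs at a time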

To learn more, watch the intro videos (https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1) and spend an hour walking through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.

Ole Tange
9

I really like the answer from @lev as it provides control over the maximum number of processes in a very simple manner. However, as described in the manual, sem does not work with brackets, so the commands are passed as a quoted string instead.

for stuff in things
do
sem -j +0 "something; \
  with; \
  stuff"
done
sem --wait

Does the job.

-j +N Add N to the number of CPU cores. Run up to this many jobs in parallel. For compute intensive jobs -j +0 is useful as it will run number-of-cpu-cores jobs simultaneously.

-j -N Subtract N from the number of CPU cores. Run up to this many jobs in parallel. If the evaluated number is less than 1 then 1 will be used. See also --use-cpus-instead-of-cores.
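
For example, to leave two cores free for other users on a shared machine (heavy_job is a hypothetical placeholder for your own command):

for f in run2 run3 run4; do
  sem -j -2 "./heavy_job '$f'"   # run at most (cores - 2) jobs at a time
done
sem --wait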

moritzschaefer
3

I had trouble with @PSkocik's solution. My system does not have GNU Parallel available as a package, and sem threw an exception when I built and ran it manually. I then tried the FIFO semaphore example, which also threw some errors regarding communication.

@eyeApps suggested xargs but I didn't know how to make it work with my complex use case (examples would be welcome).

Here is my solution for parallel jobs, which runs up to N jobs at a time as configured by _jobs_set_max_parallel:

_lib_jobs.sh:

function _jobs_get_count_e {
   jobs -r | wc -l | tr -d " "
}

function _jobs_set_max_parallel {
   g_jobs_max_jobs=$1
}

function _jobs_get_max_parallel_e {
   [[ $g_jobs_max_jobs ]] && {
      echo $g_jobs_max_jobs

      return 0
   }

   echo 1
}

function _jobs_is_parallel_available_r() {
   (( $(_jobs_get_count_e) < $g_jobs_max_jobs )) &&
      return 0

   return 1
}

function _jobs_wait_parallel() {
   # Sleep between available jobs
   while true; do
      _jobs_is_parallel_available_r &&
         break

      sleep 0.1s
   done
}

function _jobs_wait() {
   wait
}

Example usage:

#!/bin/bash

source "_lib_jobs.sh"

_jobs_set_max_parallel 3

# Run 10 jobs in parallel with varying amounts of work
for a in {1..10}; do
   _jobs_wait_parallel

   # Sleep between 1-2 seconds to simulate busy work
   sleep_delay=$(echo "scale=1; $(shuf -i 10-20 -n 1)/10" | bc -l)

   ( ### ASYNC
   echo $a
   sleep ${sleep_delay}s
   ) &
done

# Wait (by polling) until no jobs remain
while true; do
   n_jobs=$(_jobs_get_count_e)

   [[ $n_jobs = 0 ]] &&
      break

   sleep 0.1s
done
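
The final polling loop waits for the running-job count to drop to zero; the library's own _jobs_wait helper (a plain wait) achieves roughly the same thing:

# roughly equivalent: block until all background jobs have exited
_jobs_wait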
Zhro
  • Really nice, works like a charm. – StefanKssmr Feb 15 '21 at 07:47
  • Tell us which system you have where 'parallel' wasn't an option. I'd also mention /usr/bin/ts (aka task spooler) as an option, but since ts the time-stamper (in moreutils) usually claims the /usr/bin spot, look for it as tsp or something similar. It builds a similar spooler with N parallel processes, and lets you dump more things at it. https://www.linux.com/news/queuing-tasks-batch-execution-task-spooler/ – user2066657 Dec 17 '21 at 22:49
  • @user2066657 My answer here is four years old and I can't remember the original use case. It would have either been on CentOS 6, 7 or Cygwin as that is what I would have been using at the time. – Zhro Dec 18 '21 at 01:54
  • Yeah, I saw the date. Some people remember. The good news is that moreutils is available for both centos flavours. It must've been cygwin. Anyway, keep the ts in the back of your mind for next time. – user2066657 Dec 18 '21 at 07:36
3

In my case, I can't use a semaphore (I'm in git-bash on Windows), so I came up with a generic way to split the tasks among N workers before they begin.

It works well if the tasks take roughly the same amount of time. The disadvantage is that, if one of the workers takes a long time to do its part of the job, the others that already finished won't help.

Splitting the job among N workers (1 per core)

# array of assets, assuming at least 1 item exists
listAssets=( {a..z} ) # example: a b c d .. z
# listAssets=( ~/"path with spaces/"*.txt ) # could be file paths

# replace with your task
task() { # $1 = idWorker, $2 = asset
  echo "Worker $1: Asset '$2' START!"
  # simulating a task that randomly takes 3-6 seconds
  sleep $(( ($RANDOM % 4) + 3 ))
  echo "    Worker $1: Asset '$2' OK!"
}

nVirtualCores=$(nproc --all)
nWorkers=$(( $nVirtualCores * 1 )) # I want 1 process per core

worker() { # $1 = idWorker
  echo "Worker $1 GO!"
  idAsset=0
  for asset in "${listAssets[@]}"; do
    # split assets among workers (using modulo); each worker will go through
    # the list and select the asset only if it belongs to that worker
    (( idAsset % nWorkers == $1 )) && task $1 "$asset"
    (( idAsset++ ))
  done
  echo "    Worker $1 ALL DONE!"
}

for (( idWorker=0; idWorker<nWorkers; idWorker++ )); do
  # start workers in parallel, use 1 process for each
  worker $idWorker &
done
wait # until all workers are done
geekley
  • 1
    One drawback of this solution is that it may happen that some workers finish while the others still have jobs. – SkateScout Nov 03 '21 at 23:31
  • 1
    It's not a 'may happen' thing: it's a certainty. The wait will always wait for the slowest process. And that's completely okay as long as we can wait for that slowest job to finish. – user2066657 Dec 17 '21 at 22:44