37

I have a bash script that looks like the following:

##script
#!/bin/bash
rm data*
rm logfile*
for i in {1..30}
do
## append a & if you want to run it parallel;
nohup Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
done

I would like to create another for loop after the first one to continue for another 30. For example

##script
#!/bin/bash
rm data*
rm logfile*
for i in {1..30}
do
## append a & if you want to run it parallel;
nohup Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
done

for i in {31..60}
do
## append a & if you want to run it parallel;
nohup Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
done

I would like the first set of jobs to finish before the new set starts, but because of the nohup they all seem to run simultaneously.

I use nohup because I log in to my server remotely, start the jobs there, and then close my shell. Is there an alternative solution?

masfenix

3 Answers

46

You'll want to use the wait command to do this for you. You can either capture all of the child process IDs and wait for them specifically, or, if they are the only background processes your script creates, you can just call wait without an argument. For example:

#!/bin/bash
# run two processes in the background and wait for them to finish

nohup sleep 3 &
nohup sleep 10 &

echo "This will wait until both are done"
date
wait
date
echo "Done"
ParanoidGeek
14

A few points:

  • If your goal with nohup is to prevent a remote shell exit from killing your worker processes, you should use nohup on the script itself, not on the individual worker processes it creates (see the sketch after these points).

  • As explained here, nohup only prevents processes from receiving SIGHUP and from interacting with the terminal, but it does not break the relationship between the shell and its child processes.

  • Because of the point above, with or without nohup, a simple wait between the two for loops will cause the second for to be executed only after all child processes started by the first for have exited.

  • With a simple wait:

    all currently active child processes are waited for, and the return status is zero.

  • If you need to run the second for only if there were no errors in the first, then you'll need to save each worker PID with $!, and pass them all to wait:

    pids=
    for ...
    do
        worker ... &
        pids+=" $!"
    done
    wait $pids || { echo "there were errors" >&2; exit 1; }
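
Putting these points together for the script in the question, a rough sketch might look like the following (the file name run_batches.sh is only for illustration):

#!/bin/bash
# run_batches.sh -- start it once, detached from the terminal:
#   nohup ./run_batches.sh > run_batches.log 2>&1 &

pids=
for i in {1..30}
do
    Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
    pids+=" $!"
done

# wait only for the PIDs recorded above, so unrelated jobs on the
# server are ignored; stop if wait reports a failure
wait $pids || { echo "there were errors in the first batch" >&2; exit 1; }

for i in {31..60}
do
    Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
done
wait

Since the script itself is started under nohup, the workers inherit the ignored SIGHUP, so they do not each need their own nohup.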
    
  • There could be other jobs running on the server, so I'd only want to wait for my batch. They are R scripts, so they run under R or cc1plus in the top command – masfenix Aug 22 '16 at 22:32
  • Also, I'd like to use nohup inside to run all the commands in "parallel". Basically, these are simulations for a scientific program. I want to run 180 simulations in total, but in batches of 60. The counter also needs to go from 1 to 180.

    If I do them one at a time, it will take too long.

    – masfenix Aug 23 '16 at 04:29
  • wait causes bash to wait for the background jobs it spawned itself, nothing else. There might be some confusion here: these for loops, did you save them to a file and invoke them as a script (what I assumed, because of the ##script line), or are you typing them by hand in the terminal? – Matei David Aug 23 '16 at 13:22
  • I was doing cat file.txt | while and pids was not set outside the loop, so the wait command saw an empty $pids string. Why this happens is discussed at https://serverfault.com/q/259339. It is easily fixed as while ... < files.txt, as answered at https://serverfault.com/a/259346 (sketched below) – simbo1905 Sep 10 '20 at 07:53
  • Just curious, what is the purpose of the + sign with the pids variable? – OmiPenguin Jan 12 '21 at 08:04
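
The pipeline gotcha simbo1905 mentions: a while loop fed through a pipe runs in a subshell, so variables set inside it are lost when the loop ends; redirecting the file into the loop keeps pids in the current shell. A small sketch, with jobs.txt and run_job as purely illustrative names:

# cat jobs.txt | while ... would run the loop in a subshell and lose $pids
pids=
while read -r arg
do
    run_job "$arg" &
    pids+=" $!"
done < jobs.txt
wait $pids
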
0

If you insert something like the following code segment in between your two for loops, it might help.

flag=0

while [ ${flag} -eq 0 ]
do
  ps -ef | grep "Rscript --vanilla" | grep -v grep > /dev/null
  flag=${?}
  sleep 10
done

Of course, if your Rscript jobs can fail and linger around, your second for loop may never get a chance to run. The code segment above assumes that all processes matching Rscript --vanilla will complete and disappear properly. Without knowing what your application does and how it runs, I have to rely on this assumption.
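
If pgrep is available on the server (an assumption, but it usually is), the same polling check can be written more compactly:

# keep sleeping while any command line containing "Rscript --vanilla" is still running
while pgrep -f "Rscript --vanilla" > /dev/null
do
  sleep 10
done

Like the ps/grep version, this matches every Rscript --vanilla process on the machine, not only the ones started by this script.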

EDIT

In light of the comments, this would better suit your needs (it includes your original code as well as the completion-checking logic).

for i in {1..30}
do
## append a & if you want to run it parallel;
nohup Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
pids[$i]=${!}
done

flag=0

while [ ${flag} -eq 0 ]
do
  # assume all jobs have finished, then clear the flag again
  # if any of the recorded PIDs is still alive
  flag=1
  for PID in "${pids[@]}"
  do
    ps -ef | grep ${PID} | grep -v grep >/dev/null; r=${?}
    if [ ${r} -eq 0 ]
    then
      flag=0
    fi
  done
  sleep 10
done

for i in {31..60}
do
## append a & if you want to run it parallel;
nohup Rscript --vanilla main.R 10 100 $i &> logfile"$i" &
done
MelBurslan
  • The process name in top shows either R or, sometimes, cc1plus. – masfenix Aug 22 '16 at 15:09
  • In that case you will need to find a common denominator showing up in the ps -ef listing. Or, after each nohup command, record the PID to a variable (preferably an array) via ${!} and check for this group of PIDs. When they all disappear, you can proceed to the second for loop (sketched below) – MelBurslan Aug 22 '16 at 15:12
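
To act on that last suggestion without grepping ps output, a sketch (assuming the PIDs were collected into the pids array as in the edited answer above) is to probe each recorded PID with kill -0:

# kill -0 sends no signal; it only reports whether the process still exists
for PID in "${pids[@]}"
do
  while kill -0 ${PID} 2>/dev/null
  do
    sleep 10
  done
done
# all recorded PIDs are gone, so the second batch can start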