16

I can ssh into a remote machine that has 64 cores. Let's say I need to run 640 shell scripts in parallel on this machine. How do I do this?

I can see splitting the 640 scripts into 64 groups of 10 scripts each. How would I then run each of these groups in parallel, i.e. one group on each of the available cores?

Would a script of the form

    ./script_A &
    ./script_B &
    ./script_C &
    ...

where script_A corresponds to the first group, script_B to the second group etc., suffice?

The scripts within one group, which share a core, are fine to run sequentially, but I want the groups themselves to run in parallel across all cores.
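
For concreteness, rather than writing script_A, script_B, ... by hand, I imagine generating the groups with a loop along these lines (an untested sketch; the chunking arithmetic is my own guess):

    #!/bin/bash
    # Untested sketch: launch 64 background subshells, each running
    # its group of 10 scripts sequentially.
    scripts=(script_*)                             # the 640 scripts
    for ((g = 0; g < 64; g++)); do
        (
            for s in "${scripts[@]:g*10:10}"; do   # the g-th group of 10
                ./"$s"
            done
        ) &                                        # one background job per group
    done
    wait                                           # block until all groups finish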

Tom
  • It is not guaranteed that they are distributed evenly across the cores. Have a look at this thread: http://stackoverflow.com/questions/13583146/whole-one-core-dedicated-to-single-process – Rui F Ribeiro Dec 09 '15 at 13:45

5 Answers

25

This looks like a job for GNU parallel:

    parallel bash -c ::: script_*

The advantage is that you don't have to group your scripts by core; parallel will do that for you.

Of course, if you don't want to babysit the SSH session while the scripts are running, you should use nohup or screen.
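
For example, to cap the run at one job per core and keep it going after logout (the -j option is standard GNU parallel; treat the exact invocation as a sketch):

    # Sketch: at most 64 jobs at once, detached from the terminal
    nohup parallel -j 64 ::: ./script_* > parallel.log 2>&1 &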

  • It is a good answer and I accept it, as in the general case this would work well. Unfortunately for me personally, I don't have administrator privileges on the remote machine and so can't install the parallel package. Thanks – Tom Dec 09 '15 at 18:16
  • 11
    You do not have to install parallel globally: you should be able to run a copy from your own home directory. – dhag Dec 09 '15 at 19:23
  • bash -c may be unneeded: parallel ::: ./script*. With 640 scripts it is likely they are very similar (e.g. only an argument differs). In that case, consider using GNU Parallel directly to set those arguments and use a single script. – Ole Tange Dec 10 '15 at 07:45
  • How would I install GNU parallel on a remote machine? – Tom Dec 30 '15 at 11:51
  • @Tom What is changed by the fact that you're using a remote machine? Just get the right package from http://www.gnu.org/software/parallel/ and install it, e.g. into your home directory as sketched below. – Dmitry Grigoryev Dec 30 '15 at 16:31
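
For reference, a user-local install needs no admin rights. A sketch using the usual autotools flow (the tarball URL follows the standard GNU mirror layout, so double-check it):

    # Sketch: install GNU parallel under $HOME, no root needed
    wget https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
    tar xjf parallel-latest.tar.bz2
    cd parallel-*/
    ./configure --prefix="$HOME/.local"
    make && make install
    export PATH="$HOME/.local/bin:$PATH"
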
5

That will work as long as you don't need to monitor the output and you're okay leaving your SSH session open for as long as the scripts take to run. If either of those isn't true, I would recommend using screen with multiple windows. You could do something like

    # Start a screen session, then run this loop from inside it;
    # each iteration opens a new window running one script.
    screen
    for script in script_A script_B script_C; do
        screen -t "$script" ./"$script"
    done
David King
  • Monitoring the outputs I am not concerned with; I wouldn't want to leave the ssh session open. What about using nohup? This would prevent the scripts from stopping when the session is ended, no? I will also have a look at your screen recommendation. Thanks! – Tom Dec 09 '15 at 13:55
  • nohup would probably work; I'm just more familiar with screen, and it has a lot more functionality, which may or may not be useful to you (a minimal nohup sketch follows these comments). – David King Dec 09 '15 at 13:57
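
Following up on the nohup question: a minimal sketch that detaches each group script from the session, assuming the same script names as in the loop above:

    # Sketch: survive logout without screen; one log file per script
    for s in script_A script_B script_C; do
        nohup ./"$s" > "$s.log" 2>&1 &
    done
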
2

To kick off and manage a large number of script jobs, you will need some sort of management software to control resource usage (CPU, memory, priority) and show job status (waiting, suspended, running, finished).

A grid engine is built for exactly that, for example Sun Grid Engine (http://wiki.gridengine.info/wiki/index.php/Main_Page) or Open Grid Scheduler (http://gridscheduler.sourceforge.net/). You do need the administrator to install the proper software before you can start, but the administrator might be happy to do that rather than see hundreds of processes running on the machine with no control over them.

In general, the admin defines how many slots a machine is divided into; you submit a job to a queue and specify how many slots the job should consume. The grid engine monitors overall system usage and runs jobs according to the queuing policy defined by the admin, e.g. no more than x jobs may run at the same time. The remaining jobs sit in the queue in a waiting state and are released as earlier jobs finish.
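
As an illustration, submitting each script as a one-slot job might look like this on Sun Grid Engine (-cwd and -b y are standard SGE flags, but queue names and policies depend on how your admin set things up):

    # Sketch: one SGE job per script
    # -cwd : run the job in the current working directory
    # -b y : treat the argument as an executable command, not a job script
    for s in script_*; do
        qsub -cwd -b y "./$s"
    done
    qstat    # inspect waiting/running/finished state of the jobs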

0

You may try dsh, the distributed shell. Download it from: http://sourceforge.net/projects/dsh/

0

I've done this on a number of occasions and usually just roll my own script to do the job with job control. Generically, if you have the names of all the scripts you want to run in a file, the solution looks like:

    #!/bin/bash
    # Run the scripts listed in scriptfiles.txt (one name per line),
    # at most MAX_PROCS at a time.
    declare -i NUM=0
    declare -i MAX_PROCS=30
    while read -r script; do
        NUM=$((NUM+1))
        ssh remote.host.ip "${script}" > "${script}.log" 2>&1 &
        if [ "$NUM" -ge "$MAX_PROCS" ]; then
            echo "Waiting for $NUM processes to finish."
            wait
            NUM=0
        fi
    done < scriptfiles.txt
    echo "Waiting for final $NUM processes to finish."
    wait
    exit

It's brute force, but effective. Plus, you don't need any extra software like parallel added to your systems.

A big problem is that the wait command waits for the slowest script in each batch to finish, which can waste time. I've created scripts to take care of this situation, but they get more complex, as you can imagine. If all of your scripts run in about the same amount of time, this works well.
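
One way to avoid that batching effect is to refill the pool as soon as any single job exits. This sketch is my variation, not the original script, and assumes bash 4.3 or later for wait -n:

    #!/bin/bash
    # Sketch (bash >= 4.3): keep MAX_PROCS jobs running and start a new
    # one as soon as any job exits, instead of waiting for a whole batch.
    declare -i RUNNING=0
    declare -i MAX_PROCS=30
    while read -r script; do
        ssh remote.host.ip "${script}" > "${script}.log" 2>&1 &
        RUNNING=$((RUNNING+1))
        if [ "$RUNNING" -ge "$MAX_PROCS" ]; then
            wait -n                  # block until any one background job finishes
            RUNNING=$((RUNNING-1))
        fi
    done < scriptfiles.txt
    wait                             # wait for the remaining jobs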

Another problem is that you may have to tune MAX_PROCS to find the best performance.

Of course, the number of ssh connections can get unwieldy, in which case just move this script to the remote host and change the ssh line to run the scripts directly.
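
Run locally on the remote host, the launch line would then reduce to something like:

    ./"${script}" > "${script}.log" 2>&1 &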

OldTimer