
I wrote a script that runs a command on multiple servers. Sometimes it hangs on one of the servers and I have to press Ctrl+C to end the process. If I don't, it stays stuck and keeps trying to connect.

If a server hangs or becomes unresponsive while the script is running, is there a way to skip that host so the script can move on to the next one and keep going? Currently, pressing Ctrl+C ends the entire script.

Here's an example of the script. Let's say it hangs on MACHINE3.

HOSTS=(MACHINE1 MACHINE2 MACHINE3 MACHINE4 MACHINE5)
for i in "${HOSTS[@]}"
do
  echo "$i"
  ssh -q "$i" uname -a
done

The script runs on OS X. I tried using the timeout command, but unfortunately it does not work there.

2 Answers


Rather than roll your own and have to cope with everything that can go wrong (host not responding, host stopping responding in the middle, user pressing Ctrl+C, error reporting, …), use one of the many existing tools to run a command on many machines over SSH.

mussh -t 4 -H <(printf '%s\n' "${HOSTS[@]}") -c 'uname -a'
pssh -t 4 -h <(printf '%s\n' "${HOSTS[@]}") uname -a
pdsh -u 4 -w "$(printf %s, "${HOSTS[@]}")" 'uname -a'
…
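
To see what the host arguments in the commands above expand to, here is a small sketch using the same HOSTS array as the question. The variable names host_lines, host_csv and host_csv_trimmed are made up for illustration. It only shows the two printf idioms and does not run any of the tools.

```shell
#!/usr/bin/env bash
HOSTS=(MACHINE1 MACHINE2 MACHINE3 MACHINE4 MACHINE5)

# One host per line: this is the text that the process substitution
# <(printf '%s\n' ...) presents to the tool as if it were a hosts file.
host_lines=$(printf '%s\n' "${HOSTS[@]}")

# Comma-separated list: printf repeats its format once per argument,
# so this form leaves a trailing comma after the last host.
host_csv=$(printf '%s,' "${HOSTS[@]}")

# Optional: drop the trailing comma with a suffix-removal expansion.
host_csv_trimmed=${host_csv%,}

printf '%s\n' "$host_lines"
printf '%s\n' "$host_csv"
printf '%s\n' "$host_csv_trimmed"
```

Whether a given tool tolerates the trailing comma is not something this sketch checks. The trimmed form is there in case it does not.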

A typical way to do this is to use the trap command to tell the shell script to ignore SIGINT (generated by Control-C), and then to re-enable SIGINT in a subshell just before your command is run.

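# Ignore SIGINT (Ctrl-C) in the script itself
trap "" INT
HOSTS=(MACHINE1 MACHINE2 MACHINE3 MACHINE4 MACHINE5)
for i in "${HOSTS[@]}"
do
    echo "$i"
    # Restore the default SIGINT action in a subshell, so Ctrl-C ends only this ssh
    (trap - INT; ssh -q "$i" "uname -a")
done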
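
As a variation on the same pattern, the loop can be wrapped in a function that also records which hosts were skipped. This is only a sketch: run_on_hosts and REMOTE_CMD are hypothetical names. The ConnectTimeout and BatchMode settings are standard ssh options added here as an assumption, and the 5-second value is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch of the trap pattern wrapped in a function. REMOTE_CMD and
# run_on_hosts are hypothetical names chosen for this example.

# Command prefix used to reach each host. ConnectTimeout and BatchMode are
# standard ssh_config options; the 5-second value is an arbitrary choice.
REMOTE_CMD=(ssh -q -o ConnectTimeout=5 -o BatchMode=yes)

run_on_hosts() {
    local host
    local -a skipped=()
    trap '' INT                  # the loop itself ignores Ctrl-C
    for host in "$@"; do
        echo "$host"
        # The subshell restores the default SIGINT action, so Ctrl-C (or a
        # failed connection) ends only this one command.
        if ! (trap - INT; "${REMOTE_CMD[@]}" "$host" uname -a); then
            skipped+=("$host")
        fi
    done
    trap - INT                   # restore default Ctrl-C handling
    if [ "${#skipped[@]}" -gt 0 ]; then
        echo "Skipped: ${skipped[*]}" >&2
    fi
}

# Usage (not executed here):
#   run_on_hosts MACHINE1 MACHINE2 MACHINE3 MACHINE4 MACHINE5
```

Note that ConnectTimeout only bounds the time spent establishing the connection. A session that hangs after connecting would still need Ctrl-C, which with this pattern skips just that host.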
Mark Plotnick