3

I run this parallel command:

parallel -q -j0 ssh {} 'echo {}; tmp/myscript' ::: host1 host2 host3 ....

The command above prints a result for most hosts.

Unfortunately, the parallel command hangs on some host. The script seems to be stuck in an endless loop there.

How can I detect on which host it hangs?

All hosts are reachable. I tested this with this command:

parallel -q -j0 ssh {} 'echo {}; date' ::: host1 host2 host3 ....
guettli
  • 1,389

3 Answers

3

I would use --timeout 1000%: if a job takes 10 times longer than the typical runtime (the median runtime of the successful jobs), it is killed.

Then I would use --joblog mylog to see which job timed out (its exit value in the log is -1).
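
The two options above can be combined with the command from the question. This is only a sketch: the function names run_with_timeout and timed_out_jobs are made up for illustration, and the awk filter assumes the tab-separated joblog layout with a header row, where Exitval is column 7 and Command is column 9.

```shell
# run_with_timeout LOGFILE HOST...
# Hypothetical wrapper: the question's command plus --timeout and --joblog.
run_with_timeout() {
    log=$1; shift
    parallel --timeout 1000% --joblog "$log" -q -j0 \
        ssh {} 'echo {}; tmp/myscript' ::: "$@"
}

# timed_out_jobs LOGFILE
# Prints the Command column of every job whose Exitval is -1.
# Assumed joblog columns (tab-separated, first row is a header):
# Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
timed_out_jobs() {
    awk -F'\t' 'NR > 1 && $7 == -1 { print $9 }' "$1"
}
```

Usage would look like `run_with_timeout mylog host1 host2 host3` followed by `timed_out_jobs mylog`. The printed commands contain the host name, so they show where the script was killed.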

You could also use --nonall (instead of -q ssh) together with --tag to see which jobs completed, and thus deduce which one is stuck.
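
To make the deduction mechanical, the tagged output can be compared against the host list. This is a rough sketch: stuck_hosts and out.txt are hypothetical names, and it assumes that --tag prefixes each output line with the host followed by a tab. A host whose script finished without printing anything would also be reported, so treat the result as a hint.

```shell
# stuck_hosts OUTPUT_FILE HOST...
# Prints every HOST that has no tagged line in OUTPUT_FILE.
stuck_hosts() {
    out=$1; shift
    for host in "$@"; do
        # A host counts as finished if some line starts with "<host><TAB>".
        if ! awk -F'\t' -v h="$host" \
            '$1 == h { found = 1 } END { exit !found }' "$out"; then
            printf '%s\n' "$host"
        fi
    done
}
```

One way to use it: start `parallel --nonall --tag -j0 -S host1,host2,host3 tmp/myscript > out.txt` in one terminal, and once it stops making progress run `stuck_hosts out.txt host1 host2 host3` in another.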

Ole Tange
  • 35,514
  • I have been using parallel for several months, but --timeout 1000% was new to me. Thank you! – guettli Mar 08 '18 at 14:32
  • I am unsure whether --nonall helps me. In my current use case I want to run the job only once on every given host. I don't want it to run twice on a host. But maybe I did not understand the man page. – guettli Mar 08 '18 at 14:43
  • --nonall runs a single command on a list of hosts, which are given by -S or --slf. – Ole Tange Mar 08 '18 at 15:25
2

You can tell with the ps command.

Run your script again, wait until it hangs, then run:

ps -elf | grep ssh

You should see the ssh process that is still running; the host in its command line is the one the script has "hung" on.

alpha
  • 1,994
0

I found a solution without parallel.

I run it sequentially:

for host in host1 host2 ...; do echo "$host"; ssh "$host" tmp/myscript; echo; done

This way I see where it hangs.

guettli
  • 1,389