
I want to ssh to multiple servers, check whether a particular process is running on each of them, and wait until that process has finished everywhere. I have written the code below, but it only checks the first IP in the file (ip.txt), I think because I added the 'continue' statement. How should I modify this code?

  while read IP
  do
    ssh ubuntu@$IP "pgrep -f pattern"
    if [ $? -eq 0 ]; then
       echo "Process is running" 
       sleep 10
       continue
    else
       echo "Process is not running"
    fi
  done < ip.txt
Rui F Ribeiro
Nani
  • You'll need to stop ssh from eating standard input, either with ssh -n or by closing its stdin. – thrig Feb 22 '18 at 17:56
  • Actually, I tried the -n option. The problem is not with ssh: the loop keeps checking whether the process is running on the first IP only. Here is the output:
    12684 13445 Process is running for XXXXXX 12684 13445 Process is running for XXXXXX 12684 13445 Process is running for XXXXXX 12684 13445
    – Nani Feb 22 '18 at 18:00
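To illustrate the stdin issue raised in the comments: inside while read IP ... done < ip.txt, ssh inherits the loop's stdin and can swallow the remaining lines of ip.txt, so the loop may end after the first host. Below is a minimal sketch of the question's loop with ssh -n (stdin taken from /dev/null). The function name report_hosts and its optional file argument are illustrative assumptions; it only reports status and does not wait for the process to finish.

```shell
#!/bin/bash
# Sketch: report, for every host listed in a file (default ip.txt), whether
# "pattern" is running there. Assumes one host per line and key-based login
# as "ubuntu", as in the question.
report_hosts() {
  local ip
  while IFS= read -r ip; do
    [ -z "$ip" ] && continue   # skip blank lines
    # -n: ssh reads stdin from /dev/null, so it cannot consume the rest of
    # the host list that "while read" is reading.
    if ssh -n "ubuntu@$ip" 'pgrep -f pattern' > /dev/null; then
      echo "Process is running on $ip"
    else
      echo "Process is not running on $ip"
    fi
  done < "${1:-ip.txt}"
}
```

Usage: report_hosts ip.txt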

3 Answers


It might be better to read your list of IPs into an array first, and then iterate over it:

# -t strips the trailing newline from each line read from ip.txt
mapfile -t ipaddresses < ip.txt
canary=alive
while [[ "alive" == "$canary" ]]; do
  canary=dead
  for ip in "${ipaddresses[@]}"; do
    if ssh "ubuntu@$ip" "pgrep -f pattern"; then
      echo "Process is running on $ip"
      # at least one host still runs it, so do another full pass
      canary=alive
      sleep 10
    else
      echo "Process not running on $ip"
    fi
  done
done

If you are still stuck on a version of bash below 4, replace the mapfile command with:

IFS=$'\n' read -r -d '' -a ipaddresses < ip.txt
DopeGhoti
  • I have three server IPs in ip.txt. It printed that the process is running on all three IPs and then exited: 12684 13445 23203 23334 Process is running on X.X.X.44 18048 18335 Process is running on X.X.X.55 29101 29972 Process is running on X.X.X.101 – Nani Feb 22 '18 at 19:04
  • I've fixed the error; it will now run until it does not see any servers with your process running. – DopeGhoti Feb 22 '18 at 19:17
  • Does it check all three servers in parallel, or does it sleep for 10 seconds for one server at a time? I want the first option. – Nani Feb 22 '18 at 19:39
  • It does what your original does - checks each host in series; for any which do not have the process running it just goes to the next host. If a host has the process running, it resets the canary, pauses for ten seconds, and aborts out of the loop. – DopeGhoti Feb 22 '18 at 20:59
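On the parallel question from the comments: the loop above checks hosts one at a time. Here is a sketch of one way to check all hosts concurrently, assuming bash, one host per line, and key-based login as ubuntu; the function names any_running and wait_for_all_hosts are hypothetical. Each ssh runs as a background job, and wait PID collects that job's exit status.

```shell
#!/bin/bash
# any_running FILE: start one ssh check per host in the background, then
# return 0 if pgrep succeeded on at least one host, 1 otherwise.
any_running() {
  local ip pid status=1
  local -a pids=()
  while IFS= read -r ip; do
    [ -n "$ip" ] || continue
    # -n keeps ssh off the host list on stdin; & runs the check concurrently
    ssh -n "ubuntu@$ip" 'pgrep -f pattern' > /dev/null &
    pids+=("$!")
  done < "${1:-ip.txt}"
  # "wait PID" returns that background job's exit status (0 = process found)
  for pid in "${pids[@]}"; do
    wait "$pid" && status=0
  done
  return "$status"
}

# wait_for_all_hosts FILE INTERVAL: repeat the parallel check until no host
# runs the process, sleeping INTERVAL seconds (default 10) between passes.
wait_for_all_hosts() {
  while any_running "${1:-ip.txt}"; do
    echo "process still running somewhere; checking again in ${2:-10}s"
    sleep "${2:-10}"
  done
  echo "process finished on all hosts"
}
```

Because every ssh is started before the first wait, one pass takes roughly as long as the slowest host, and the sleep happens once per pass rather than once per host.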

I see some issues with the basic program flow here. You have a single loop construct, but there are really two things being tracked:

  1. If any hosts in your list still have a running process on this check, we should record that and check again
  2. If the number of hosts that still have a process is 0, we should exit.

So I would use two loops for this script. Also, continue shouldn't be needed here, and that's probably what is causing it to repeat the first host each time.

I did a few things differently, mainly counting the PIDs returned by pgrep instead of checking the ssh exit code, and using some slightly different syntax. Here's the example I came up with, which seems to do what you want:

#!/bin/bash
set -eu

# get number of IPs from lines in file
NUM_IPS=$( wc -l < ip.txt )
echo "checking ${NUM_IPS} hosts..."

# Set number of running hosts to the max.  While arbitrary, it will 
# update to the correct number before reporting, and if it is 0 the
# while loop will exit immediately.
IPS_STILL_RUNNING=${NUM_IPS}

while [ "${IPS_STILL_RUNNING}" -gt "0" ]
do
    RUNNING_NOW=0
    for IP in $( cat ip.txt )
    do
        PROC_NUM=$( ssh ${IP} -- pgrep -f pattern | wc -l )
        if [ "${PROC_NUM}" -gt "0" ]; then
            echo "  ${IP}: still running"
            RUNNING_NOW=$(( RUNNING_NOW + 1 ))
        else
            echo "  ${IP}: not running"
        fi
    done
    echo "still running on $RUNNING_NOW hosts"
    IPS_STILL_RUNNING=${RUNNING_NOW}
    sleep 10
done

If you want to ssh to multiple hosts in parallel, use a program like pdsh (Parallel Distributed Shell).

For example, if your ip.txt contains IP addresses instead of hostnames, or a mix of hostnames & IP addresses:

hosts="$(awk '{l = l","$0}; END {sub(/^,/,"",l); print l}' ip.txt)"
# pgrep -a prints each PID with its full command line, so grep has something to match
while pdsh -l ubuntu -w "$hosts" 'pgrep -af pattern' 2>/dev/null | 
        grep pattern ; do
  sleep 10
done

This uses awk to build a comma-separated list of IP addresses to connect to with ssh.
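For reference, a small sketch of what that awk program produces, wrapped in hypothetical helper functions. paste -sd, (POSIX paste: -s joins all lines, -d, uses a comma as the separator) should give the same string for a file with one host per line.

```shell
#!/bin/bash
# hosts_from_file FILE: join the lines of FILE with commas for pdsh -w.
hosts_from_file() {
  # l accumulates ",host1,host2,..."; END strips the leading comma.
  awk '{l = l","$0}; END {sub(/^,/,"",l); print l}' "${1:-ip.txt}"
}

# Same result using paste instead of awk.
hosts_from_file_paste() {
  paste -sd, "${1:-ip.txt}"
}
```

Usage: hosts="$(hosts_from_file ip.txt)"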

If ip.txt contains only hostnames instead of IP addresses, it's a lot simpler:

# pgrep -a prints each PID with its full command line, so grep has something to match
while pdsh -l ubuntu -F ./ip.txt -a 'pgrep -af pattern' 2>/dev/null | 
        grep pattern ; do
  sleep 10
done

Both of these assume that ip.txt has one IP address or hostname per line.

The while loop runs until none of the hosts produce any output that matches the pattern. stderr from the pdsh command is redirected to /dev/null to avoid spamming the terminal with error messages when the pgrep command returns exit code 1 to pdsh.

The only output is from the grep pattern in the pipeline. Use grep -q pattern if you want this silenced too.
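The loop stops because a pipeline's exit status is that of its last command: grep exits 0 while it finds a matching line and 1 once it finds none. A generic sketch of that pattern, using a hypothetical helper wait_until_no_match that takes the poll interval, the regex, and the command to run (it uses grep -q, i.e. the silenced variant):

```shell
#!/bin/bash
# wait_until_no_match INTERVAL REGEX COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until its output no longer matches REGEX.
wait_until_no_match() {
  local interval=$1 regex=$2
  shift 2
  # The while condition is grep's status: 0 while some line matches, 1 when none do.
  # stderr of the check command is discarded, as in the pdsh examples.
  while "$@" 2>/dev/null | grep -q -- "$regex"; do
    sleep "$interval"
  done
}
```

Usage, assuming $hosts was built as above: wait_until_no_match 10 pattern pdsh -l ubuntu -w "$hosts" 'pgrep -af pattern'. Here pgrep -a (procps-ng) prints the full command line next to each PID, so the regex has text to match.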

cas