I'm trying to write a shell script that keeps testing my server and emails me when it goes down.

The problem is that when I log out of the SSH connection, despite running the script with & at the end of the command (./stest01.sh &), it automatically falls into the else branch and keeps mailing me continuously until I log in again and kill it.

#!/bin/bash
while true; do
    date > sdown.txt ;
    cp /dev/null pingop.txt ;
    ping -i 1 -c 1 -W 1 myserver.net > pingop.txt &
    sleep 1 ;
    if
        grep "64 bytes" pingop.txt ;
    then
        :
    else
        mutt -s "Server Down!" myemail@address.com < sdown.txt ;
        sleep 10 ;
    fi
done
Baraujo85

1 Answer

Once the SSH connection is gone, GNU grep has nowhere to write its output, so when it tries to write the matching line it fails and exits with a non-zero status.

This means that the if statement always takes the else branch.

To illustrate this (this is not exactly what's happening in your case, but it shows what happens if GNU grep is unable to write its output):

$ echo 'hello' | grep hello >&- 2>&-
$ echo $?
2

Here we grep for the string that echo produces, but we close both output streams for grep so that it can't write anywhere. As you can see, the exit status of GNU grep is 2 rather than 0.

This is particular to GNU grep; grep on BSD systems does not behave the same way:

$ echo 'hello' | grep hello >&- 2>&-    # using BSD grep here
$ echo $?
0

To remedy this, make sure that the script does not generate output. You can do this with exec >/dev/null 2>&1 near the top of the script. We should also use grep with its -q option, since we're not interested in seeing its output. (This generally also speeds grep up, as it can stop at the first match instead of reading the whole file, but here it makes very little difference since the file is so small.)
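As a small illustration of why -q helps, here is a sketch that decides purely on grep's exit status and never asks grep to print anything. The helper name has_reply and the sample file contents are made up for this example; they are not part of the original script.

```shell
#!/bin/sh
# Sketch only: has_reply is a hypothetical helper name, and the sample
# line below is invented to stand in for the contents of pingop.txt.

# Succeeds (exit status 0) if the file contains a ping reply line.
# With -q, grep writes nothing to stdout, so there is no output that
# could fail to be written.
has_reply() {
    grep -q "64 bytes" "$1"
}

sample=$(mktemp)
printf '%s\n' '64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.52 ms' >"$sample"

if has_reply "$sample"; then
    result=reply
else
    result=no-reply
fi

rm -f "$sample"
echo "$result"
```

The if statement only looks at the exit status, so the decision no longer depends on where standard output happens to be connected.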

In short:

#!/bin/sh

# redirect all output not redirected elsewhere to /dev/null by default:
exec >/dev/null 2>&1

while true; do
    date >sdown.txt

    ping -c 1 -W 1 myserver.net >pingop.txt

    if ! grep -q "64 bytes" pingop.txt; then
        mutt -s "Server Down!" myemail@address.com <sdown.txt
        break
    fi

    sleep 10
done

You may also use a test on ping directly, removing the need for one of the intermediate files (and also getting rid of the other intermediate file that really only ever contains a datestamp):

#!/bin/sh

exec >/dev/null 2>&1

while true; do
    if ! ping -q -c 1 -W 1 myserver.net; then
        date | mutt -s "Server Down!" myemail@address.com
        break
    fi

    sleep 10
done

In both variations of the script above, I chose to exit the loop upon failure to reach the host, just to minimise the number of emails sent. If you expect the server to eventually come up again, you could instead replace the break with a longer pause, e.g. sleep 600 (or sleep 10m if your sleep accepts unit suffixes, as GNU sleep does).
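If the script should keep running across outages, another way to limit the number of emails is to remember the previous state and mail only when it changes from up to down. The following is a minimal sketch of that idea, not something from the scripts above: the names check_host, notify_down, next_state and watch_forever are all hypothetical, and the ping and mutt invocations are simply copied from the earlier examples.

```shell
#!/bin/sh
# Sketch only: every function name here is a hypothetical helper.

# Succeeds if a single ping to the host ($1) gets a reply.
check_host() {
    ping -q -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Sends the notification mail (same command as in the scripts above).
notify_down() {
    date | mutt -s "Server Down!" myemail@address.com
}

# Prints the new state, "up" or "down", given the previous state ($1)
# and the host ($2). Mails only on an up -> down transition.
next_state() {
    prev=$1
    host=$2
    if check_host "$host"; then
        echo up
    else
        if [ "$prev" = up ]; then
            notify_down >/dev/null 2>&1
        fi
        echo down
    fi
}

# Checks the host ($1) every 10 seconds, forever. Defined but not called here.
watch_forever() {
    state=up
    while true; do
        state=$(next_state "$state" "$1")
        sleep 10
    done
}
```

Calling watch_forever myserver.net at the end of the script (after exec >/dev/null 2>&1) would start the loop. One mail is sent per outage, and a new mail is sent only after the host has been seen up again.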

I've also slightly tweaked the options used with ping as -i 1 does not make much sense with -c 1.

Shorter (unless you want it to continue sending emails when the host is unreachable):

#!/bin/sh

exec >/dev/null 2>&1

while ping -q -c 1 -W 1 myserver.net; do
    sleep 10
done

date | mutt -s "Server Down!" myemail@address.com

As a cron job running every minute (would continue sending emails every minute if the server continues to be down):

* * * * * ping -q -c 1 -W 1 myserver.net >/dev/null 2>&1 || ( date | mail -s "Server down" myemail@address.com )
Kusalananda
  • Using >&- will close the fd (as in, file descriptor 1 is closed), while closing the SSH connection will have a different effect (a file descriptor will be still around, but not connected to anything on the other side.) I think the point still stands, which is that GNU grep exits non-zero if it tries to write output and that fails. Yeah, best solution is just checking exit status of ping directly. – filbranden Aug 11 '19 at 14:22
  • It might be safer to just redirect everything to/from /dev/null for the entire script by adding exec </dev/null >/dev/null 2>&1 near the beginning. That way if e.g. ping decides to write something to stderr it won't cause a problem. – Gordon Davisson Aug 11 '19 at 19:35
  • @GordonDavisson I don't really see a reason to pull stdin from /dev/null here, but I sorted out the output. Thanks for the suggestion. – Kusalananda Aug 12 '19 at 15:25