grep command giving false positive results

Question

I am using this command (in a loop) to count the processes running with a certain name and then decide what to do based on that count. Sometimes it gives wrong answers for all the processes, and sometimes for only a few processes in the same loop. On some servers there is no problem at all. When I check manually, the reported result turns out to be wrong.

check_process() {
    process_count=$(ps -eaf | grep -v grep | grep "$1" | wc -l)
    if [ "${process_count}" -eq 1 ]; then
        PROCESS_EXISTS=0
        echo $1 " is running"
    else
        PROCESS_EXISTS=1
        echo $1 " is not running"
    fi
}

I should also mention that it worked fine for almost 2 years and only started giving trouble in the last 2-3 months.

Is it possible that there's more than 1 process with a given name running? Your script assumes that if the count isn't 1, the process must not be running. The answer could be, for example, 2. – Andy Dalton, Aug 16 '20 at 01:37
Possible, but every time it has failed, the result has simply been wrong. It says "service_name is running" when nothing is running, and it says "service_name is not running" when the service is actually running. Whenever I check, there is only ever a single process running. – gurmeet, Aug 16 '20 at 01:46
Related questions are https://unix.stackexchange.com/q/417043/5132 , https://unix.stackexchange.com/q/578777/5132 , https://unix.stackexchange.com/q/377296/5132 , https://unix.stackexchange.com/q/78771/5132 , https://unix.stackexchange.com/q/2062/5132 , https://unix.stackexchange.com/q/295363/5132 , and https://unix.stackexchange.com/q/37508/5132 . – JdeBP, Aug 16 '20 at 07:59

Stéphane Chazelas · Answer 1 · 2020-11-02T12:46:21.857

ps -f gives an output like:

chazelas   11042   10528  1 08:49 ?        00:00:03 /usr/lib/firefox/firefox -contentproc -childID 6 -isForBrowser -prefsLen 7847 -prefMapSize 699608 -parentBuildID 20200720193547 -appdir /usr/lib/firefox/browser 10528 true tab

Doing grep "$1" to return the lines for a given process name is wrong on many counts:

that fails if $1 starts with -. You need grep -e "$1" or grep -- "$1". You should really get into the habit of using the end-of-options delimiter when passing arbitrary data to commands.
grep takes a regexp pattern to match on (that's the re in grep). So grep a.py would match on aspy for instance. You can use grep -F to search for substrings instead.
ps -f does not report the process names but the process arguments (including argv[0] which by convention is generally a path to the command).
you're grepping for regexps/substrings without restricting where within the line they match. In the output above, with $1 set to as for instance, it would match the as in chazelas. It could also match arguments other than argv[0].
grep | wc -l is grep -c.
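Some of these pitfalls can be reproduced on a canned line of text, without touching the real process table. The following is only a sketch: the sample line is made up for illustration (it borrows the user name from the output above) and the variable names are arbitrary.

```shell
# One made-up line in the style of `ps -ef` output.
sample='chazelas   11042   10528  1 08:49 ?        00:00:03 /usr/bin/python3 aspy --verbose'

# Regexp match: the "." in a.py matches any character, so "aspy" counts as a hit.
regex_hits=$(printf '%s\n' "$sample" | grep -c -e 'a.py' || true)

# Fixed-string match: only a literal "a.py" would count, so there is no hit.
# (grep exits non-zero when nothing matches, hence the "|| true".)
fixed_hits=$(printf '%s\n' "$sample" | grep -Fc -e 'a.py' || true)

# Unanchored match: "as" is found inside the user name "chazelas".
user_hits=$(printf '%s\n' "$sample" | grep -Fc -e 'as' || true)

echo "regex=$regex_hits fixed=$fixed_hits user=$user_hits"
# prints: regex=1 fixed=0 user=1
```

Note that grep -c already prints the count, so no wc -l is needed.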

Here, there is a (non-standard but very common) command to match processes by name (or by other criteria, for that matter): pgrep

pgrep -xc -- "$1"

Would count the processes whose name (like that reported by ps without -f) matches exactly the regexp in $1. With -f, pgrep matches on the full list of arguments (as reported by ps -f) instead of the process name.

That leaves the problem of process names containing regexp operators (like the . mentioned above). Unfortunately, pgrep doesn't have a -F option to do a string comparison instead of regexp match, so you'd need to escape the regexp operators in there.
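One possible way to do that escaping is with sed. This is a rough sketch, not a vetted implementation: escape_ere and count_exact are hypothetical helper names, and the character set below is my assumption of what needs escaping in an extended regexp.

```shell
# Hypothetical helper: backslash-escape the characters that are special in
# extended regular expressions, so the result matches the input literally.
escape_ere() {
    printf '%s\n' "$1" | sed 's/[][\\.*^$(){}+?|]/\\&/g'
}

# Hypothetical helper: count processes whose name is exactly "$1",
# with "$1" treated as a literal string rather than a regexp.
count_exact() {
    pgrep -xc -- "$(escape_ere "$1")"
}

escape_ere 'a.py'
# prints: a\.py
```

With this, count_exact a.py would no longer count a process named aspy. The -c option is not available in every pgrep implementation, so check yours first.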

Another option is to tell ps to only report the process names and then use grep -xF to do exact Fixed-string comparison:

ps -Ao comm= | grep -Fxce "$1"

Replace comm with args to print the list of arguments (possibly truncated though, some ps implementations allow one or more -w to raise the line length limit) instead of process name.
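Put together, the function from the question could be reworked roughly as follows. This is a sketch under a few assumptions: count_by_name is a hypothetical helper name, and the test uses -ge 1 rather than the question's -eq 1 (as suggested in the comments), so that several instances still count as running. On Linux, keep in mind that the name reported by comm is limited to 15 characters.

```shell
# Count processes whose name (not argument list) is exactly "$1".
count_by_name() {
    # -A: all processes; -o comm=: print only the process name, without a header.
    # -F: fixed string, -x: whole-line match, -c: count, -e: the pattern follows.
    ps -Ao comm= | grep -Fxc -e "$1"
}

check_process() {
    if [ "$(count_by_name "$1")" -ge 1 ]; then
        PROCESS_EXISTS=0
        printf '%s is running\n' "$1"
    else
        PROCESS_EXISTS=1
        printf '%s is not running\n' "$1"
    fi
}
```

Calling check_process sshd, for instance, would then only count processes named exactly sshd, not lines that merely contain that string somewhere.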

In any case, any user can create any process with any name and with any list of arguments, so finding processes by name is quite brittle. Anybody can trick you into thinking your process is there by starting one with the same name and same list of arguments running a completely different command.

It's generally better to check that the service your process provides is actually available, or that the process holds some resource it is expected to be using.

In some cases, you can add other criteria to your search such as effective uid (-u in pgrep), or the path of the executable it is currently running.

On Linux and with zsh:

pids=(/proc/<->(Nnu[chazelas]e['[[ $REPLY/exe -ef /usr/bin/sleep ]]']:t))

Would store in $pids the ids of the processes that are running code in the /usr/bin/sleep file as the chazelas user for instance and you can use if (($#pids > 0)); then... to check that that list is not empty. (replace /usr/bin/sleep with =sleep or $commands[sleep] for searching the sleep command in $PATH).

More generally, it's much better to rely on your service manager to manage services and processes. Modern ones such as systemd will provide facilities to do that reliably.
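With systemd, for instance, the check could look something like the sketch below. It assumes the process is run as a systemd unit; myservice.service is a made-up unit name and service_is_active is a hypothetical helper.

```shell
# "$1" is a unit name, e.g. the hypothetical myservice.service.
service_is_active() {
    # is-active exits 0 only if the unit is active;
    # --quiet suppresses the state text it would otherwise print.
    systemctl is-active --quiet -- "$1"
}

if service_is_active myservice.service; then
    echo "myservice.service is active"
else
    echo "myservice.service is not active"
fi
```

Here the service manager tracks the unit's own processes, so an unrelated process that happens to have the same name does not affect the result.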

score 0 · Answer 2 · answered Aug 16 '20 at 01:40

1. There can be some weird timing effects in some cases.
2. Depending on what you are grepping for, there could be other matches. It can be worth checking without the wc. Due to the way you are checking, if there are two matches, you report it not running. (I.e. consider using -ge instead of -eq.)
3. A more effective test can be to use killall -0 commandname. Killing with signal 0 sends no actual signal, but still does all the error checking, including being an error if nothing was found. (One downside: killing a process that isn't your own is also an error, unless you are root. The --user option can help with this.)
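As a rough sketch of the killall approach (is_running is a hypothetical helper name, and this assumes a killall that accepts -0, as described above):

```shell
# Succeeds if at least one process named "$1" exists and may be signalled
# by the current user. Signal 0 is never delivered, so nothing is stopped.
is_running() {
    killall -0 "$1" 2>/dev/null
}

if is_running sshd; then
    echo "sshd is running"
else
    echo "sshd is not running (or not ours to signal)"
fi
```

The else branch is worded that way because a process owned by another user also makes killall fail unless you are root.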

David G.

I have always checked the process count (without "wc") before fixing the issue manually, and it has always been either 1 process running or none. BTW, for some services in the same list on the same server, I get no issues. Also, regarding your 3rd point, I cannot do that, as the process must be stopped properly using another set of commands. Thank you for your thoughts. – gurmeet Aug 16 '20 at 01:51
@gurmeet You missed the point of the third point. kill -0 DOES NOT STOP the process. In fact, it doesn't affect the process in any way. What it does is say "can I affect the process?" And with killall, there is also "Is there a process for me to affect?" – David G. Aug 16 '20 at 03:09
Interesting....I'll experiment with this and if it goes well, I'll update you. Thanks a lot :) – gurmeet Aug 16 '20 at 03:56
