
A script containing the following lines is called by Ansible (via ssh and then sudo). About 4 out of 5 times I end up in the if branch, although the process I'm looking for with ps and grep is permanently running. The count echoed in that case is "0".

PROC=[a]sdf # brackets, so we don't grep the grep process itself
# sleep 0.1
PID=(`ps -ef|grep -E "$PROC"|awk '{print $2}'`)
PIDLEN=`echo "${#PID[@]}"`

if [ "$PIDLEN" -ne 1 ]; then
    echo \"$PROC\" does not designate a single, unique process. Count: "$PIDLEN"
fi

Now the funny thing is, when I uncomment the "sleep 0.1" line, I never get into the if branch, which is what I would expect.
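
To narrow this down, a minimal debugging sketch (not part of the original script) could take a single ps snapshot, count the matches against it, and save the snapshot whenever the count is not 1. The helper name count_matching_lines and the /tmp/ps-snapshot.$$ path are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of a saved ps snapshot that match an
# ERE pattern. One matching line corresponds to one process.
count_matching_lines() {
    local pattern=$1 snapshot=$2
    # grep -c prints the count; "|| true" hides grep's exit status 1 on zero matches.
    printf '%s\n' "$snapshot" | grep -Ec -- "$pattern" || true
}

PROC='[a]sdf'                 # same pattern as in the script
SNAPSHOT=$(ps -ef)            # taken once, so it can be inspected later
COUNT=$(count_matching_lines "$PROC" "$SNAPSHOT")

if [ "$COUNT" -ne 1 ]; then
    # Assumed location for the dump; adjust as needed.
    printf '%s\n' "$SNAPSHOT" > "/tmp/ps-snapshot.$$"
    echo "\"$PROC\" does not designate a single, unique process. Count: $COUNT" >&2
fi
```

If the snapshot saved for a "Count: 0" run lacks the process line, the problem lies in what ps reports; if the line is present, it lies in the matching.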

This script is started in detached mode using nohup, which may be related. The playbook looks something like this:

- shell:
    cmd: nohup ./script.sh </dev/null >/tmp/out 2>/tmp/err &

Update

Irrelevant to my question, but someone in the comments seems to believe that pgrep is always preferable to a ps|grep combo. Just to prove that that's not necessarily the case (it is in fact the reason I do not use pgrep here), try this on Debian 10:

$ sleep 1000 $(seq 1 1200)|wc -c&
[1] 13821
$ ps -ef|grep [1]200|wc -l
1
$ pgrep -f 1200|wc -l
0
$ pgrep -f 1038|wc -l
1
$ pgrep -f 1039|wc -l
0
$ pgrep -af 1038|wc -c
4102 # 5-digit PID + SPACE + 4096 chars
$ ps -ef|grep [1]200|wc -c
4952

This shows that pgrep greps/prints only 4096 characters per process, whereas ps -ef does more than that. (Obviously ps|grepping for a number is not a safe thing to do in general, I'm just using this to prove the point.)
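
Either tool can also be checked against the kernel's own record. The following is a rough sketch, assuming a Linux procfs where /proc/PID/cmdline holds the NUL-separated arguments; the helper names cmdline_as_line and pids_matching_full_cmdline are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a NUL-separated cmdline file as one
# space-separated line.
cmdline_as_line() {
    tr '\0' ' ' < "$1"
    echo
}

# Hypothetical helper: list the PIDs whose command line, as stored in
# /proc, matches an ERE pattern.
pids_matching_full_cmdline() {
    local pattern=$1 f pid
    for f in /proc/[0-9]*/cmdline; do
        [ -r "$f" ] || continue
        # A process may exit between the glob and the read; ignore that error.
        if cmdline_as_line "$f" 2>/dev/null | grep -Eq -- "$pattern"; then
            pid=${f#/proc/}
            echo "${pid%/cmdline}"
        fi
    done
}
```

For example, pids_matching_full_cmdline '[1]200' lists candidate PIDs, and cmdline_as_line /proc/PID/cmdline | wc -c gives a length to compare with the ps and pgrep figures above.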

All the above is easily verifiable for anyone using this Dockerfile:

FROM debian:10
RUN apt-get clean
RUN apt-get update
RUN apt-get install -y --no-install-recommends procps

which can be built and run using these commands

docker build -t debian10-pgrep-vs-ps .
docker run --rm -it debian10-pgrep-vs-ps
Evgeniy Berezovsky
  • In ps -ef|grep -E "$PROC"|awk '{print $2}' the grep -E "$PROC" may find itself. NEVER use that ps -fe | grep | awk | sed [censored], use pgrep "$PROC" or pgrep -f "$PROC". –  Mar 03 '20 at 11:44
  • we can use PID module instead of running a shell script in cmd module – Siva Mar 03 '20 at 11:48
  • @msp9011 That's a good, practical idea to consider. I'd like to know though why the current way fails without sleep, and how to perhaps get rid of sleep... – Evgeniy Berezovsky Mar 03 '20 at 21:54
  • NEVER use ps -fe | grep | grep | awk. Do you think that ps -fe will print lines longer than 4096 characters? Think again. Your sleep .1 may bring a little commotion, causing the processes from the pipeline to be scheduled a bit differently, and the grep to find itself. It's absolutely pointless to reason about such things. –  Mar 03 '20 at 22:02
  • @mosvy I know for a fact that ps -ef prints more than 4096 characters (per process), whereas pgrep -f does not. I don't want to use ps, but it seems I have to. – Evgeniy Berezovsky Mar 03 '20 at 22:08
  • sleep 3600 $(seq 1 10000) &. ps -fe | grep 10000 => grep 10000. Where's sleep if ps -fe prints more than 4096 characters per process? Here it is grep 10000 /proc/[0-9]*/cmdline => /proc/6613/cmdline. cat /proc/6613/comm => sleep. –  Mar 03 '20 at 22:19
  • @mosvy Your side show is irrelevant to my problem, as I'm not grepping in a way that could result in me grepping the grep process. Just to give you an example, change your grep 10000 to grep -E [1]0000, and you won't grep yourself. Not even sure if the '-E' is necessary here. – Evgeniy Berezovsky Mar 03 '20 at 22:23
  • 1. I have explained to you exactly what happened. 2. Your "fact" is only your fancy, and you're not able to admit you were wrong. 3. BEHAVE, this is a public place. –  Mar 03 '20 at 22:28
  • @mosvy You're shameless, in removing the edit to my own question that proves you wrong. I've added it back, removing the part about your missionary zeal, I hope you don't censor that again, for the benefit of others. In your edit you claim it cannot be reproduced in Debian 10. Did you actually try? I have tried it in 2 different installations, one using the official AWS AMI, and one with a Debian I installed myself. Both show the issue. If you can't reproduce it, prove it. Just as I have provided proof. – Evgeniy Berezovsky Mar 03 '20 at 23:07