
I am having problems moving a bash script that handles simple parallel execution from Ubuntu 20.04 to CentOS Linux 8. In the script I spawn multiple "readers" that each read a string from a FIFO and append it to a common file. The strings are written to the FIFO directly by the main script. I use file descriptors and locks to keep the process clean, and I wait for all the readers to start before writing to the FIFO. The whole script is included at the end.

The script works flawlessly on Ubuntu, where the output is this (the first column is the reader ID, the second is what the reader got from the FIFO):

2 1
4 2
1 3
...
3 4998
2 4999
4 5000

The reading is complete and the messages were transmitted one by one.

Running the same script on CentOS, I get this:

4 1
1 7
2 8
...
3 153
4 154
1 155

There is an evident jump: the messages from 2 to 6 are lost completely. Moreover, the process stops prematurely at 155.

I really don't know what's going on. Any idea?

The script:

#!/bin/bash

readers=4
objs=5000

echo "" > output.txt

# Temporary files and fifo

FIFO=$(mktemp -t fifo-XXXX)
START=$(mktemp -t start-XXXX)
START_LOCK=$(mktemp -t lock-XXXX)
FIFO_LOCK=$(mktemp -t lock-XXXX)
OUTPUT_LOCK=$(mktemp -t lock-XXXX)
rm $FIFO
mkfifo $FIFO

# Cleanup trap

cleanall() {
    rm -f $FIFO
    rm -f $START
    rm -f $START_LOCK
    rm -f $FIFO_LOCK
    rm -f $OUTPUT_LOCK
}
trap cleanall exit

# Reader process

reader() {
    ID=$1

    exec 3<$FIFO
    exec 4<$FIFO_LOCK
    exec 5<$START_LOCK
    exec 6<$OUTPUT_LOCK

    # Signal the reader has started
    flock 5
    echo $ID >> $START
    flock -u 5
    exec 5<&-

    # Reading loop
    while true; do
        flock 4
        read -su 3 item
        read_status=$?
        flock -u 4
        if [[ $read_status -eq 0 ]]; then
            flock 6
            echo "$ID $item" >> output.txt
            flock -u 6
        else
            break # EOF reached
        fi
    done

    exec 3<&-
    exec 4<&-
    exec 6<&-
}

# Spawn readers

for ((i=1;i<=$readers;i++)); do
    reader $i &
done

exec 3>$FIFO

# Wait for all the readers

exec 5<$START_LOCK
while true; do
    flock 5
    started=$(wc -l $START | cut -d ' ' -f 1)
    flock -u 5
    if [[ $started -eq $readers ]]; then
        break
    else
        sleep 0.5s
    fi
done
exec 5<&-

# Writing loop

for ((i=1;i<=$objs;i++)); do
    echo $i 1>&3
done

exec 3<&-
wait

echo "Script done"

exit 0

Sine
  • Probably not the issue, but you have a couple of bad practices there: i) don't use CAPS for your variable names, that can lead to naming collisions with global environment variables which can cause very hard to debug problems; ii) *always quote your variables*, especially when using them with destructive commands such as rm. – terdon Apr 13 '22 at 12:36
  • @terdon Noted that. Thanks. – Sine Apr 13 '22 at 14:12
  • Note: as a workaround I used a temporary file to store all the messages. Then each reader reads and deletes one line from the top until the list is over (see the sketch after these comments). The effect is similar to a fifo, but of course it is not a pipe, and it is less flexible and performant. – Sine Apr 13 '22 at 15:19
  • Here's what I observed while running your script on RedHat 8: It works as expected when output.txt is on a local filesystem, but not over NFS. – Fravadona Apr 13 '22 at 19:02
  • moving >> output.txt from inside to outside the while loop ( done >> output.txt ) seems to fix the problem (sketched after these comments) – Fravadona Apr 13 '22 at 21:48
  • @Fravadona nice catch. In the actual script that I am adapting I'm not writing a file, but I am launching a program instead. Same concept, but instead of the >> output.txt I have a call to a program, and the messages are the arguments to be passed. Do you have any idea if a similar workaround could be implemented there? – Sine Apr 14 '22 at 06:42
  • In that case the concurrent reading part is enough – Fravadona Apr 14 '22 at 12:42
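For reference, a minimal sketch of the temporary-file workaround described in the comments. The queue file, its lock, and the pop_reader name are placeholders, not part of the original script; the flock pattern mirrors the question's:

# Hypothetical sketch: pre-write all messages to a regular file, then
# let each reader atomically pop one line from the top under a lock.
QUEUE=$(mktemp -t queue-XXXX)
QUEUE_LOCK=$(mktemp -t lock-XXXX)

for ((i=1;i<=$objs;i++)); do echo $i; done > $QUEUE

pop_reader() {
    ID=$1
    exec 4<$QUEUE_LOCK
    while true; do
        flock 4
        item=$(head -n 1 $QUEUE)   # read one line from the top...
        sed -i 1d $QUEUE           # ...and delete it
        flock -u 4
        [[ -z $item ]] && break    # the list is over
        echo "$ID $item"
    done >> output.txt
    exec 4<&-
}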
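And a sketch of the change Fravadona suggests: keep output.txt open for the whole reading loop by moving the redirection to the loop's done, instead of re-opening the file on every echo. Only the loop from reader() is shown; everything else stays the same:

while true; do
    flock 4
    read -su 3 item
    read_status=$?
    flock -u 4
    if [[ $read_status -eq 0 ]]; then
        flock 6
        echo "$ID $item"           # no per-line >> output.txt here...
        flock -u 6
    else
        break # EOF reached
    fi
done >> output.txt                 # ...the file is opened once, here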

0 Answers