I am having problems porting a bash script that handles a simple parallel execution from Ubuntu 20.04 to CentOS Linux 8. In the script I spawn multiple "readers" that read a string from a FIFO and write it to a common file. The strings are passed to the FIFO directly from the main script. I use file descriptors and locks to keep the process clean. I also wait for all the readers to start before writing to the FIFO. I'm including the whole script at the end.
The script works flawlessly on Ubuntu, and the output is this (the first column is the reader ID, the second is what the reader got from the FIFO):
2 1
4 2
1 3
...
3 4998
2 4999
4 5000
The reading is complete and the messages were transmitted one by one.
Using the same script on CentOS I get this.
4 1
1 7
2 8
...
3 153
4 154
1 155
There is an evident jump, and the messages from 2 to 6 are lost completely. Moreover, the process stops very prematurely at 155.
I really don't know what's going on. Any idea?
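To quantify the loss rather than spot it by eye, something like the following could list the message numbers that never reached the output file. This is only a sketch: `missing_messages` is a hypothetical helper name, and it assumes the two-column "reader ID, message" layout shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print every message
# number in 1..expected that is absent from the second column of the file.
missing_messages() {
    local file=$1 expected=$2
    awk -v n="$expected" '
        NF == 2 { seen[$2] = 1 }            # remember each message that was written
        END {
            for (i = 1; i <= n; i++)
                if (!(i in seen)) print i   # report the ones never written
        }' "$file"
}

# Demonstration on the first three lines of the CentOS output shown above.
sample=$(mktemp)
printf '%s\n' "4 1" "1 7" "2 8" > "$sample"
missing_messages "$sample" 8    # prints 2, 3, 4, 5 and 6, one per line
rm -f "$sample"
```

Run against the real file as `missing_messages output.txt 5000`; the empty first line written by `echo "" > output.txt` is skipped because it does not have two fields.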
The script:
#!/bin/bash
readers=4
objs=5000
echo "" > output.txt

# Temporary files and fifo
FIFO=$(mktemp -t fifo-XXXX)
START=$(mktemp -t start-XXXX)
START_LOCK=$(mktemp -t lock-XXXX)
FIFO_LOCK=$(mktemp -t lock-XXXX)
OUTPUT_LOCK=$(mktemp -t lock-XXXX)
rm $FIFO
mkfifo $FIFO

# Cleanup trap
cleanall() {
    rm -f $FIFO
    rm -f $START
    rm -f $START_LOCK
    rm -f $FIFO_LOCK
    rm -f $OUTPUT_LOCK
}
trap cleanall exit

# Reader process
reader() {
    ID=$1
    exec 3<$FIFO
    exec 4<$FIFO_LOCK
    exec 5<$START_LOCK
    exec 6<$OUTPUT_LOCK
    # Signal the reader has started
    flock 5
    echo $ID >> $START
    flock -u 5
    exec 5<&-
    # Reading loop
    while true; do
        flock 4
        read -su 3 item
        read_status=$?
        flock -u 4
        if [[ $read_status -eq 0 ]]; then
            flock 6
            echo "$ID $item" >> output.txt
            flock -u 6
        else
            break # EOF reached
        fi
    done
    exec 3<&-
    exec 4<&-
    exec 6<&-
}

# Spawn readers
for ((i=1;i<=$readers;i++)); do
    reader $i &
done
exec 3>$FIFO

# Wait for all the readers
exec 5<$START_LOCK
while true; do
    flock 5
    started=$(wc -l $START | cut -d ' ' -f 1)
    flock -u 5
    if [[ $started -eq $readers ]]; then
        break
    else
        sleep 0.5s
    fi
done
exec 5<&-

# Writing loop
for ((i=1;i<=$objs;i++)); do
    echo $i 1>&3
done
exec 3<&-
wait
echo "Script done"
exit 0
Comments:
… rm …. – terdon Apr 13 '22 at 12:36
… output.txt is on a local filesystem, but not over NFS. – Fravadona Apr 13 '22 at 19:02
… >> output.txt from inside to outside the while loop (done >> output.txt) seems to fix the problem – Fravadona Apr 13 '22 at 21:48
… >> output.txt I have a call to a program, and the messages are the arguments to be passed. Do you have any idea if a similar workaround could be implemented there? – Sine Apr 14 '22 at 06:42
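For reference, the workaround from the comments (redirecting the whole loop with `done >> output.txt` instead of each `echo`) might look roughly like the reduced sketch below. It is not a confirmed fix for CentOS 8. It also shows one possible shape for the case where a program is called with the message as an argument: the program's standard output simply inherits the loop's redirection. `process_item` is a hypothetical stand-in for that program, and the scratch directory, marker files, and the small `readers`/`objs` values are assumptions made to keep the example short.

```shell
#!/bin/bash
# Sketch only: same design as the script above, with the output
# redirection moved from the echo to the end of the while loop.
readers=2
objs=20

work=$(mktemp -d)                 # scratch directory for FIFO, locks, markers
out="$work/output.txt"
mkfifo "$work/fifo"
: > "$work/fifo.lock"
: > "$work/out.lock"
: > "$out"

# Hypothetical stand-in for the external program: it gets the reader ID and
# the message as arguments and writes one line to its standard output.
process_item() {
    echo "$1 $2"
}

reader() {
    local id=$1 item status
    exec 3<"$work/fifo" 4<"$work/fifo.lock" 6<"$work/out.lock"
    : > "$work/started.$id"       # signal that this reader has the FIFO open
    while true; do
        flock 4
        read -r -u 3 item
        status=$?
        flock -u 4
        [[ $status -eq 0 ]] || break   # EOF: the writer closed the FIFO
        flock 6
        process_item "$id" "$item"     # stdout goes to the loop's redirection
        flock -u 6
    done >> "$out"                     # opened once per reader, append mode
}

for ((i = 1; i <= readers; i++)); do
    reader "$i" &
done
exec 3>"$work/fifo"                    # unblocks the readers' open of the FIFO

# Wait until every reader has dropped its marker file.
for ((i = 1; i <= readers; i++)); do
    until [[ -e "$work/started.$i" ]]; do sleep 0.1; done
done

for ((i = 1; i <= objs; i++)); do
    echo "$i" >&3
done
exec 3>&-                              # close the write end so readers see EOF
wait

# Collect the received message numbers in sorted order, then clean up.
received=$(cut -d ' ' -f 2 "$out" | sort -n | tr '\n' ' ')
rm -rf "$work"
echo "received: $received"
```

If the real program does not write to standard output at all, the redirection after `done` has nothing to capture, and whether this change still helps on CentOS would need to be tested there.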