Assumptions:
- All your script instances are running on the same machine.
- There is a known directory where your script can write, and that no other program will use. This directory is on an “ordinary” filesystem (in particular, not NFS).
You can use atomic operations such as file creation (set -C; (: >foo) 2>/dev/null), renaming (mv) and deletion (rm) to handle locking.
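For example, here's a minimal sketch of a lock based on noclobber file creation (the path /tmp/myscript.lock is only an illustration):

set -C                                  # noclobber: a '>' redirection fails if the file exists
if (: > /tmp/myscript.lock) 2>/dev/null; then
  # the redirection succeeded, so we created the file and hold the lock
  echo "lock acquired"
  rm /tmp/myscript.lock                 # release the lock by deleting the file
else
  echo "another instance holds the lock"
fi
set +C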
To notify a process, you can send it a signal; however it is problematic to locate the target processes: if you store process IDs somewhere, you can't be sure that the IDs are still valid, they may have been reused by an unrelated process. One way to synchronize two processes is to write a byte on a pipe; the reader will block until a writer comes up and vice versa.
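For example, here's a sketch of that pipe-based handshake (the fifo path /tmp/sync.pipe is only an illustration):

mkfifo /tmp/sync.pipe
# In the waiting process: this read blocks until someone writes to the pipe
read line < /tmp/sync.pipe
# In the notifying process: this write blocks until a reader has opened the pipe
echo wake > /tmp/sync.pipe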
First, set up the directory. Create a file called lock and a named pipe called pipe.
if ! [ -d /script-locking-directory ]; then
  # The directory doesn't exist: create and populate it under a temporary
  # name, then move it into place atomically.
  {
    mkdir /script-locking-directory-$$ &&
    mkfifo /script-locking-directory-$$/pipe &&
    touch /script-locking-directory-$$/lock &&
    mv /script-locking-directory-$$ /script-locking-directory
  } || {
    # An error happened, so clean up
    err=$?
    rm -r /script-locking-directory-$$
    # Exit, unless another instance of the script created the directory
    # at the same time as us
    if ! [ -d /script-locking-directory ]; then exit $err; fi
  }
fi
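The functions below refer to lock and pipe by relative path, so I'll assume the script changes into the locking directory once at startup (alternatively, prefix every path with /script-locking-directory):

cd /script-locking-directory || exit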
We'll implement taking the lock by renaming the lock file. To notify all waiters on the lock, we use a simple scheme: echo a byte to the pipe, and have every waiter block on a read from that pipe. Here's a first version.
take_lock () {
  while ! mv lock lock.held 2>/dev/null; do
    read <pipe   # wait for a write on the pipe
  done
}
release_lock () {
  mv lock.held lock
  read <pipe &   # make sure there is a reader on the pipe so we don't block
  echo >pipe     # notify all readers
}
This scheme wakes up all waiters, which can be inefficient, but that won't be a problem unless there's a lot of contention (i.e. a lot of waiters at the same time).
The major problem with the code above is that if a lock holder dies, the lock is never released. How can we detect that situation? We can't simply record the holder's PID and go looking for that process later, because PIDs get reused. What we can do is keep the lock file open in the script that holds the lock, and check whether the lock file is open (with fuser) when a new script instance starts.
break_lock () {
  if ! [ -e "lock.held" ]; then return 1; fi
  if [ -n "$(fuser lock.held)" ]; then return 1; fi
  # If we get this far, the lock holder died
  if mv lock.held lock.breaking.$$ 2>/dev/null; then
    # Check that someone else didn't break the lock and take it just now
    if [ -n "$(fuser lock.breaking.$$)" ]; then
      mv lock.breaking.$$ lock.held
      return 0
    fi
    mv lock.breaking.$$ lock
  fi
  return 0   # whether we did break a lock or not, try taking it again
}
take_lock () {
  while ! mv lock lock.taking.$$ 2>/dev/null; do
    if break_lock; then continue; fi
    read <pipe   # wait for a write on the pipe
  done
  exec 9<lock.taking.$$   # keep the lock file open so fuser can see we hold it
  mv lock.taking.$$ lock.held
}
release_lock () {
  # lock.held might not exist if someone else is trying to break our lock,
  # so retry in a loop.
  while ! mv lock.held lock.releasing.$$ 2>/dev/null; do :; done
  exec 9<&-      # close the descriptor that marked us as the lock holder
  mv lock.releasing.$$ lock
  read <pipe &   # make sure there is a reader on the pipe so we don't block
  echo >pipe     # notify all readers
}
This implementation can still deadlock if the lock holder dies while another instance is inside take_lock: the lock will remain held until a third instance starts. I also assume that a script won't die inside take_lock or release_lock.
Warning: I wrote the code above directly in a browser, I haven't tested it (let alone proved it correct).