
I have a text file containing a number, and a ksh script, script.sh. The script reads the number from the file, increments it by 1, overwrites the file with the new number, then sleeps for some time; this repeats until the number equals 120.
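
For reference, here is a minimal sketch of the single-instance loop described above, with no locking at all. It assumes the counter lives in a file such as file.txt and turns the fixed sleep into a parameter; the function name count_up, the scratch path and the demo values are hypothetical. The gap between the read and the echo is exactly where concurrent copies would collide.

```shell
#!/bin/ksh
# Hypothetical sketch of the counter loop described in the question.
# No locking: running several copies at once can lose increments.
count_up() {
    file=$1 limit=$2 delay=$3
    read n < "$file"
    while [ "$n" -lt "$limit" ]; do
        n=$((n + 1))
        echo "$n" > "$file"    # overwrite the file with the new number
        sleep "$delay"
        read n < "$file"       # re-read: another copy may have changed it
    done
}

# Demo on a scratch file (path is an assumption), with no delay.
f=${TMPDIR:-/tmp}/count_up_demo.$$
echo 115 > "$f"
count_up "$f" 120 0
read result < "$f"
rm -f "$f"
echo "$result"
```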

I want to have this script running n times at the same time, how do I do this?

But then I will have n processes trying to edit file.txt, and I need just one process editing at a time: when it finishes (and sleeps), the second one can edit, and so on.

Probably you will tell me that I have to use a lockfile, but unfortunately I can't use that, so I have to find another way to simulate semaphores.

Any ideas?

Thanks.

Kevin
Jasmin
  • Seems like this question might help: http://unix.stackexchange.com/questions/70/what-unix-commands-can-be-used-as-a-semaphore-lock – Scott Hoffman Jan 12 '12 at 20:04
  • Oh yeah, that can help me, but sometimes I find things hard to understand, so could anyone explain that answer about mkdir to me? I understand that it creates a temp directory? And do I have to put my script in that directory? – Jasmin Jan 12 '12 at 20:40
  • The link from Scott leads right to the solution of your problem. Your script has to call create_lock_or_wait before you read the file and remove_lock after you wrote the increased number. – Nils Jan 12 '12 at 21:07
  • I'll try this too – Jasmin Jan 13 '12 at 16:07
  • Why can't you use a lock file? Whenever you ask “I want to do X, and I don't want obvious solution Y”, you'd better explain why Y is no good for you. – Gilles 'SO- stop being evil' Jan 13 '12 at 20:28
  • @Gilles because lockfile doesn't exist on the Unix box I'm working on, so I need to do it another way – Jasmin Jan 16 '12 at 14:42

3 Answers


I would prefer to control the number's manipulation from outside and just call the script, passing it the current number as a parameter:

n=10                     # maximum number of jobs running at once
read nr < number.file    # last number already processed
seq $((nr+1)) 120 | xargs -n 1 -P "$n" ./script.sh

Where the script itself would be reduced to something like this:

#!/bin/ksh
number=$1                        # number passed as the only argument
echo "Do job with/for number $number"
echo "$number" > number.file     # record the last number handled
sleep 10

Of course, if the tasks' duration may vary, it is better to check the current content of the number file before writing to it, so as not to overwrite a greater number. But this matters only if the succession of tasks needs to support some kind of resuming.
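
A minimal sketch of that check, assuming the same number.file; write_if_greater is a hypothetical helper, not part of the answer. The read-compare-write sequence is itself not atomic, so two jobs finishing at the same instant can still race; it only keeps a slow job from overwriting a larger value written by a faster one in the usual case.

```shell
#!/bin/ksh
# Hypothetical helper: store $1 in file $2 only if it is greater than
# the number already there (a missing or empty file counts as 0).
# The read/compare/write steps are NOT atomic as a whole.
write_if_greater() {
    new=$1 file=$2
    old=0
    if [ -s "$file" ]; then
        read old < "$file"
    fi
    if [ "$new" -gt "$old" ]; then
        echo "$new" > "$file"
    fi
}

# Demo on a scratch file (path is an assumption).
f=${TMPDIR:-/tmp}/number_demo.$$
echo 5 > "$f"
write_if_greater 3 "$f"; read r1 < "$f"   # 3 < 5: file keeps 5
write_if_greater 7 "$f"; read r2 < "$f"   # 7 > 5: file now holds 7
rm -f "$f"
echo "$r1 $r2"
```

In script.sh, the echo "$number" > number.file line would then become write_if_greater "$number" number.file.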

manatwork
  • But is this concurrency? I need to have the same script running n times at the same time, trying to edit the text file; the processes can't all edit the file at the same time, they have to wait their turn. – Jasmin Jan 12 '12 at 20:23
  • Yes, the xargs documentation says about the -P (--max-procs) option: “Run up to max-procs processes at a time; the default is 1.” So that code will start $n processes, passing the next sequence number to each as a parameter. As soon as one of the processes terminates, a new one is started with the next sequence number, and so on until the sequence of numbers is exhausted. – manatwork Jan 13 '12 at 08:28
  • I tried it, but I think seq is not available on the Unix box I'm working on: it says seq not found, and it also says the -P option of xargs is unknown. – Jasmin Jan 13 '12 at 16:42
  • Sorry, the -P was the essence. Thanks for letting me know it is missing on your Unix. Another command capable of executing parallel tasks is make with the -j (--jobs) option, but that may be a GNU extension. Anyway, my make knowledge is too weak to turn this into an answer. – manatwork Jan 13 '12 at 17:00
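
Since neither seq nor xargs -P was available on the asker's box, here is a rough substitute that uses only shell arithmetic, background jobs (&) and wait. It is a sketch under that assumption, not manatwork's method: jobs are started in batches of n and the loop waits for the whole batch before starting the next, which is coarser than xargs -P (that refills a slot as soon as any job ends). run_batches, job and the scratch path are hypothetical names.

```shell
#!/bin/ksh
# Hypothetical replacement for: seq FROM TO | xargs -n 1 -P MAX CMD
# Starts CMD once per number FROM..TO, at most MAX at a time, in batches.
run_batches() {
    from=$1 to=$2 max=$3 cmd=$4
    i=$from
    while [ "$i" -le "$to" ]; do
        j=0
        while [ "$j" -lt "$max" ] && [ "$i" -le "$to" ]; do
            "$cmd" "$i" &          # start one job in the background
            i=$((i + 1))
            j=$((j + 1))
        done
        wait                       # block until this whole batch has exited
    done
}

# Demo: each job appends its number to a scratch file (path assumed).
out=${TMPDIR:-/tmp}/batches_demo.$$
: > "$out"
job() { echo "$1" >> "$out"; }
run_batches 1 10 3 job

count=0 sum=0
while read x; do
    count=$((count + 1))
    sum=$((sum + x))
done < "$out"
rm -f "$out"
echo "count=$count sum=$sum"
```

With the answer's files, the call would be run_batches $((nr+1)) 120 "$n" ./script.sh after the read nr < number.file line.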

Doing that without a lockfile would seem to require something weird, because a lockfile (better yet, lock directory, as directory creation is supposed to be atomic) is the standard, and workable solution to this problem. I've not done exactly what you want to do, but here are some ideas that occur to me:
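
For comparison, the lock-directory approach mentioned above might look roughly like this. It is a sketch that assumes all instances agree on one lock path (here a hypothetical LOCKDIR) and relies on mkdir failing when the directory already exists. A crashed holder leaves the directory behind, so a real script would also need some way to clean up stale locks.

```shell
#!/bin/ksh
# Hypothetical lock-directory mutex: mkdir either creates the directory
# or fails because it already exists, in one step, so only one process
# at a time can hold the lock.
LOCKDIR=${TMPDIR:-/tmp}/counter_lock.$$    # real use: one fixed, shared path

take_lock() {
    until mkdir "$LOCKDIR" 2>/dev/null; do
        sleep 1                # someone else holds the lock; poll again
    done
}
release_lock() {
    rmdir "$LOCKDIR"
}

# Demo: 3 background workers each increment a shared counter 5 times.
counter=${TMPDIR:-/tmp}/counter_demo.$$
echo 0 > "$counter"
worker() {
    k=0
    while [ "$k" -lt 5 ]; do
        take_lock
        read v < "$counter"            # critical section: read...
        echo $((v + 1)) > "$counter"   # ...and write back the increment
        release_lock
        k=$((k + 1))
    done
}
worker & worker & worker &
wait
read total < "$counter"
rm -f "$counter"
echo "$total"
```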

You could write a small C program to check the status of a SysV semaphore. Start with man semget or man semop. This will be tedious and bizarre.

You could use Oracle sqlplus and a chunk of PL/SQL that does:

lock table table_name in exclusive mode

And then another invocation of sqlplus to release the lock. I've done that sort of thing before, you have to take a lot of care not to leave processes waiting, and to release the lock. I also may have only been interested in doing work on the table, so I only had one call to sqlplus.

Further out in left field, you might be able to use named pipes as mutual exclusion locks. You'd really have to experiment with that.

If you can load a kernel module, maybe the kernel module could use a /proc virtual file to act as a mutex or semaphore. IBM developerWorks has an article on a loadable module that creates /proc files.

Perhaps you could implement Dekker's algorithm with values in files or named pipes.

After reviewing what I wrote, I'm not too sure that any of these except the C program doing semop() would really work. They all require a lot of experimentation.

  • I see, but I need to do everything in Korn shell, so I think this is not what I'm looking for. Thank you so much anyway =) – Jasmin Jan 12 '12 at 20:24
  • You can run sqlplus (Oracle's SQL interpreter) from inside a shell script. I've done it many times. You can access named pipes from a script. You just might be able to do Dekker's Algorithm from a shell script, too. If you rule out a lock file/lock dir, you have to do something weird, as a lock dir is the usual way to do stuff. –  Jan 13 '12 at 01:20

Assumptions:

  • All your script instances are running on the same machine.
  • There is a known directory where your script can write, and that no other program will use. This directory is on an “ordinary” filesystem (in particular, not NFS).

You can use atomic operations such as file creation (set -C; (: >foo) 2>/dev/null), renaming (mv) and deletion (rm) to handle locking.
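
A small demonstration of the first of those operations, assuming a shell that supports set -C (noclobber): once the file exists, a second > into it fails, so the file behaves like a test-and-set flag. It only shows the success/failure behavior, not atomicity under contention; the path and the try_create helper are hypothetical.

```shell
#!/bin/ksh
# Hypothetical demo: with set -C (noclobber), ">" refuses to overwrite an
# existing file, so only the first creation attempt succeeds.
f=${TMPDIR:-/tmp}/noclobber_demo.$$
rm -f "$f"

try_create() {
    # Run in a subshell so that set -C does not stay on in the caller.
    (set -C; : > "$1") 2>/dev/null
}

try_create "$f"; first=$?     # file did not exist: succeeds (0)
try_create "$f"; second=$?    # file exists now: fails (non-zero)
rm -f "$f"
echo "first=$first second=$second"
```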

To notify a process, you can send it a signal; however it is problematic to locate the target processes: if you store process IDs somewhere, you can't be sure that the IDs are still valid, they may have been reused by an unrelated process. One way to synchronize two processes is to write a byte on a pipe; the reader will block until a writer comes up and vice versa.
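
A quick way to see that blocking behavior, as a hypothetical demo (the scratch directory name is an assumption): a background reader that opens the FIFO stays stuck until the main script opens the same FIFO for writing.

```shell
#!/bin/ksh
# Hypothetical demo: a reader that opens a FIFO blocks until a writer
# opens the same FIFO, which is what the locking scheme relies on.
dir=${TMPDIR:-/tmp}/fifo_demo.$$
mkdir "$dir" && mkfifo "$dir/pipe" || exit 1

# Background reader: stays blocked opening the FIFO until a writer shows up.
( read msg < "$dir/pipe"; echo "$msg" > "$dir/got" ) &

sleep 1                          # give the reader time to block
if [ -f "$dir/got" ]; then early=yes; else early=no; fi

echo wake-up > "$dir/pipe"       # opening for writing unblocks the reader
wait                             # let the reader finish
read got < "$dir/got"
rm -r "$dir"
echo "early=$early got=$got"
```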

First, set up the directory. Create a file called lock and a named pipe called pipe.

if ! [ -d /script-locking-directory ]; then
  # The directory doesn't exist, create and populate it
  {
    mkdir /script-locking-directory-$$ &&
    mkfifo /script-locking-directory-$$/pipe &&
    touch /script-locking-directory-$$/lock &&
    mv /script-locking-directory-$$ /script-locking-directory
  } || {
    # An error happened, so clean up
    err=$?
    rm -rf /script-locking-directory-$$
    # Exit, unless another instance of the script created the directory
    # at the same time as us
    if ! [ -d /script-locking-directory ]; then exit $err; fi
  }
fi

We'll implement taking the lock by renaming the lock file. To notify waiters we'll use a simple scheme: echo a byte to the pipe, and have all waiters wait by reading from that pipe. The functions below refer to lock and pipe by relative name, so they assume the current directory is /script-locking-directory.

take_lock () {
  while ! mv lock lock.held 2>/dev/null; do
    read <pipe # wait for a write on the pipe
  done
}
release_lock () {
  mv lock.held lock
  read <pipe & # make sure there is a reader on the pipe so we don't block
  echo >pipe # notify all readers
}

This scheme wakes up all waiters, which can be inefficient, but that won't be a problem unless there's a lot of contention (i.e. a lot of waiters at the same time).

The major problem with the code above is that if a lock holder dies, the lock will not be released. How can we detect that situation? We can't just go looking for a process recorded in the lock, because of PID reuse. What we can do is keep the lock file open in the script, and check whether the lock file is still open when a new script instance starts.

break_lock () {
  if ! [ -e "lock.held" ]; then return 1; fi
  if [ -n "$(fuser lock.held)" ]; then return 1; fi
  # If we get this far, the lock holder died
  if mv lock.held lock.breaking.$$ 2>/dev/null; then
    # Check that someone else didn't break the lock and take it just now
    if [ -n "$(fuser lock.breaking.$$)" ]; then
      mv lock.breaking.$$ lock.held
      return 0
    fi
    mv lock.breaking.$$ lock
  fi
  return 0 # whether we did break a lock or not, try taking it again
}
take_lock () {
  while ! mv lock lock.taking.$$ 2>/dev/null; do
    if break_lock; then continue; fi
    read <pipe # wait for a write on the pipe
  done
  exec 9<lock.taking.$$
  mv lock.taking.$$ lock.held
}
release_lock () {
  # lock.held might not exist if someone else is trying to break our lock.
  # So we try in a loop.
  while ! mv lock.held lock.releasing.$$ 2>/dev/null; do :; done
  exec 9<&-
  mv lock.releasing.$$ lock
  read <pipe & # make sure there is a reader on the pipe so we don't block
  echo >pipe # notify all readers
}

This implementation can still deadlock if the lock holder dies while another instance is inside take_lock: the lock will remain held until a third instance starts. I also assume that a script won't die inside take_lock or release_lock.

Warning: I wrote the code above directly in a browser, I haven't tested it (let alone proved it correct).

  • this is very interesting. – Jasmin Jan 16 '12 at 15:11
  • I'm a beginner in this stuff, so I'm learning and sometimes it's difficult for me to understand. So I have some questions: what is inside the lock and pipe files? Do I have to call take_lock before reading the number, and release_lock after writing the new number? – Jasmin Jan 16 '12 at 15:20
  • @Jasmin The lock file is empty. Its name indicates whether the lock is taken or not. You call take_lock and release_lock around your code's critical section, as usual with a lock. – Gilles 'SO- stop being evil' Jan 16 '12 at 18:19
  • OK, I tried it but got some errors. The first is on this line: "if ! [ -d /script-locking-directory ]; then ": the "!" gives a not found error, so I guess it doesn't enter the if and doesn't create the directory, and I think that's why I got the next error: mv: lock.heid cannot access: no such file or directory – Jasmin Jan 16 '12 at 20:51
  • @Jasmin I think you made a mistake when copying the code. I tested my first snippet (not my second one). lock.heid doesn't appear anywhere in my post. – Gilles 'SO- stop being evil' Jan 16 '12 at 21:01
  • Yeah, I'm using the first one too. Now I've figured out that it's the "!" at the beginning of the condition: for some reason the shell doesn't recognize it (I always get: ./script.sh[36]: !: not found), and that's why it doesn't create any directory or the new file lock.held – Jasmin Jan 16 '12 at 21:45
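
If the shell running the script rejects the "!" pipeline negation, as the "!: not found" error above suggests, a sketch of the first snippet and the simple lock functions rewritten without it might look like this. It is untested on the asker's system and assumes a Bourne-compatible test command; LOCKDIR is a hypothetical stand-in for /script-locking-directory, and read dummy names a variable because some shells require one.

```shell
#!/bin/ksh
# Hypothetical variant that avoids the "!" pipeline negation:
# "[ ! -d DIR ]" negates inside the test command, and "until CMD"
# replaces "while ! CMD".
LOCKDIR=${TMPDIR:-/tmp}/script-locking-directory.$$

if [ ! -d "$LOCKDIR" ]; then
  {
    mkdir "$LOCKDIR-$$" &&
    mkfifo "$LOCKDIR-$$/pipe" &&
    touch "$LOCKDIR-$$/lock" &&
    mv "$LOCKDIR-$$" "$LOCKDIR"
  } || {
    err=$?
    rm -rf "$LOCKDIR-$$"
    # Exit, unless another instance created the directory meanwhile
    if [ ! -d "$LOCKDIR" ]; then exit "$err"; fi
  }
fi
cd "$LOCKDIR" || exit 1        # the functions use relative names

take_lock () {
  until mv lock lock.held 2>/dev/null; do
    read dummy <pipe           # wait for a write on the pipe
  done
}
release_lock () {
  mv lock.held lock
  read dummy <pipe &           # make sure there is a reader on the pipe
  echo >pipe                   # notify all readers
}

# Uncontended demo: take and release the lock once.
take_lock
if [ -f lock.held ]; then held=yes; else held=no; fi
release_lock
wait                           # reap the background reader
if [ -f lock ]; then released=yes; else released=no; fi
cd / && rm -r "$LOCKDIR"
echo "held=$held released=$released"
```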