
I have a small script that loops through all files of a folder and executes a (usually long lasting) command. Basically it's

for file in ./folder/*;
do
    ./bin/myProgram $file > ./done/$file
done

(Please ignore syntax errors; it's just pseudocode.)

I now wanted to run this script twice at the same time. Obviously, the execution is unnecessary if ./done/$file exists. So I changed the script to

for file in ./folder/*;
do
    [ -f ./done/$file ] || ./bin/myProgram $file >./done/$file
done

So basically the question is: is it possible that both scripts (or, in general, more than one instance) reach the same point at the same time, both check for the existence of the done file, both fail to find it, and so the command runs twice?

If that can't happen, it would be just perfect, but I highly doubt it; that would be too easy :D If they can end up processing the same file, is it possible to somehow "synchronize" the scripts?

Jeff Schaller
stefan
  • If you have a version of xargs with the -P option available, see this question. – jw013 May 10 '12 at 08:58
  • GNU Make supports parallel execution, too; the done/$file markers seem a little like make targets to me. – sr_ May 10 '12 at 09:11
  • The (pseudo-)code you posted doesn't actually run two instances of your program in parallel. If you have either xargs or GNU make or some version of parallel, then there is no need to reinvent this particular wheel. – jw013 May 10 '12 at 09:20
  • It will run two instances if the above script is executed twice. – stefan May 10 '12 at 09:22

4 Answers


This is possible and does occur in reality. Use a lock file to avoid this situation. An example, from said page:

if mkdir /var/lock/mylock; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# ... program code ...

rmdir /var/lock/mylock
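
Applied to the loop in the question, one lock directory per input file lets two instances divide the work rather than one blocking the other entirely. This is only a sketch: it uses tr as a stand-in for ./bin/myProgram and creates its own sample data so it is self-contained.

```shell
#!/bin/sh
# Sketch only: per-file lock directories, so two concurrent instances of
# the script skip files the other one is already working on.
# `tr` stands in for ./bin/myProgram; the sample input is for demonstration.
mkdir -p folder done locks
printf 'hello\n' > folder/a.txt

for file in folder/*; do
    name=${file#folder/}
    if mkdir "locks/$name" 2>/dev/null; then   # atomic: only one instance wins
        [ -f "done/$name" ] || tr a-z A-Z < "$file" > "done/$name"
        rmdir "locks/$name"                    # release the lock
    fi                                         # else: someone else has this file
done
```

Note that if an instance is killed between mkdir and rmdir, the lock directory stays behind and that file is never retried until you remove it by hand.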
Chris Down
  • "As an aside, you almost definitely want to quote $file." It wasn't necessary for my simple job, but obviously you're right that it would be better to do so. – stefan May 10 '12 at 09:02
  • @stefan - I removed it once I saw "ignore syntax errors"... :-) – Chris Down May 10 '12 at 09:04
  • :D it's perfectly fine if you remind me of this stuff. I tend to forget it since I'm not yet used to it. – stefan May 10 '12 at 09:08
  • I absolutely LOVE the simplicity of the mkdir-locking. Thanks for the link! (even though I would have preferred reading about it on this site, maybe you want to extend your answer a bit?) – stefan May 10 '12 at 09:11

The two instances of your script can certainly interact in this way, causing the command to run twice. This is called a race condition.

One way to avoid this race condition would be if each instance grabbed its input file by moving it to another directory. Moving a file (inside the same filesystem) is atomic. Moving the input files may not be desirable, and this is already getting a bit complicated.

mkdir staging-$$ making-$$
for input in folder/*; do
  name=${input#folder/}
  staging=staging-$$/$name
  output=making-$$/$name
  destination=done/$name
  if mv -- "$input" "$staging" 2>/dev/null; then
    bin/myProgram "$staging" >"$output"
    mv -- "$output" "$destination"
    mv -- "$staging" "$input"
  fi
done

A simple way to process the files in parallel using a widely-available tool is GNU make, using the -j flag for parallel execution. Here's a makefile for this task (remember to use tabs to indent commands):

all: $(patsubst folder/%,done/%,$(wildcard folder/*))
done/%: folder/%
        ./bin/myProgram $< >$@.tmp
        mv $@.tmp $@

Run make -j 3 to run 3 instances in parallel.
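
If make is not available but a GNU (or modern BSD) xargs is, the -P option mentioned in the comments gives similar parallelism. A rough sketch, again using tr as a placeholder for ./bin/myProgram and creating its own sample data:

```shell
#!/bin/sh
# Sketch: xargs -P runs up to 3 jobs at a time (-P is a GNU/BSD extension,
# not POSIX). `tr` stands in for ./bin/myProgram.
mkdir -p folder done
printf 'x\n' > folder/a
printf 'y\n' > folder/b

printf '%s\0' folder/* |
  xargs -0 -n 1 -P 3 sh -c '
    name=${1#folder/}
    [ -f "done/$name" ] || tr a-z A-Z < "$1" > "done/$name"
  ' sh
```

The NUL-delimited pipe (-0) keeps filenames with spaces intact; each sh -c invocation receives one filename as $1.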

See also Four tasks in parallel... how do I do that?


I have the feeling you are really trying to run multiple jobs in parallel and that the lock file is simply a means to an end.

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:

parallel ./bin/myProgram ::: ./folder/*

It will run one instance of myProgram per CPU core, in parallel.

You can install GNU Parallel simply by:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange

The problem with locking is that you need an operation that creates the lock atomically (that is, it cannot be interrupted part-way through). As Chris wrote in his answer, mkdir is such an atomic operation, whereas a separate "test whether the file exists, then create it" sequence on a plain file is not.

There is also a higher-level command, usually hidden in the procmail package: lockfile. It has some nice features and can easily be used in your own scripts without the need to "reinvent the wheel" (for instance, writing your own function that locks based on directory creation).
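
For completeness: while a separate test-then-create on a file does race, POSIX shells can create a file atomically with the noclobber option (set -C), which makes the > redirection fail if the file already exists (O_CREAT|O_EXCL under the hood). A minimal sketch; the lock path is hypothetical:

```shell
#!/bin/sh
# Sketch: with noclobber (set -C), `>` fails if the file already exists,
# so creating the lock file becomes an atomic test-and-set.
lock=/tmp/mylock.$$        # hypothetical lock path

if ( set -C; : > "$lock" ) 2>/dev/null; then
    echo "lock acquired"
    # ... critical section ...
    rm -f "$lock"          # release
else
    echo "lock held by another process" >&2
fi
```

The subshell keeps set -C from affecting the rest of the script; only the redirection inside it runs in noclobber mode.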

Nils