I have two command-line programs that are typically run serially on a Linux system.
A typical execution of both programs is simply this:
- Program A is run first. Its input is a simple text file and it outputs another simple text file.
- Program B is run after A; its input is the text file that program A produced. It also outputs a simple text file.
Note: for both of the above programs, the inputs and outputs are simply paths to the respective input and output files. Example:
$ prog_a /path/to/inputfile/dataIn.txt /path/to/outputfile/dataOut.txt
$ prog_b /path/to/inputfile/dataOut.txt /path/to/outputfile/results.txt
These programs were developed by third parties, so we cannot easily modify them (at least not in a timely manner). However, we want to speed up execution by running them in parallel using a named pipe. The data files are extremely large at times, and we assume overlapping the two programs would speed things up. I have been tasked with this project and have proceeded as follows.
I wrote a bash script that:
- Creates a named pipe that links the two programs; call it dataOut.pipe.
- Has program A read the text file as usual, but instead of writing to a text file as before, it writes to the pipe created above, dataOut.pipe.
- Has program B read its input from that pipe rather than from a regular file.
The bash script looks something like this:
#!/bin/bash
mkfifo dataOut.pipe
prog_b dataOut.pipe results.txt &
prog_a dataIn.txt dataOut.pipe
rm dataOut.pipe
Now this works... sometimes. Many times I get a Java exception spat out to stderr, and I cannot figure out exactly what the problem is, but I suspect it is something along these lines:
Could program B sometimes run faster than A and drain the pipe faster than A can put data into it, causing the whole thing to crater?
If that is the case, what is an easy workaround? Or could something else be going on?
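To make the hypothesis easier to test, here is a rough stand-in for the same plumbing, with prog_a and prog_b replaced by plain shell commands: a deliberately slow writer and a fast reader. This is only a sketch of the wiring, not of whatever the real Java programs do with their files, and dataIn.txt and the 0.1-second delay are just placeholders:

#!/bin/bash
# Stand-in for the real pipeline: a fast reader and a deliberately slow
# writer, to see whether that combination alone breaks the named pipe.
mkfifo dataOut.pipe

# Stand-in for prog_b: read the pipe as fast as possible into results.txt.
cat dataOut.pipe > results.txt &

# Stand-in for prog_a: copy dataIn.txt to the pipe one line at a time, slowly.
while IFS= read -r line; do
    printf '%s\n' "$line"
    sleep 0.1
done < dataIn.txt > dataOut.pipe

# Unlike the script above, wait for the background reader to finish
# before removing the pipe.
wait
rm dataOut.pipe

The wait before rm is the one deliberate difference from my script above, just to rule out the pipe being removed while B is still reading from it.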