
I have two command-line programs that are typically run serially on a Linux system.

Typical execution for both programs is simply this:

  1. Program A is run. It takes a simple text file as input and outputs a simple text file.
  2. Program B is run after A; its input is the text file that program A produced. It also outputs a simple text file.

Note: for both of the above programs, the arguments are simply the paths to the respective input and output files. Example:

  $ prog_a /path/to/inputfile/dataIn.txt /path/to/outputfile/dataOut.txt
  $ prog_b /path/to/inputfile/dataOut.txt /path/to/outputfile/results.txt

These programs were developed by third parties, so we cannot easily modify them (at least not in a timely manner). However, we want to speed up execution by running them in parallel using named pipes. The data files are at times extremely large, and we assume parallel processing would speed things up. I have been tasked with this project and have proceeded as follows.

I wrote a bash script that:

  1. Creates a named pipe that links the two programs. Call it dataOut.pipe.
  2. Runs program A, which reads the text file as usual but, instead of writing to a text file as before, writes to the pipe created in step 1, dataOut.pipe.
  3. Runs program B, which reads from the pipe that program A writes to.

The bash script looks something like this:

#!/bin/bash
mkfifo dataOut.pipe
prog_b dataOut.pipe results.txt &
prog_a dataIn.txt dataOut.pipe
rm dataOut.pipe

Now this works... sometimes. Many times I get a Java exception spit out to stderr, and I cannot figure out what exactly the problem is, but I think it is something along these lines:

Could program B sometimes run faster than A and drain the pipe faster than A can fill it, which causes the whole thing to crater?

If that is the case, what is an easy work around? Or could there be something else going on?
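
For what it's worth, a quick test (using a throwaway pipe name, demo.pipe, purely for illustration) suggests a fast reader doesn't error out on an empty pipe; it simply blocks until the writer supplies data:

```shell
mkfifo demo.pipe
# Writer is deliberately slow; the reader below starts first.
(sleep 1; echo "late data" > demo.pipe) &
# cat blocks on the empty pipe until the writer shows up, then prints the line.
cat demo.pipe
wait
rm demo.pipe
```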

Wikkyd
  • There isn't enough information here to know the cause of your problem. Also, this is not a support forum; if your question is just about how pipes work, please simplify it down to just that. It may or may not be the cause of the java exceptions you see (it's probably not). Also see The Fundamental Philosophy of Debugging. – Wildcard Jul 27 '17 at 23:59
  • The java exception directly pertains to the pipe. Sorry if I did not make that clear enough. The exception states it is due to a broken pipe; the exact verbiage was java.io.IOException: Broken pipe. – Wikkyd Jul 28 '17 at 00:07
  • So I guess my question could be simplified to this: If a reader cleans out a pipe faster than a writer can write to it, will it result in exceptions like this? – Wikkyd Jul 28 '17 at 00:08
  • See https://unix.stackexchange.com/a/139494/135943. Are you getting the exception from program A or B? – Wildcard Jul 28 '17 at 00:12

1 Answer


A broken pipe means that the writer (prog_a) is trying to write into a pipe that was closed by its reader (prog_b). You don't give us enough information to figure out why prog_b stops so quickly.
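
You can reproduce the symptom in isolation (demo.pipe and the commands below are just an illustration, not your actual programs): when the reader closes its end of the pipe before the writer is done, the writer's next write fails with EPIPE, which is exactly what surfaces in Java as java.io.IOException: Broken pipe:

```shell
mkfifo demo.pipe
# Reader takes one line and closes its end of the pipe early.
head -n 1 demo.pipe > /dev/null &
# Writer keeps writing after the reader is gone and is killed by SIGPIPE
# (typically exit status 141, i.e. 128+13).
seq 1000000 > demo.pipe
echo "writer exited with status $?"
wait
rm demo.pipe
```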

That said, you assume that prog_b reads its input file sequentially until EOF is met, processing each line as it is read, like a regular Unix filter command. Are you sure of that? If prog_b wants to seek into its input file, or mmap it, you are doomed (the same goes for prog_a). And if prog_b reads all the input lines and only then processes them, you will gain hardly anything by parallelizing prog_a and prog_b, because prog_b will start its processing only when the pipe is closed, that is, when prog_a has ended.

xhienne