
From APUE

FIFOs can be used to duplicate an output stream in a series of shell commands. This prevents writing the data to an intermediate disk file (similar to using pipes to avoid intermediate disk files).

But whereas pipes can be used only for linear connections between processes, a FIFO has a name, so it can be used for nonlinear connections.

Consider a procedure that needs to process a filtered input stream twice.

    mkfifo fifo1
    prog3 < fifo1 &
    prog1 < infile | tee fifo1 | prog2

We create the FIFO and then start prog3 in the background, reading from the FIFO. We then start prog1 and use tee to send its input to both the FIFO and prog2.

  1. How does a FIFO "duplicate an output stream in a series of shell commands"? Isn't this done by tee instead of a FIFO?

  2. In the example, mkfifo fifo1 creates a file in the current directory, and fifo1 seems replaceable with a regular file. So in what sense does a FIFO "prevent writing the data to an intermediate disk file"?

  3. What do "linear connections" and "nonlinear connections" between processes mean? What does it mean that a FIFO can be used for nonlinear connections, while a pipe can only be used for linear connections between processes?

Thanks.

Jeff Schaller
Tim

1 Answer

  1. APUE says “FIFOs can be used to duplicate an output stream”; it doesn’t say that FIFOs themselves duplicate the output stream. As you point out, the duplication is done by tee in the example.

  2. mkfifo creates a FIFO, which is visible as a “file” in the containing directory; but writing to the FIFO isn’t like writing to a file, because the data never hits the disk. Pipes, named or otherwise, don’t provide storage for data; they provide communication channels. The writing end of a pipe can’t write data if there’s no receiver: the pipe just passes data along, without storing it. (On most systems pipes are backed by small kernel buffers, to improve performance, but that’s an implementation detail.)

  3. Linear connections between processes are pipes which can be represented as a linear graph. In the example, you can represent the last line as

    infile → prog1 → tee fifo1 → prog2
    

    which is linear, but if you try to represent the whole chain, reducing to processing elements, you need

    infile → prog1 → prog2
                   → prog3
    

    which is non-linear (there’s one node in the graph, prog1, which has two exit nodes).
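The non-linear graph above can be tried out directly. The following is a minimal sketch, not APUE's own code: the stand-ins for prog1, prog2 and prog3 (tr, wc -l and sort -r), the sample input, and the output file names prog2.out and prog3.out are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch of the APUE example with hypothetical stand-in programs:
#   prog1 = tr (upper-cases the input), prog2 = wc -l, prog3 = sort -r
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf 'pear\napple\nfig\n' > infile

mkfifo fifo1

# prog3: started in the background, blocks until a writer opens the FIFO
sort -r < fifo1 > prog3.out &

# prog1 | tee fifo1 | prog2: tee copies prog1's output to the FIFO and to prog2
tr 'a-z' 'A-Z' < infile | tee fifo1 | wc -l > prog2.out

# Wait for the background reader to finish before looking at its output
wait

cat prog2.out prog3.out
rm -f fifo1
```

Both prog2.out and prog3.out end up derived from the same filtered stream (a line count of 3 in one, the upper-cased lines in reverse order in the other), and the only regular files involved are the input and the two results.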

Stephen Kitt
  • Thanks. (2) What is the advantage of a FIFO over a temporary regular file? (3) The nonlinear connection between processes is actually implemented by tee, not by a FIFO, for example, using tee and a temporary regular file we can achieve the same nonlinear connection too. – Tim Mar 25 '18 at 22:15
  • The description in APUE contrasts FIFOs with files on the one hand, and FIFOs with pipes on the other; you can’t conflate the three. When comparing FIFOs and files, the difference is that FIFOs don’t involve hitting the disk for the content written to the FIFO (this addresses your point 2). When comparing FIFOs and pipes, the difference is that FIFOs are named so they can be used in places pipes can’t. APUE doesn’t say you can’t use files to implement non-linear connections, it just says you can’t use pipes. – Stephen Kitt Mar 25 '18 at 22:18
  • Thanks. What is the advantage of "don’t involve hitting the disk for the content written to the FIFO"? – Tim Mar 25 '18 at 22:21
  • Disks are slow. (At least, they were when APUE was written.) – Stephen Kitt Mar 25 '18 at 22:21
  • Thanks. Besides speed, I am also wondering about space. Do named pipes (as used in FIFOs and process substitution) reside in main memory? Does a named pipe have to hold all the content, the way a temporary file on disk would if the file were used instead of a named pipe? Does the main memory have to be large enough if the content is huge? – Tim Mar 26 '18 at 00:52
  • A pipe (FIFO or whatever) doesn’t hold data, it transfers data between processes. You can see this by running, for example, mkfifo fifo && seq 1 1000000 | tee fifo. You won’t see any output from this initially, because the FIFO doesn’t start accepting data until both its ends are open. Run head fifo, and you’ll see the seq side of the pipe output a bunch of numbers and then stop with exit code 141, i.e. SIGPIPE (128 + 13): the output gives an indication of the size of the kernel buffer used for the FIFO, at most a few tens of kilobytes. That’s all the memory that’s used by a pipe. – Stephen Kitt Mar 26 '18 at 03:42
  • By the way, in your last example that prog1 needs two exit nodes (one for prog2 and one for prog3), someone can use tee under some circumstances like this: echo "hello world" | tee >(sed 's/h/H/') >(sed 's/w/W/g') – George Vasiliou Apr 11 '18 at 12:22
  • BTW: The limitation of anonymous pipes being useful only for linear connections is really a limit of shell pipeline syntax; the pipe syscall has no such limitation. – derobert Apr 11 '18 at 22:48
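The point made in the answer and comments, that a FIFO passes data along rather than storing it, can be checked with a small experiment. This is only a sketch under assumptions: the file names fifo and count.out and the 1,000,000-byte payload are made up for the illustration, and the size that ls reports for a FIFO may vary between systems.

```shell
#!/bin/sh
# Sketch: push about 1 MB through a FIFO, then inspect the FIFO itself.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkfifo fifo

# Reader: counts the bytes arriving through the FIFO (runs in the background)
wc -c < fifo > count.out &

# Writer: 1000 blocks of 1000 bytes; the open blocks until the reader is there
dd if=/dev/zero bs=1000 count=1000 2>/dev/null > fifo

# Wait for the reader to see end-of-file and write its count
wait

# The FIFO is still a special file of type "p", not a regular file
[ -p fifo ] && echo "fifo is a named pipe"

# Typically shows a size of 0 even though 1 MB went through it
ls -l fifo
cat count.out
```

The reader sees every byte, yet the directory entry for the FIFO does not grow the way a temporary regular file would. This does not show how much the kernel buffered in memory along the way; it only shows that the FIFO itself is not where the data ends up.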