
Does anything symbolic happen when chaining bash commands via a pipe, or is it all compute-pass-compute-pass?

For example, in head t.txt -n 5 | tail -n 2, is head t.txt -n 5 computed first and then tail -n 2 executed over its output? Or does the shell first build some abstraction telling it that lines 4 and 5 are the ones to be read? It might not make a difference in this example, but I guess it can in other scenarios.

  • The order in which the commands of a pipeline start is undefined (https://unix.stackexchange.com/questions/37508/in-what-order-do-piped-commands-run); you can clearly see it with ps aux | grep init when grep lists itself. – Arkadiusz Drabczyk Mar 22 '20 at 14:01
  • All the commands in a pipeline are executed in parallel, at the same time or almost, passing data between them via a FIFO buffer. Some people try to "demystify" pipes by explaining that a | b is basically a > tmpfile; b < tmpfile. That's WRONG: a command like yes | head -n20 would never finish if pipes worked like that. So yes, something highly symbolic happens in a pipeline ;-) – Mar 22 '20 at 14:57
  • What do you mean by "symbolic"? – ctrl-alt-delor Mar 22 '20 at 14:59

2 Answers


The shell uses the pipe(2) system call to create a bounded buffer in the kernel with two file descriptors, one to enable processes to write to the buffer, and another to enable processes to read from the buffer.
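
Concretely (a minimal sketch, not part of the answer, assuming a POSIX system), a single process can see both ends of such a buffer by calling pipe(2) itself, writing to one descriptor and reading the same bytes back from the other:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];                      /* fds[0]: read end, fds[1]: write end */
        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }

        const char *msg = "hello through the kernel buffer\n";
        write(fds[1], msg, strlen(msg)); /* copied into the kernel buffer */

        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf); /* copied back out */
        if (n > 0)
            write(STDOUT_FILENO, buf, n);

        close(fds[0]);
        close(fds[1]);
        return 0;
    }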

Consider a simple case:

$ p1 | p2

In this case, conceptually, the shell creates the above-mentioned pipe, fork()s, the child connects its standard output stream to the write-end of the pipe, then the child exec()s p1. Next, the shell fork()s again, the child connects its standard input stream to the read-end of the pipe, then the child exec()s p2. (I say conceptually because shells might do things in different orders, but the idea is the same.)

At that point, p1 and p2 are running concurrently. p1 will write to the pipe, and the kernel will copy the written data to the buffer. p2 will read from the pipe, and the kernel will copy the read data from the buffer. If the pipe gets full, then the kernel will block p1 in its call to write() until p2 reads something from the pipe, freeing up some space. If the pipe is empty, then the kernel will block p2 in its call to read() until p1 writes more data to the pipe.
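
As a rough sketch of those steps (not how any particular shell actually implements it; ls and wc -l below are arbitrary stand-ins for p1 and p2), a program can build p1 | p2 with pipe(), two fork()s, dup2() onto the standard streams, and exec():

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); exit(1); }

        pid_t left = fork();
        if (left == 0) {                  /* child for p1 (here: ls) */
            dup2(fds[1], STDOUT_FILENO);  /* stdout -> write end of the pipe */
            close(fds[0]);
            close(fds[1]);
            execlp("ls", "ls", (char *)NULL);
            perror("execlp ls");
            _exit(127);
        }

        pid_t right = fork();
        if (right == 0) {                 /* child for p2 (here: wc -l) */
            dup2(fds[0], STDIN_FILENO);   /* stdin <- read end of the pipe */
            close(fds[0]);
            close(fds[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            perror("execlp wc");
            _exit(127);
        }

        /* The parent must close both ends: if the write end stayed open
           here, wc would never see end-of-file on its stdin. */
        close(fds[0]);
        close(fds[1]);
        waitpid(left, NULL, 0);
        waitpid(right, NULL, 0);
        return 0;
    }

Both children run at the same time once the parent has forked them; neither the parent nor the kernel cares what ls and wc do with the bytes.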

Andy Dalton

Of the two models that you suggest, compute-pass-compute-pass is the closer. The shell just connects the processes; it knows nothing about what they are doing.

Except that the order of execution is undefined: they effectively run at the same time. However, the one on the left must output bytes before the one on the right can read them, so data flows left to right. Data comes out of the first command's standard output, flows into the standard input of the next process, where it is processed, then comes out of that process's standard output, where it can be piped to yet another process, and so on.

If there is no redirection (>, <, etc.) and no reading from a file, then it looks like this:

         ┌───────────┐ ┌───────────┐ ┌─────────────┐
Terminal⇨│Process one│⇨│Process two│⇨│Process Three│⇨Terminal
         └───────────┘ └───────────┘ └─────────────┘
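
To mirror the diagram (again just a sketch, assuming POSIX system calls; ls -l, grep c and wc -l are arbitrary stand-ins for Process one, two and three), a loop can chain any number of stages by connecting each stage's standard output to the standard input of the next:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* Arbitrary stages standing in for Process one/two/three. */
        char *stages[][4] = {
            { "ls", "-l" },
            { "grep", "c" },
            { "wc", "-l" },
        };
        int n = 3;
        int prev_read = STDIN_FILENO;      /* stage 0 reads from the terminal */

        for (int i = 0; i < n; i++) {
            int fds[2] = { -1, -1 };
            if (i < n - 1 && pipe(fds) == -1) { perror("pipe"); exit(1); }

            if (fork() == 0) {             /* child: stage i */
                if (prev_read != STDIN_FILENO) {
                    dup2(prev_read, STDIN_FILENO);  /* read from previous stage */
                    close(prev_read);
                }
                if (i < n - 1) {
                    dup2(fds[1], STDOUT_FILENO);    /* write to next stage */
                    close(fds[0]);
                    close(fds[1]);
                }                          /* last stage writes to the terminal */
                execvp(stages[i][0], stages[i]);
                perror("execvp");
                _exit(127);
            }

            if (prev_read != STDIN_FILENO)
                close(prev_read);          /* parent drops its copy */
            if (i < n - 1) {
                close(fds[1]);
                prev_read = fds[0];        /* next stage will read from here */
            }
        }

        while (wait(NULL) > 0)             /* reap all stages */
            ;
        return 0;
    }

The loop only creates descriptors, forks and waits; it never looks at the data, which is the sense in which the shell just connects the processes and knows nothing about what flows through them.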