As AlexP has explained in comments, the commands in a pipeline run in parallel. You seem to be convinced that this is not so; please drop that misconception, because you won't be able to understand what's happening as long as you hold on to it.
Because the processes are running in parallel, the sequence of operations depends on the exact timing and may not be reproducible from one run to the next.
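A quick way to convince yourself (a minimal sketch, not tied to your particular commands): the right-hand side of a pipe starts immediately, without waiting for the left-hand side to exit.

```
# Both commands start at once: echo prints immediately, even though
# sleep won't exit for another 5 seconds. (The shell still waits for
# the whole pipeline before printing the next prompt.)
sleep 5 | echo "echo ran without waiting for sleep"

# Compare with ';', which really does run the commands one after the other:
sleep 5 ; echo "this only appears after 5 seconds"
```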
Taking your first example, the following commands run in parallel (a sketch of the whole pipeline follows this list):

- `seq 1 12773`
- `tee /dev/null`
- `wc -l > tmp.txt` (a process substitution also creates a pipe and runs the command in parallel)
- ``head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))``; this involves three different commands, and `head` starts after both `openssl` and `cat` have exited
- `tail -1`
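Putting those fragments back together, the pipeline presumably looks roughly like this. This is a reconstruction from the pieces above, not a copy of your exact command; treat the details (in particular how `tee`'s arguments are arranged) as guesswork.

```
# Reconstructed shape of the pipeline (guesswork from the fragments above):
# count the lines into tmp.txt while also trying to pick a random line by number.
seq 1 12773 \
  | tee /dev/null >(wc -l > tmp.txt) \
  | head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1)) \
  | tail -1
```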
Since `wc -l > tmp.txt` and `cat tmp.txt` run in parallel, it's unpredictable when `cat tmp.txt` will run in relation to the output from `wc` (a small experiment reproducing this race follows the list):
- It may run before the redirection to `tmp.txt` is performed, and either pick up the file from a previous run if there is one, or complain that the file doesn't exist otherwise.
- It may run after the redirection is performed, but before `wc` produces any output; in that case the file is empty, since the redirection truncates the file.
- It may run while `wc` is producing output, and pick up only the beginning of the output. On most systems, `wc` produces its output atomically (because it's so short), so this won't happen.
- It may run after `wc` has finished producing output.
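A minimal sketch of that race, independent of your actual pipeline: one process recreates `tmp.txt` while another reads it at the same time. Which of the cases above you hit varies from run to run.

```
# tmp.txt is rewritten by one process while another reads it; depending on
# timing, the reader may see the previous contents, an empty file, an error
# because the file doesn't exist yet, or the new count.
for run in 1 2 3 4 5; do
  ( seq 1 12773 | wc -l > tmp.txt ) &   # writer: truncates tmp.txt, then writes the count
  printf 'run %s: ' "$run"; cat tmp.txt # reader: races against the writer
  wait
done
```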
Experimentally, I get the same results as you (on a Linux machine running kernel 3.16 which is otherwise mostly idle): with `seq 1 12773`, `cat tmp.txt` picks up the output of `wc`; with `seq 1 12774`, `cat tmp.txt` picks up an empty file. So why is there a difference between 12773 and 12774, and why are the results pretty reliable below that value?
```
$ seq 1 12774 | wc -c
65538
```
There's a threshold at 65536 bytes, and that value is the capacity of the pipe buffer. The `head …` command is slow to start, because it has to run `openssl` and `cat` to completion first. While it's starting, the previous command in the pipeline writes to the pipe buffer. When the pipe buffer gets full, the previous command has to stall. With numbers up to 12773, the pipe buffer never fills, so in all likelihood `seq` finishes running before `openssl` (it has a lot less work to do) and `wc` has the time to write its output. But starting at 12774, the pipe buffer fills up; `tee` then gets stuck writing to the output that goes to `head …` and hasn't finished writing the output to `wc` yet. In the meantime, `cat` runs with an empty file.
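You can see that threshold directly with a reader that deliberately waits before reading. This is a sketch, assuming the Linux default pipe capacity of 64 KiB; the timestamps are only there to show when `seq` exits.

```
date +%T   # reference time: when the pipelines below are started

# seq 1 12773 produces 65532 bytes, which fits in the 64 KiB pipe buffer:
# seq writes everything and exits at once, even though nothing reads for 3 seconds.
{ seq 1 12773; echo "seq exited at $(date +%T)" >&2; } | { sleep 3; wc -c; }

# seq 1 12774 produces 65538 bytes, slightly more than the buffer holds:
# seq blocks on its last write and only exits once the reader starts draining.
{ seq 1 12774; echo "seq exited at $(date +%T)" >&2; } | { sleep 3; wc -c; }
```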
When you add more pipes, each has its own buffer, so there's more room before `tee` stalls.
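A sketch of that effect, again assuming the 64 KiB Linux default: an extra do-nothing stage contributes its own pipe buffer, so a writer that overflows a single pipe can still finish without waiting for the final reader.

```
# seq 1 20000 produces about 109 kB: too much for one pipe buffer, so with a
# single pipe seq blocks until the sleeping reader wakes up and starts reading...
{ seq 1 20000; echo "seq exited at $(date +%T)" >&2; } | { sleep 3; wc -c; }

# ...but with an extra cat in the middle there are two pipe buffers (roughly
# 128 KiB in total), and seq can write everything and exit right away.
{ seq 1 20000; echo "seq exited at $(date +%T)" >&2; } | cat | { sleep 3; wc -c; }
```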
For reference, the comments from AlexP mentioned at the top:

> `find` and `grep` run in parallel. That's the very idea behind piping. There is a large difference between `cmd1 | cmd2` and `cmd1 ; cmd2`. – AlexP Dec 11 '16 at 23:53
>
> `find` works as it always does. `sort` starts in parallel and reads the output of `find` as it is produced. When `find` ends, `sort` receives end-of-file on stdin and outputs the result. Please read the manual to be convinced that in a pipeline all commands run in parallel. – AlexP Dec 12 '16 at 00:00
>
> `sleep 2 | sort -r | ps f` for edification. – AlexP Dec 12 '16 at 00:18