$ seq 1 12773 | tee /dev/null >(wc -l > tmp.txt) | head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> 8473 (random between 1~12773)

$ cat tmp.txt

--> 8473

$ seq 1 12774 | tee /dev/null >(wc -l > tmp.txt) | head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> (NULL)

$ cat tmp.txt

--> 8844 (random between 1~12773)

$ seq 1 25011 | tee /dev/null >(wc -l > tmp.txt) | cat | head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> 13778 (random between 1~25011)

$ cat tmp.txt

--> 13778

$ seq 1 25012 | tee /dev/null >(wc -l > tmp.txt) | cat |head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> (NULL)

$ cat tmp.txt

--> 24939 (random between 1~25012)

$ seq 1 46014 | tee /dev/null >(wc -l > tmp.txt) | cat | cat |head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> 34111 (random between 1~46014)

$ cat tmp.txt

--> 34111 (random between 1~46014)

$ seq 1 46015 | tee /dev/null >(wc -l > tmp.txt) | cat | cat |head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1))|tail -1

--> (NULL)

$ cat tmp.txt

--> 343 (random between 1~46014)

As the number of "| cat"s after "(wc -l > tmp.txt)" increases, the commands above keep working with a larger number of lines.

What's going on?

  • You are writing to a file you are reading from? That's often not a good idea. – phk Dec 11 '16 at 12:26
  • @phk, Writing first, reading later. What's the problem? – user58029 Dec 11 '16 at 13:36
  • @user58029: All the commands in a pipeline run in parallel. – AlexP Dec 11 '16 at 16:42
  • @AlexP, $ find -type f | grep sys <-- "grep sys" doesn't begin its job before "find -type f" reaches the end of the file list. Likewise, $ (beep.sh; mv -v files1* directory2/) | sed 's/^/head /' <-- neither "mv" nor "sed" begins its job before "beep.sh" finishes. – user58029 Dec 11 '16 at 23:50
  • @user58029: Why do you think so? The shell launches both find and grep in parallel. That's the very idea behind piping. There is a large difference between cmd1 | cmd2 and cmd1 ; cmd2. – AlexP Dec 11 '16 at 23:53
  • @AlexP, Imagine "$ find -type f | sort". Without "| sort", "find" displays its results line by line while it is still running. But with "| sort", "find" does not display anything before it reaches the end of its results, and then the whole result suddenly pops up. – user58029 Dec 11 '16 at 23:57
  • @user58029: In your example, find works as it always does. sort starts in parallel and reads the output of find as it is produced. When find ends, sort receives end-of-file on stdin and outputs the result. Please read the manual to be convinced that in a pipeline all commands run in parallel. – AlexP Dec 12 '16 at 00:00
  • @AlexP, I don't agree. "$ seq 1 10000000 | sort -r > tmp.txt" <-- "sort -r" can't start before "seq" reaches 10000000, and tmp.txt can't be filled with even a single line before "sort" has reverse-sorted all the lines from "seq". – user58029 Dec 12 '16 at 00:12
  • @user58029: Try sleep 2 | sort -r | ps f for edification. – AlexP Dec 12 '16 at 00:18
  • @AlexP, OK, I understand now. But the issue is each command's buffered processing. The examples I showed at the top demonstrate that the piped commands react differently depending on how much data the previous command in the pipeline has processed. – user58029 Dec 12 '16 at 00:35

1 Answer

As AlexP has explained in comments, the commands in a pipeline run in parallel. You seem to be convinced that this is not so; please forget this misconception — you won't be able to understand what's happening as long as you hold on to it.

Because the processes are running in parallel, the sequence of operations depends on the exact timing and may not be reproducible from one run to the next.
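You can see this parallelism directly. In the sketch below (assuming GNU date and mktemp), each side of a pipe records its start time to a temporary file; the two timestamps come out essentially equal even though the left side then sleeps:

```shell
left=$(mktemp); right=$(mktemp)
# The shell forks both sides of the pipe before either runs its first
# command, so the two start timestamps differ by at most a second of
# clock rounding, despite the 2-second sleep on the left.
{ date +%s > "$left"; sleep 2; } | { date +%s > "$right"; cat > /dev/null; }
echo $(( $(cat "$right") - $(cat "$left") ))
```

If the commands ran sequentially, the difference would be about 2 seconds; in fact it is 0 (or ±1 across a clock-second boundary).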

Taking your first example, the following commands run in parallel:

  • seq 1 12773
  • tee /dev/null
  • wc -l > tmp.txt (a process substitution also creates a pipe and runs the command in parallel)
  • head -$((0x`openssl rand -hex 7` % `cat tmp.txt` + 1)) — this involves three different commands, and head starts after both openssl and cat have exited
  • tail -1
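
The last point is easy to verify: the command substitutions in head's argument list must run to completion before head itself is started. A minimal sketch, with the openssl call replaced by a plain sleep:

```shell
t0=$(date +%s)
# The substitution runs (and sleeps) before head starts; seq's output
# waits in the pipe buffer in the meantime, so nothing is lost.
seq 1 10 | head -$(sleep 1; echo 3)
t1=$(date +%s)
echo "took $((t1 - t0))s"
```

This prints 1, 2 and 3 only after the one-second pause.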

Since wc -l > tmp.txt and cat tmp.txt run in parallel, it's unpredictable when cat tmp.txt will run in relation to the output from wc:

  • It may run before the redirection to tmp.txt is performed, and either pick up the file from a previous run if there is one, or complain that the file doesn't exist otherwise.
  • It may run after the redirection is performed, but before wc produces any output; in that case the file is empty since the redirection truncates the file.
  • It may run while wc is producing output, and pick up only the beginning of the output. On most systems, wc produces its output atomically (because it's so short) so this won't happen.
  • It may run after wc has finished producing output.
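
The second case — the file exists but is empty because the redirection has already truncated it — can be reproduced deterministically by delaying wc by hand (a sketch using a temporary file in place of tmp.txt):

```shell
f=$(mktemp)
# The shell opens and truncates "$f" as soon as the right-hand side of
# the pipe starts, long before the delayed wc writes its count.
seq 1 100 | { exec > "$f"; sleep 2; wc -l; } &
sleep 1
size_during=$(wc -c < "$f")   # the file exists but is still empty
wait
echo "during: $size_during bytes, after: $(cat "$f") lines"
```

The truncation happens immediately; the write only happens at the end, so the probe mid-run sees 0 bytes while the final read sees the count of 100.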

Experimentally, I get the same results as you (on a Linux machine running kernel 3.16 which is otherwise mostly idle): with seq 1 12773, cat tmp.txt picks up the output of wc; with seq 1 12774, cat tmp.txt picks up an empty file. So why is there a difference between 12773 and 12774, but the results are pretty reliable below that value?

$ seq 1 12774 | wc -c
65538
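
And 12773 is exactly the largest count whose seq output still fits in 65536 bytes (the byte counts include the newlines):

```shell
seq 1 12773 | wc -c   # 65532 — fits in a 64 KiB pipe buffer
seq 1 12774 | wc -c   # 65538 — six bytes too many
```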

There's a threshold at 65536 bytes: that's the capacity of the pipe buffer. The head … command is slow to start, because it has to run openssl and cat to completion first. While it's starting, the previous command in the pipeline writes into the pipe buffer; when that buffer fills up, the previous command has to stall. With numbers up to 12773, the pipe buffer never fills, so in all likelihood seq finishes running before openssl (it has a lot less work to do) and wc has the time to write its output. But from 12774 numbers on, the pipe buffer fills up: tee gets stuck writing to the output that goes to head …, and so hasn't yet finished feeding its input to wc. In the meantime, cat tmp.txt runs and finds an empty file.
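
You can make this stall visible by replacing the randomized head with a deliberately slow consumer. The sketch below (assuming GNU coreutils and Linux's 64 KiB pipes) uses a named FIFO in place of the >(…) process substitution — which is the same mechanism under the hood:

```shell
f=$(mktemp); probe=$(mktemp)
fifo=$(mktemp -u); mkfifo "$fifo"
# wc reads from a named FIFO here, standing in for >(wc -l > tmp.txt).
wc -l < "$fifo" > "$f" &
# 20000 lines overflow the 64 KiB stdout pipe, so tee stalls before it
# reaches end of input; wc has therefore written nothing when the
# sleeping consumer probes the file one second in.
seq 1 20000 | tee "$fifo" | { sleep 1; cat "$f" > "$probe"; cat > /dev/null; }
wait            # let wc finish writing its count
rm -f "$fifo"
echo "during: $(wc -c < "$probe") bytes, after: $(cat "$f") lines"
```

Mid-pipeline the file is empty; only once the consumer drains the pipe does tee reach end of input, at which point wc writes 20000.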

When you add more pipes, each has its own buffer, so there's more room before tee stalls.
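
Each interposed cat contributes one more 64 KiB pipe plus its own read buffer, and the effect on the writer is easy to time (a sketch assuming GNU coreutils and Linux's 64 KiB pipes):

```shell
m1=$(mktemp); m2=$(mktemp)
s1=$(date +%s)
# ~110 kB overflow a single pipe buffer: the writer blocks until the
# sleeping reader finally starts draining, about 2 seconds later.
{ head -c 110000 /dev/zero; date +%s > "$m1"; } | { sleep 2; cat > /dev/null; }
s2=$(date +%s)
# With cat in between there are two pipe buffers (plus cat's own), so
# the same amount fits and the writer finishes right away.
{ head -c 110000 /dev/zero; date +%s > "$m2"; } | cat | { sleep 2; cat > /dev/null; }
echo "writer stalled $(( $(cat "$m1") - s1 ))s without cat, $(( $(cat "$m2") - s2 ))s with"
```

The same principle, scaled up, is why each extra "| cat" in your experiments raises the line count at which the behavior flips.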