I often want to perform an operation that requires

  1. splitting the stdout of a pipeline (let's call it pipeline-before) into two parallel streams;
  2. feeding the resulting streams, as their stdin, to two separate pipelines (pipeline-between-0 and pipeline-between-1);
  3. merging the two resulting stdout streams in a strict sequence;
  4. feeding the resulting merged stream, as its stdin, to another pipeline (pipeline-after).

In (3), by "in a strict sequence" I mean that all the output coming from, say, pipeline-between-0 should appear in the merged output stream before any of the output coming from pipeline-between-1.

The whole thing could be diagrammed like this:

pipeline-before --.--- pipeline-between-0 -.
                   \                        \
                    `- pipeline-between-1 ---`-- pipeline-after

An example of such a pipeline-between-0/pipeline-between-1 pair could be:

  1. head -n 1 | tr 'a-z' 'A-Z'
  2. tail -n +2 | sort -t $'\t' -k1,1

In English, one would describe this combination as "make the first row all uppercase, and sort the remaining rows by the first column."
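
To make the intended result concrete, here is a small sketch that applies the two example pipelines, one after the other, to the same made-up input. The sample rows are invented for illustration, and this only shows the desired output, not the general syntax being asked for.

```shell
# Sample tab-separated input (invented for illustration).
input=$(printf 'name\tqty\npear\t3\napple\t7\nfig\t1\n')
tab=$(printf '\t')   # portable way to get a literal tab for sort -t

# Apply each example pipeline to the same input, in strict sequence.
result=$(
  printf '%s\n' "$input" | head -n 1 | tr 'a-z' 'A-Z'
  printf '%s\n' "$input" | tail -n +2 | sort -t "$tab" -k1,1
)
printf '%s\n' "$result"
```

The header row comes out first, in uppercase, followed by the remaining rows sorted by their first column (apple, fig, pear).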

Q: Is there a general shell syntax to express such an operation?

I am interested in the answer to this question for both zsh and bash.


Here's one syntax that, in general, does not work:

$ pipeline-before | tee >( pipeline-between-0 ) | pipeline-between-1 | pipeline-after

This syntax fails for two reasons:

  1. some of the output of pipeline-between-1 often appears before some of the output of pipeline-between-0.
  2. the final output is truncated (I suspect that the reason for this is a SIGPIPE signal).

I did try the following maneuver (which, I confess, I don't understand 100%):

{
  pipeline-before |
  { tee >( pipeline-between-0 4>&1 1>&3 ); } |
  pipeline-between-1
} 3>&1 | pipeline-after

AFAICT, this syntax seems to solve the first of the problems listed above (i.e. based on a few informal tests, the outputs of pipeline-between-0 and pipeline-between-1 show up in the correct sequence). Unfortunately, though, the final output is still truncated, at least in some cases.

kjo

2 Answers

Maybe you just need named pipes (FIFOs)?

The following example:

 { seq 1 100000 | grep 1$ & seq 1 100000 | grep 2$ ; } > unsorted

should write an interleaved mix of numbers ending in 1 and numbers ending in 2 to the file unsorted. We want them in order (all numbers ending in 1, then all ending in 2), so create two named pipes, one for each result, and concatenate them in the desired order.

mkfifo stream{1,2}
{ seq 1 100000 | grep 1$ >stream1 & seq 1 100000 | grep 2$ > stream2 ; } &\
cat stream1 stream2 > sorted

Check if the files are in the same order:

diff -q {un,}sorted 

(they should differ) and see if the sorted one is sorted as desired:

sed 10000q sorted | grep '2$'

(should be no result, while the unsorted file should return data)

Named pipes should work the same way for bash and zsh.


Or, in a more general form, as pseudo-code:

  1. The input stream (your "pipeline-before") is replicated into an arbitrary number of parallel streams via tee and FIFOs in-i.
  2. Each stream is handled by a different command in parallel and sends its output to stream out-i (your "pipeline-between-i").
  3. Concatenate output streams in desired order and forward to next command (your "pipeline-after").

mkfifo {in,out}-{0..n}

pre-cmd | tee in-0 in-1 ... in-n | cat >/dev/null &
cmd-0 <in-0 >out-0 &
cmd-1 <in-1 >out-1 &
...
cmd-n <in-n >out-n &
cat out-0 out-1 ... out-n | after-cmd

I used cat >/dev/null instead of cat >in-n for the sake of generality; the same goes for the (possibly redundant) cat at the end.

Example:

mkfifo {in,out}-{0..2}
seq 0 100 | tee in-{0..2} | cat >/dev/null &
grep '33$' <in-0 >out-0 &
awk '$1<2' <in-1 >out-1 &
sed '/^.\{1,2\}$/d' <in-2 >out-2 &
cat out-2 out-0 out-1 | tr '\n' '-'

Result: 100-33-0-1-
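
One caveat with this layout: the final cat reads the out-i FIFOs one at a time, so a branch whose turn has not come yet cannot open its output and stops draining its input. tee can then stall once the input outgrows the kernel's pipe buffers. Below is a minimal sketch of one way around this, assuming it is acceptable to hold each branch's output in a temporary file. The tmpdir name and the two grep commands are made up for illustration.

```shell
# Hypothetical scratch directory: FIFOs carry the input, regular files hold the output.
tmpdir=$(mktemp -d) || exit 1
mkfifo "$tmpdir/in-0" "$tmpdir/in-1"

# Each branch reads its own FIFO and writes to a regular file, so it never
# has to wait for the final cat to reach it.
grep '3$' <"$tmpdir/in-0" >"$tmpdir/out-0" &
grep '7$' <"$tmpdir/in-1" >"$tmpdir/out-1" &

# Replicate the input: tee writes one copy to in-0, and its stdout goes to in-1.
seq 1 200000 | tee "$tmpdir/in-0" >"$tmpdir/in-1"

# Wait for both branches, then concatenate their outputs in the required order.
wait
# Stand-in for pipeline-after: show the first and last line of each part.
result=$(cat "$tmpdir/out-0" "$tmpdir/out-1" | sed -n '1p;20000p;20001p;$p' | paste -s -d ' ' -)
rm -rf "$tmpdir"
printf '%s\n' "$result"
```

Each branch has to read its input to the end. A command that exits early, such as head -n 1, would need to drain the rest (for example with { head -n 1; cat >/dev/null; }), otherwise tee may be killed by SIGPIPE and the other branch sees truncated input.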

FelixJN

Are you looking for parallel --tee? It should deal happily with any size input as long as you have enough free disk space in /tmp for the output.

(printf "Header1\tHeader2\n"; paste <(seq 20 -1 11) <(seq 10) ) |
  parallel -k --pipe --tee ::: "head -n  1 | tr 'a-z' 'A-Z'" "tail -n +2 | sort -t $'\t' -k1,1"

Or the same expressed as bash functions:

pipeline-before() {
    printf "Header1\tHeader2\n"
    paste <(seq 20 -1 11) <(seq 10)
}
pipeline-between-0() {
    head -n  1 | tr 'a-z' 'A-Z'
}
pipeline-between-1() {
    tail -n +2 | sort -t $'\t' -k1,1
}
pipeline-after() {
    echo "This is pipeline-after"
    cat
    echo "Done"
}
export -f pipeline-before pipeline-between-0 pipeline-between-1 pipeline-after

pipeline-before | parallel -k --pipe --tee ::: pipeline-between-0 pipeline-between-1 | pipeline-after
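
Where GNU parallel is not installed, the buffering idea can be approximated in plain shell. The following is a rough sketch, under the assumption that running the branches one after another (rather than in parallel) is acceptable. fan_out is a made-up helper name, not a standard command.

```shell
# fan_out is a hypothetical helper, not a standard command.
# Usage: producer | fan_out 'cmd-0' 'cmd-1' ... | consumer
fan_out() {
  local buf cmd
  buf=$(mktemp) || return 1
  cat >"$buf"                  # buffer all of stdin once
  for cmd in "$@"; do
    ( eval "$cmd" ) <"$buf"    # run each command in turn on the same input
  done
  rm -f "$buf"
}

# Example: keep the first line as is, then sort the remaining lines numerically.
result=$(seq 5 -1 1 | fan_out 'head -n 1' 'tail -n +2 | sort -n')
printf '%s\n' "$result"
```

Because every command reads from a regular file, an early exit such as head -n 1 cannot truncate what the other branch sees, and the outputs appear in argument order. The cost is that the whole input is written to disk before any branch starts.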

If not, could you elaborate with more examples of input, output and what pipeline-* could be?

Ole Tange