In the spirit of this question, I want to create a branched data flow from a single source:
cmd1 ──> tee ───────────────> │
          ├──────> cmd2 ────> │ cmd4
          └──────> cmd3 ────> │
Unlike that question, I want my output interleaved, not one command at a time. I can do that with named pipes and paste:
$ mkfifo fifo1 fifo2
$ seq 1 100 \
| tee \
>(awk '$0+=1' > fifo1) \
>(awk '$0+=2' > fifo2) \
| paste - fifo1 fifo2
This seems to work fine; i.e., it prints
1 2 3
2 3 4
...
100 101 102
That's a notional example to illustrate the concept. My real pipeline looks like this:
find "$1" -type d -print0 \
| tee \
>(xargs -0 -n1 du -bs | cut -f1 > fifo1) \
>(xargs -0 -n1000 stat --printf="%G\n" > fifo2) \
| xargs -0 -n1 echo \
| paste - fifo1 fifo2
This also seems to work fine in most cases. But when I run it on a huge filesystem, it eventually hangs partway through. From reading the question above, I suspect a deadlock, but I don't quite see where it could be -- it seems like paste should be keeping all the data flowing.
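Here is a stripped-down sketch of what I suspect is happening, with no find or xargs involved (assumptions: Linux's roughly 64 KiB pipe capacity, and tail -n + standing in for any consumer that withholds output for a long time, the way a large xargs batch would; stall_fifo is just a scratch name, and timeout is wrapped around paste so the demo cannot hang forever):

```shell
# paste reads one line from stdin, then blocks reading stall_fifo, which
# stays empty because tail withholds output until line 99999.  With paste
# not draining stdin, that pipe fills and tee blocks, so tail never gets
# enough input to emit anything: a deadlock.  timeout kills paste after
# 5 seconds (exit status 124); tee and seq then die of SIGPIPE.
mkfifo stall_fifo
seq 1 100000 \
    | tee >(tail -n +99999 > stall_fifo) \
    | timeout 5 paste - stall_fifo > /dev/null
status=$?
echo "paste exit status: $status"   # 124 would mean paste was still stuck
rm stall_fifo
```

If this stalls the same way, it would suggest the problem is the batching consumer plus finite pipe buffers, not anything specific to du or stat.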
I guess I still don't understand buffering and data flow in tee, pipes, and fifos. Could anyone explain what I'm missing here? Where is the blockage, and how do I fix it? (Or investigate further?)
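For reference, here is the part of fifo behavior I think I do understand, as a minimal sketch (demo_fifo is just a scratch name; this is separate from the pipeline above): data only flows while both ends are attached, and a writer with no reader simply blocks.

```shell
# A writer opening a fifo blocks in open() until some reader attaches.
mkfifo demo_fifo
( echo hello > demo_fifo ) &        # background writer: blocked in open()
writer=$!
sleep 1
# After a second with no reader, the writer should still be alive (blocked).
kill -0 "$writer" 2>/dev/null && blocked=yes || blocked=no
line=$(head -n1 demo_fifo)          # attaching a reader releases the writer
wait "$writer"
rm demo_fifo
echo "writer blocked: $blocked; received: $line"
```

So my mental model is that each single link can stall like this; what I can't see is which link in the full tee/fifo/paste graph stalls first.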
-n1 instead of -n1000 in the second process substitution? – Kusalananda May 18 '21 at 16:28