
From the bash code

command1 | tee >(command2) | command3

I want to capture the output of command2 in var2 and the output of command3 in var3.

command1 is I/O-bound and the other commands are costly but can start working before command1 finishes.

The order of the outputs from command2 and command3 is not fixed, so I tried to use file descriptors in

read -r var2 <<< var3=(command1 | tee >(command2 >&3) | command3) 3>&1

or

{read -u 3 -r var2; read -r var3} <<< command1 | tee >(command2 >&3) | command3

but did not succeed.

Is there a way to have the three commands run in parallel, store the results in different variables and not make temporary files?

katosh
  • It's hard to give a negative answer for sure, but I think that would require the shell reading from two pipes at a time, and I can't think of any shell feature that could do that (in Bash, that is). How large are your outputs? – ilkkachu Mar 18 '19 at 21:20
  • @ilkkachu Thanks for the feedback! Each output is less than 4KB. – katosh Mar 18 '19 at 21:23
  • interesting, I tried using named pipes, but the results were lost due to backgrounding (&). – Archemar Mar 19 '19 at 08:08

4 Answers


So you want to pipe the output of cmd1 into both cmd2 and cmd3 and get both the output of cmd2 and cmd3 into different variables?

Then it seems you need two pipes from the shell, one connected to cmd2's output and one to cmd3's output, and the shell to use select()/poll() to read from those two pipes.

bash won't do for that; you'd need a more advanced shell like zsh. zsh doesn't have a raw interface to pipe(), but on Linux you can use the fact that /dev/fd/x on a regular pipe acts like a named pipe, and use an approach similar to the one in Read / write to the same file descriptor with shell redirection:

#! /bin/zsh -

cmd1() seq 20
cmd2() sed 's/1/<&>/g'
cmd3() tr 0-9 A-J

zmodload zsh/zselect   # zselect: a select()/poll() interface for the shell
zmodload zsh/system    # sysread: raw read(), not line-oriented
typeset -A done out
{
  # cmd2 writes to fd 3, cmd3 writes to fd 5
  cmd1 > >(cmd2 >&3 3>&-) > >(cmd3 >&5 5>&-) 3>&- 5>&- &
  # on Linux, opening /dev/fd/N of a pipe gives a new handle on that pipe,
  # so fds 4 and 6 become the read ends of the two pipes
  exec 4< /dev/fd/3 6< /dev/fd/5 3>&- 5>&-
  # poll both fds until each has reported end-of-file
  while ((! (done[4] && done[6]))) && zselect -A ready 4 6; do
    for fd (${(k)ready[(R)*r*]}) {  # every fd flagged ready for reading
      sysread -i $fd && out[$fd]+=$REPLY || done[$fd]=1
    }
  done
} 3> >(:) 5> >(:)

printf '%s output: <%s>\n' cmd2 "$out[4]" cmd3 "$out[6]"
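
With the toy definitions above, the result is deterministic: out[4] ends up holding cmd2's output (seq 20 with every 1 bracketed, i.e. <1>, 2, ..., <1>9, 20) and out[6] holds cmd3's (the digits transliterated to A-J, i.e. B, C, ..., BJ, CA).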
terdon

If I understood all your requirements correctly, you can achieve this in bash by creating an unnamed pipe per command, redirecting each command's output to its respective unnamed pipe, and finally retrieving each output from its pipe into a separate variable.

As such, the solution might be like:

# open two read-write fds, each on its own unnamed pipe
: {pipe2}<> <(:)
: {pipe3}<> <(:)

# run the pipeline in the background; each command writes to its own pipe
# and appends an "EOF" sentinel line when it is done
command1 | tee >({ command2 ; echo EOF ; } >&${pipe2}) >({ command3 ; echo EOF ; } >&${pipe3}) > /dev/null &
# read each pipe up to the sentinel into its own variable
var2=$(while read -ru ${pipe2} line ; do [ "${line}" = EOF ] && break ; echo "${line}" ; done)
var3=$(while read -ru ${pipe3} line ; do [ "${line}" = EOF ] && break ; echo "${line}" ; done)

exec {pipe2}<&- {pipe3}<&-

Here note particularly:

  • the use of the <(:) construct: this is an undocumented Bash trick for opening "unnamed" pipes (see the short demo right after this list)
  • the use of a simple echo EOF to notify the while loops that no more output will come. This is necessary because simply closing the unnamed pipes (which would normally end any while read loop) is of no use here: those pipes are bidirectional, i.e. open for both writing and reading, and I know of no way to split one into the usual pair of file descriptors, one being the read end and the other the write end.
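
As an aside, here is a minimal sketch of that <(:) trick in isolation (the fd variable name p is my own choice; bash >= 4.1 is assumed for the {var} redirection syntax):

: {p}<> <(:)          # open a read-write fd on an anonymous pipe
echo hello >&"$p"     # write into the pipe ...
read -ru "$p" line    # ... and read it back from the same fd
echo "$line"          # prints: hello
exec {p}<&-           # close the fd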

In this example I used a pure-bash approach (besides the use of tee) to better clarify the basic algorithm required by these unnamed pipes, but you could do the two assignments with a couple of sed commands in place of the while loops, as in var2="$(sed -ne '/^EOF$/q;p' <&${pipe2})" for var2 and the respective command for var3, yielding the same result with considerably less typing. That is, the whole thing would be:

Lean solution for small amount of data

: {pipe2}<> <(:)
: {pipe3}<> <(:)

command1 | tee >({ command2 ; echo EOF ; } >&${pipe2}) >({ command3 ; echo EOF ; } >&${pipe3}) > /dev/null &
var2="$(sed -ne '/^EOF$/q;p' <&${pipe2})"
var3="$(sed -ne '/^EOF$/q;p' <&${pipe3})"

exec {pipe2}<&- {pipe3}<&-

In order to display the destination variables, remember to disable word splitting by clearing IFS (this matters when the expansions are unquoted), like this:

IFS=
echo "${var2}"
echo "${var3}"

otherwise you’d lose newlines on output.

The above does look like quite a clean solution indeed. Unfortunately, it only works for limited amounts of output, and here your mileage may vary: in my tests I hit problems at around 530K of output. If you stay within the (very conservative) limit of 4K you should be all right.

The reason for that limit lies in the fact that two assignments like those, i.e. command-substitution syntax, are synchronous operations: the second assignment starts only after the first has finished, whereas tee feeds both commands simultaneously and blocks as soon as either of them fills its receiving pipe buffer. Concretely, while the shell is still draining pipe2, nothing reads pipe3; once command3 fills pipe3's buffer it blocks, then tee blocks, command2 stops receiving input, its echo EOF never arrives, and the first while loop waits forever. A deadlock.

The solution to this requires a slightly more complex script that empties both buffers simultaneously. To this end, a while loop over the two pipes comes in handy, as in the sketch below.
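
For example (my own rough sketch, not from the original answer: it replaces the two sed assignments above, reuses the pipe2/pipe3 file descriptors, and relies on read -t with a fractional timeout, which needs bash >= 4):

var2= var3= done2= done3=
while [ -z "$done2" ] || [ -z "$done3" ]; do
  # drain whatever is currently available on each pipe in turn,
  # never blocking longer than the timeout on either of them
  while [ -z "$done2" ] && IFS= read -t 0.01 -ru "${pipe2}" line; do
    [ "$line" = EOF ] && done2=1 || var2+="$line"$'\n'
  done
  while [ -z "$done3" ] && IFS= read -t 0.01 -ru "${pipe3}" line; do
    [ "$line" = EOF ] && done3=1 || var3+="$line"$'\n'
  done
done

Note that a read that times out mid-line discards the partial line in this sketch, so it assumes the producers write complete lines reasonably promptly; the multiplexing solution below sidesteps the problem entirely.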

A more standard solution for any amount of data

A more standard Bash approach looks like this:

declare -a var2 var3
# demultiplex: route each prefixed line into its own array
while read -r line ; do
   case "${line}" in
   cmd2:*) var2+=("${line#cmd2:}") ;;
   cmd3:*) var3+=("${line#cmd3:}") ;;
   esac
done < <(
   # multiplex: tag each command's lines with a distinguishing prefix
   command1 | tee >(command2 | stdbuf -oL sed -re 's/^/cmd2:/') >(command3 | stdbuf -oL sed -re 's/^/cmd3:/') > /dev/null
)

Here you multiplex the lines from both commands onto the single standard output file descriptor, and then demultiplex the merged stream into the respective variables (a toy end-to-end run is shown after the notes below).

Note particularly:

  • the use of indexed arrays as destination variables: this is because appending to a plain scalar variable becomes horribly slow in the presence of lots of output
  • the use of sed to prepend each output line with the string "cmd2:" or "cmd3:" (respectively), so that the script knows which variable each line belongs to
  • the necessary use of stdbuf -oL to set line buffering for the commands' output: the two commands here share the same output file descriptor, so they could easily interleave each other's output in a classic race condition if they happen to stream out data at the same time; line buffering helps avoid that
  • note also that such use of stdbuf is only required for the last command of each chain, i.e. the one writing directly to the shared file descriptor, which in this case means the sed commands that prepend each command's distinguishing prefix
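
To make the pattern concrete, here is a hypothetical toy run (seq 3, sed 's/^/n=/' and tr 0-9 A-J stand in for command1, command2 and command3):

declare -a var2 var3
while read -r line ; do
   case "${line}" in
   cmd2:*) var2+=("${line#cmd2:}") ;;
   cmd3:*) var3+=("${line#cmd3:}") ;;
   esac
done < <(
   seq 3 | tee >(sed 's/^/n=/' | stdbuf -oL sed -e 's/^/cmd2:/') >(tr 0-9 A-J | stdbuf -oL sed -e 's/^/cmd3:/') > /dev/null
)
printf '%s\n' "${var2[@]}"   # prints n=1, n=2, n=3, one per line
printf '%s\n' "${var3[@]}"   # prints B, C, D, one per line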

One safe way to properly display such indexed arrays can be like this:

for ((i = 0; i < ${#var2[*]}; i++)) ; do
   echo "${var2[$i]}"
done

Of course you can also just use "${var2[*]}" as in:

echo "${var2[*]}"

but that is not very efficient when there are many lines.
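
Alternatively, printf expands each array element onto its own line and stays fast even with many lines:

printf '%s\n' "${var2[@]}"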

LL3
  • This is very interesting but what makes it better than command1 | tee >(command2 | sed 's/^/cmd2:/') | command3 | sed 's/^/cmd3:/'? – katosh Mar 20 '19 at 09:26
  • I tried to get it to work but failed to capture any output. How do I manage to store it in var2 and var3? – katosh Mar 20 '19 at 11:17

I found something that seems to work nicely:

exec 3<> <(:)
var3=$(command1 | tee >(command2 >&3) | command3)
var2=$(while IFS= read -t .01 -r -u 3 line; do printf '%s\n' "$line"; done)

It works by opening an anonymous pipe <(:) on file descriptor 3 and piping the output of command2 to it. var3 captures the output of command3, and the last line reads from file descriptor 3 until no new data has arrived for 0.01 seconds.

It only works for up to 65536 bytes of output from command2, which appears to be the buffer capacity of the anonymous pipe.

I do not like the last line of the solution. I would rather read everything in at once, without waiting 0.01 seconds each round, and stop as soon as the buffer is empty, but I do not know a better way.
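
One possible way around the timeout, borrowing the EOF sentinel idea from LL3's answer above (a sketch only, with the same 64KiB pipe-buffer limitation):

exec 3<> <(:)
var3=$(command1 | tee >({ command2; echo EOF; } >&3) | command3)
var2=$(while IFS= read -r -u 3 line; do
         [ "$line" = EOF ] && break
         printf '%s\n' "$line"
       done)
exec 3<&-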

katosh
  • The problem in your last line is that fd 3 does not actually get closed at the end of output, hence the read does not sense the eof event. See also my own updated answer for more info. – LL3 Mar 20 '19 at 20:30

This is achievable since bash 4.0, which added the shell reserved word coproc. It forks the command that follows into the background as a subprocess (which normally would not allow passing variables back), but it also creates an array (named COPROC by default, though it can be given a name): the shell reads the subprocess's standard output through ${COPROC[0]} and writes to its standard input through ${COPROC[1]}. These subprocesses can be manipulated as jobs and are asynchronous, so you can use tee to pipe to the two coprocesses, have each of them write its output to a separate file, and then combine the outputs by reading from "${command_1[0]}" and "${command_2[0]}", which is convenient because it does not even need to happen within the pipeline.

coproc command_1 { command1 arg1 arg2 arg3 >> command_1_output.txt; }
coproc command_2 { command2 arg1 arg2 arg3 >> command_2_output.txt; }
othercommand | tee >&"${command_1[1]}" >&"${command_2[1]}"
read -r results1 <&"${command_1[0]}"
read -r results2 <&"${command_2[0]}"
echo "$results1 $results2" | command3 >> combinedresult.txt

As mentioned, this solution is currently incomplete, so I'm striking it; however, the principle is sound and similar to the answer above. I will direct you to some good articles on the subject and will return to this answer when work permits; in the meantime, a minimal working coproc demo follows the links below.

Example of setting variables in a coprocess

Indepth look at coprocess and named pipes

Uses and Pitfalls
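
For reference, a minimal self-contained coproc demo (bash >= 4; the coprocess name UPPER and the tr command are illustrative choices of mine, not part of the answer above):

coproc UPPER { tr a-z A-Z; }
printf 'hello\n' >&"${UPPER[1]}"   # write to the coprocess's stdin
w=${UPPER[1]}; exec {w}>&-         # close the write end so tr sees EOF
read -r reply <&"${UPPER[0]}"      # read its stdout
echo "$reply"                      # prints: HELLO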

rayiik
  • I find this confusing. (1) Answers are a *WHOLE lot* more readable when they use the same naming convention as the question.  It appears that you have renamed “command1” → “othercommand”, “command2” → “command1”, and “command3” → “command2”.  And yet you also have a “command3”, so clearly I’m not understanding you. (2) The question requests that there be no temporary files, and yet you clearly have some.  ISTM that this exercise is easy if you’re allowed to use temporary files.  … (Cont’d) – G-Man Says 'Reinstate Monica' Mar 24 '22 at 10:17
  • (Cont’d) …  (3) The question wants output into two variables.  Getting back to my point 1, I see no var2 and var3 — have you renamed them to results1 and results2?  Are you assuming that each command will produce exactly one line of output?  (4) Again getting back to my point 1, what *is* command3?  What is combinedresult.txt?  The question does *not* call for an output file.  Why are you trying to “combine the outputs on the return value”?  … (Cont’d) – G-Man Says 'Reinstate Monica' Mar 24 '22 at 10:17
  • (Cont’d) …  (5) You clearly have not copied-and-pasted this from a tested, working solution, because the third line (the one with the tee) has unbalanced quotes.  I’ll admit that I have rushed to post an answer to a new question in hopes of being the first to do so.  Rushing to post an untested answer to a three-year-old question is poor form. (6) At the risk of nit-picking, why are you using >> instead of >? – G-Man Says 'Reinstate Monica' Mar 24 '22 at 10:17
  • ah, I apologize, this was written on my break at work on a cellphone; updating now. I was working under the assumption that the output was meant to be saved (as in logs); however, if you look at the command, the variables are not set using temporary files but from the output of the coprocess commands, read through their array. – rayiik Mar 26 '22 at 04:30
  • However, I use >> instead of > because a coprocess runs as a stream: it re-evaluates every time you feed it information, and I generally default to logging my command output. That is why I didn't consider them temporary files: a) they aren't what sets the variables, and b) they aren't intended to be destroyed. I'm off work in an hour and will write up a better articulated solution. – rayiik Mar 26 '22 at 04:43