
I'm trying to write a utility script, errpipe, with a simple API that runs a command's stderr through a filter. At first I tried to implement it using bash's process substitution feature.

#!/bin/bash

com="$1"
errpipe="$2"

$com 2> >(1>&2 $errpipe)

The problem with this is that the output looks strange when com does not exist.

If I type

sh-3.2$ ./errpipe foo cat

I get

sh-3.2$ ./errpipe foo cat
sh-3.2$ ./errpipe: line 6: foo: command not found
@

with @ representing the cursor. In other words, the shell prompt was printed too early. I suspect this is because the main shell script is not waiting for the process substitution process to complete. Throwing in a wait at the end of the script doesn't seem to fix the problem.

I'm open to a solution that uses bash, ksh, zsh or possibly some crazy awk feature. I think I know how to wire this together using something like C or Perl, which expose a richer API for manipulating processes and file descriptors, but I'd like to avoid that unless there's no alternative.


One solution that "almost works" is to use the fact that $$ is not changed when the shell forks and to lob a signal at the parent when errpipe is finished.

#!/bin/bash

com="$1"
errpipe="$2"

$com 2> >(1>&2 $errpipe; kill -SIGUSR1 $$)

while true; do
    sleep 60
done

This fixes the original problem but (a) is ugly, (b) prints User defined signal 1: 30 before terminating even if I have a signal handler for SIGUSR1, and (c) will loop forever if the process responsible for sending SIGUSR1 to the parent somehow dies.
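One way to tame points (a) and (b) of that workaround is a trap whose handler only sets a flag, with the busy loop sleeping in short foreground bursts, since bash runs a trap only after the current foreground command finishes. A sketch (it still does not fix (c), and the claim that a trap suppresses the death message is standard bash behaviour, not something from the question):

```shell
#!/bin/bash
# Sketch: same signal idea, but with a trap installed before the
# kill can arrive, so the script should not die to SIGUSR1 (and so
# should not trigger a "User defined signal 1" message).

com="$1"
errpipe="$2"

finished=0
trap 'finished=1' USR1

$com 2> >(1>&2 $errpipe; kill -USR1 $$)

# bash runs the trap only between foreground commands, so keep each
# sleep short: the loop exits at most ~0.1s after the signal.
while [ "$finished" -eq 0 ]; do
    sleep 0.1
done
```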

Greg Nisbet

1 Answer


Yes, in bash like in ksh (where the feature comes from), the processes inside the process substitution are not waited for.

For a <(...) one, that's usually fine, as in:

cmd1 <(cmd2)

the shell waits for cmd1, and cmd1 typically waits for cmd2 by virtue of reading until end-of-file on the pipe that is substituted; that end-of-file typically happens when cmd2 dies. That's the same reason several shells (not bash) don't bother waiting for cmd2 in cmd2 | cmd1.
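That <(...) behaviour is easy to observe with diff, which reads both substituted pipes until end-of-file and so naturally outlives both producers:

```shell
# diff opens /dev/fd/N for each substitution and reads to EOF,
# which only arrives once the corresponding sort has exited.
diff <(printf 'b\na\n' | sort) <(printf 'a\nb\n' | sort) && echo same
```

Both pipes deliver the same sorted lines, so diff reports no differences and same is printed.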

For cmd1 >(cmd2), however, that's generally not the case: there it's typically cmd2 that waits for cmd1, so cmd2 will generally exit after cmd1 does.

zsh will wait for cmd2 there (but not if you write it as cmd1 > >(cmd2), use {cmd1} > >(cmd2) instead as documented).

ksh doesn't wait by default, but lets you wait for it with the wait builtin (it also makes the pid available in $!, though that doesn't help if you do cmd1 >(cmd2) >(cmd3)).

rc (with the cmd1 >{cmd2} syntax) behaves the same as ksh, except that you can get the pids of all the background processes with $apids.

es (also with cmd1 >{cmd2}) waits for cmd2 like in zsh, and also waits for cmd2 in <{cmd2} process redirections.

bash does make the pid of cmd2 (or more exactly of the subshell as it does run cmd2 in a child process of that subshell even though it's the last command there) available in $!, but doesn't let you wait for it.

If you do have to use bash, you can work around the problem by using a command that will wait for both commands with:

{ { cmd1 >(cmd2); } 3>&1 >&4 4>&- | cat; } 4>&1

That makes both cmd1 and cmd2 have their fd 3 open to a pipe. cat will wait for end-of-file at the other end, so will typically only exit when both cmd1 and cmd2 are dead. And the shell will wait for that cat command. You could see that as a net to catch the termination of all background processes (you can use it for other things started in background like with &, coprocs or even commands that background themselves provided they don't close all their file descriptors like daemons typically do).
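Applied to the errpipe script from the question, that gives something like this sketch (variable names from the question; the fd numbers are arbitrary):

```shell
#!/bin/bash

com="$1"
errpipe="$2"

# fd 3 of $com, of the >(...) filter and of the subshell wrapping it
# all point into the pipe read by cat; cat sees end-of-file (and the
# script exits) only once every one of them is gone.
{ { $com 2> >(1>&2 $errpipe); } 3>&1 >&4 4>&- | cat; } 4>&1
```

With this version the command not found message should reliably appear before the next prompt.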

Note that thanks to that wasted subshell process mentioned above, it works even if cmd2 closes its fd 3 (commands usually don't do that, but some like sudo or ssh do). Future versions of bash may eventually do the optimisation there like in other shells. Then you'd need something like:

{ { cmd1 >(sudo cmd2; exit); } 3>&1 >&4 4>&- | cat; } 4>&1

To make sure there's still an extra shell process with that fd 3 open waiting for that sudo command.

Note that cat won't read anything (since the processes don't write on their fd 3). It's just there for synchronisation. It will do just one read() system call that will return with nothing at the end.

You can actually avoid running cat by using a command substitution to do the pipe synchronisation:

{ unused=$( { cmd1 >(cmd2); } 3>&1 >&4 4>&-); } 4>&1

This time, it's the shell instead of cat that is reading from the pipe whose other end is open on fd 3 of cmd1 and cmd2. We're using a variable assignment so the exit status of cmd1 is available in $?.
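To see the exit status coming through, false can stand in for a failing cmd1 and cat >/dev/null for cmd2 (unused really is unused):

```shell
# The assignment's status is the command substitution's status,
# which is the status of the last command in the group: false.
{ unused=$( { false >(cat >/dev/null); } 3>&1 >&4 4>&-); } 4>&1
echo "$?"    # prints 1, false's exit status
```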

Or you could do the process substitution by hand, and then you could even use your system's sh as that would become standard shell syntax:

{ cmd1 /dev/fd/3 3>&1 >&4 4>&- | cmd2 4>&-; } 4>&1

though, as noted earlier, not all sh implementations wait for cmd1 after cmd2 has finished (though that's better than the other way round). This time, $? contains the exit status of cmd2; bash and zsh make cmd1's exit status available in ${PIPESTATUS[0]} and $pipestatus[1] respectively (see also the pipefail option in a few shells, which lets $? report the failure of pipe components other than the last).
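For the errpipe case specifically, doing the substitution by hand reduces to an ordinary pipe plus a stdout/stderr swap, which runs in any POSIX sh. A sketch (fd 3 is arbitrary): $com's stderr goes down the pipe to the filter, the filter writes to the real stderr, and $com's stdout is routed around the pipe:

```shell
#!/bin/sh

com="$1"
errpipe="$2"

# Outer 3>&1 saves the real stdout on fd 3. In the pipeline,
# $com's stderr goes into the pipe, its stdout escapes via fd 3,
# and the filter's output goes to the real stderr. The shell waits
# for the filter, which reads until $com closes its stderr.
{ $com 2>&1 1>&3 3>&- | 1>&2 $errpipe 3>&-; } 3>&1
```

Since the filter is an ordinary pipeline component here, every sh waits for it, so the prompt can't come back before the filtered error output.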

Note that yash has similar issues with its process redirection feature. cmd1 >(cmd2) would be written cmd1 /dev/fd/3 3>(cmd2) there. But cmd2 is not waited for, you can't use wait to wait for it, and its pid is not made available in the $! variable either. You'd use the same workarounds as for bash.

  • You can just do cmd1 > >(cmd2) | cat, just like how cat waited for fd3, cat will wait for fd1. Also, when doing cmd1 > >(cmd2) 3>&1, it's incorrect to say cmd2 will have its fd3 open. It doesn't, only cmd1 will have its fd3 set to fd1. cmd2 is unaffected, so the crazy redirection tree doesn't actually end up touching cmd2 at all, hence why it doesn't do anything. All that matters is piping into cat, and cat will wait for cmd1's file descriptors to close. – Nicholas Pipitone Jul 24 '18 at 17:01
  • @NicholasPipitone But with cmd1 > >(cmd2) | cat, all the output of cmd1 would go through cat! The point here is only to use cat for synchronisation. It otherwise doesn't do anything. I can't find cmd1 > >(cmd2) 3>&1 anywhere in this answer. – Stéphane Chazelas Jul 24 '18 at 18:25
  • For the command { { cmd1 >(cmd2); } 3>&1 >&4 4>&- | cat; } 4>&1, I was referring to the claim That makes both cmd1 and cmd2 have their fd 3 open to a pipe.. Only cmd1 will have fd 3 open. The given solution uses fd 4 to hop over the cat and print it out, and used fd 3 to keep cat alive. That's not necessary, doing cmd1 > >(cmd2) | cat solves the problem at the same level of efficiency, but avoids the complicated 3>&1 >&4 4>&- | cat; } 4>&1. cat's job is to hook input to output, so the fd 4 hack isn't needed. – Nicholas Pipitone Jul 24 '18 at 19:18
  • You used cmd1 >(cmd2) as opposed to cmd1 > >(cmd2), my bad, but | cat still solves it for both. I'm not actually sure how it works though, because if cmd2 is sleep 1 1<&- 2<&-, it still waits until completion, even though sleep 1 has no output pipes. However, if I use date instead of cat, it ends immediately. I'm trying to test it, but I suppose the subshell spawned by the pipe for cmd1 >(cmd2) only closes when cmd1 and cmd2 both exit, which makes sense since cmd2 is a subshell of cmd1 >(cmd2). – Nicholas Pipitone Jul 24 '18 at 20:08
  • @NicholasPipitone, in { cmd1 >(cmd2); } 3>&1, we're redirecting a command group so the redirection applies to all processes started within. Try for instance { echo >(ls -l /proc/self/fd); } 3> /dev/zero. Again cmd1 >(cmd2) | cat makes the whole output of cmd1 (and cmd2 with bash) go through the pipe to cat which is not what we want. In my approach, cat does just one read() system call which reads nothing but doesn't return before both cmd1 and cmd2 terminate which is the whole point of the exercise. – Stéphane Chazelas Jul 24 '18 at 20:40