I have to run a bunch of bash commands asynchronously and, as soon as one finishes, perform actions according to its exit code and output. Note that in my real use case I can't predict how long any of these tasks will run.

To solve this problem, I ended up with the following algorithm:

For each task to be run:
    Run the task asynchronously;
    Append the task to the list of running tasks.
End For.

While there still are tasks in the list of running tasks:
    For each task in the list of running tasks:
        If the task has ended:
            Retrieve the task's exit code and output;
            Remove the task from the list of running tasks.
        End If.
    End For.
End While.

This gives me the following bash script:

  1 #!/bin/bash
  2 
  3 # bg.sh
  4 
  5 # Executing commands asynchronously, retrieving their exit codes and outputs upon completion.
  6 
  7 asynch_cmds=
  8 
  9 echo -e "Asynchronous commands:\nPID    FD"
 10 
 11 for i in {1..10}; do
 12         exec {fd}< <(sleep $(( i * 2 )) && echo $RANDOM && exit $i) # Dummy asynchronous task, standard output's stream is redirected to the current shell
 13         asynch_cmds+="$!:$fd " # Append the task's PID and FD to the list of running tasks
 14         
 15         echo "$!        $fd"
 16 done    
 17 
 18 echo -e "\nExit codes and outputs:\nPID       FD      EXIT    OUTPUT"
 19 
 20 while [[ ${#asynch_cmds} -gt 0 ]]; do # While the list of running tasks isn't empty
 21         
 22         for asynch_cmd in $asynch_cmds; do # For each task in the list
 23                 
 24                 pid=${asynch_cmd%:*} # Task's PID
 25                 fd=${asynch_cmd#*:} # Task's FD
 26                 
 27                 if ! kill -0 $pid 2>/dev/null; then # If the task ended
 28                         
 29                         wait $pid # Retrieving the task's exit code
 30                         echo -n "$pid   $fd     $?      "
 31                         
 32                         echo "$(cat <&$fd)" # Retrieving the task's output
 33                         
 34                         asynch_cmds=${asynch_cmds/$asynch_cmd /} # Removing the task from the list
 35                 fi
 36         done
 37 done

The output tells me that wait fails to retrieve the exit code of every task except the last one started:

Asynchronous commands:
PID     FD
4348    10
4349    11
4351    12
4353    13
4355    14
4357    15
4359    16
4361    17
4363    18
4365    19

Exit codes and outputs:
PID     FD  EXIT OUTPUT
./bg.sh: line 29: wait: pid 4348 is not a child of this shell
4348    10  127  16010
./bg.sh: line 29: wait: pid 4349 is not a child of this shell
4349    11  127  8341
./bg.sh: line 29: wait: pid 4351 is not a child of this shell
4351    12  127  13814
./bg.sh: line 29: wait: pid 4353 is not a child of this shell
4353    13  127  3775
./bg.sh: line 29: wait: pid 4355 is not a child of this shell
4355    14  127  2309
./bg.sh: line 29: wait: pid 4357 is not a child of this shell
4357    15  127  32203
./bg.sh: line 29: wait: pid 4359 is not a child of this shell
4359    16  127  5907
./bg.sh: line 29: wait: pid 4361 is not a child of this shell
4361    17  127  31849
./bg.sh: line 29: wait: pid 4363 is not a child of this shell
4363    18  127  28920
4365    19  10   28810

The commands' output is retrieved flawlessly, but I don't understand where the "is not a child of this shell" error comes from. I must be doing something wrong, since wait is able to get the exit code of the last command run asynchronously.

Does anyone know where this error comes from? Is my solution to this problem flawed, or am I misunderstanding the behavior of bash? I'm having a hard time understanding the behavior of wait.

P.S: I posted this question on Super User, but on second thought, it might be better suited to the Unix & Linux Stack Exchange.

Christopher
  • 15,911

2 Answers

4

This is a bug/limitation: bash only lets you wait for the last process substitution, even if you save the value of $! into another variable.

Simpler testcase:

$ cat script
exec 7< <(sleep .2); pid7=$!
exec 8< <(sleep .2); pid8=$!
echo $pid7 $pid8
echo $(pgrep -P $$)
wait $pid7
wait $pid8

$ bash script
6030 6031
6030 6031
/tmp/sho: line 9: wait: pid 6030 is not a child of this shell

This happens even though pgrep -P actually finds the process as a child of the shell, and strace shows that bash is actually reaping it.

But anyways, $! being also set to the PID of the last process substitution is an undocumented feature (which iirc didn't use to work in older versions), and is subject to some gotchas.
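
One possible way around the limitation, given only as a rough sketch: let each process substitution write its own exit status at the end of its output, so the parent never has to call wait on it. The names run_all and the EXIT: marker are made up for this illustration. The sketch also assumes a bash recent enough to set $! for process substitutions (as in the question), and that each task's output fits in the pipe buffer, since nothing reads the pipe until the task has ended.

```bash
#!/bin/bash
# Sketch only: run_all and the EXIT: marker are hypothetical names.

# Start every argument as a command line inside a process substitution, then
# print "<exit code> <output>" for each task once it is seen to have ended.
run_all() {
    local pids=() fds=() fd i raw code out cmd
    for cmd in "$@"; do
        # The status is written after the task's output, so wait is not needed.
        exec {fd}< <(bash -c "$cmd"; echo "EXIT:$?")
        pids+=("$!")
        fds+=("$fd")
    done

    while (( ${#pids[@]} > 0 )); do
        for i in "${!pids[@]}"; do
            kill -0 "${pids[i]}" 2>/dev/null && continue   # still running
            fd=${fds[i]}
            raw=$(cat <&"$fd")        # task output followed by EXIT:<status>
            exec {fd}<&-              # close the descriptor
            code=${raw##*EXIT:}       # text after the last marker
            out=${raw%EXIT:*}         # text before the last marker
            echo "$code ${out%$'\n'}"
            unset "pids[i]" "fds[i]"
        done
        sleep 0.1
    done
}

run_all 'sleep 0.4; echo slow; exit 3' 'echo fast; exit 7'
```

The polling with kill -0 is the same as in the question; only the way the exit code travels back is different.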


This happens because bash only keeps track of the last process substitution in the last_procsub_child variable. This is where wait will look for the pid:

-- jobs.c --
/* Return the pipeline that PID belongs to.  Note that the pipeline
   doesn't have to belong to a job.  Must be called with SIGCHLD blocked.
   If JOBP is non-null, return the index of the job containing PID.  */
static PROCESS *
find_pipeline (pid, alive_only, jobp)
     pid_t pid;
     int alive_only;
     int *jobp;         /* index into jobs list or NO_JOB */
{
     ...
  /* Now look in the last process substitution pipeline, since that sets $! */
  if (last_procsub_child)
    {

but that will be discarded when a new proc subst is created:

-- subst.c --
static char *
process_substitute (string, open_for_read_in_child)
     char *string;
     int open_for_read_in_child;
{
   ...
      if (last_procsub_child)
        discard_last_procsub_child ();
  • They may be child processes of the shell, but process substitutions are not asynchronous jobs in the shell, just like the parts of a pipeline aren't. The fact that you get some value in $! from a process substitution seems like a bug to me. – Kusalananda Sep 13 '19 at 20:32
  • 1. $! refers to the last asynchronous list / command, not job. There are no jobs in a script, unless set -m was used. 2. there were recent fixes in bash for $! to also work with process substs, and this is clearly the intention in the source code -- so this is just a bug/limitation. –  Sep 13 '19 at 20:36
  • Sorry for using the wrong word there. POSIX uses "the most recent background command" when describing $!, i.e. anything started with & at the end. I suppose bash is free to expand this to process substitutions, but it should really be documented outside of the source code... So, the user here is using an undocumented non-standard feature. – Kusalananda Sep 13 '19 at 20:41
  • https://lists.gnu.org/archive/html/bug-bash/2015-03/msg00080.html and there may be more discussion (I don't monitor the bash list, and the bash git repo is dump only, without proper commit messages). Having the pid of the last process subst in $! and being able to wait for it is clearly intentional and useful (even with the limitations mentioned in my linked answer), and this bug is probably fixable. –  Sep 13 '19 at 21:01
  • The fact that $! will be set to the last process substitution PID is documented. As the bash manual tells us about process substitution: "The process list is run asynchronously", and about the $! parameter: "Expands to the process ID of the job most recently placed into the background, whether executed as an asynchronous command or using the bg builtin". The behavior of $! seems normal to me here, but wait appears to be inconsistent in its ability to grab finished processes' exit codes. – Informancien Sep 14 '19 at 08:26
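
To illustrate the contrast discussed in these comments, here is a small sketch in which the tasks are started with a plain & instead of a process substitution. Commands started that way stay known to the shell, so wait can return each one's exit code even after it has ended. The helper name run_jobs is hypothetical, and the sketch assumes temporary files (created with mktemp) are acceptable for capturing the output, which the asker wanted to avoid.

```bash
#!/bin/bash
# Sketch only: run_jobs is a hypothetical helper, not part of bash.

# Run each argument as a background command line; print "<exit code> <output>"
# for every task once it is seen to have ended.
run_jobs() {
    local dir pids=() i cmd code
    dir=$(mktemp -d) || return 1
    for cmd in "$@"; do
        # The file name is the index the PID is about to get in the array.
        bash -c "$cmd" >"$dir/${#pids[@]}.out" &
        pids+=("$!")
    done

    while (( ${#pids[@]} > 0 )); do
        for i in "${!pids[@]}"; do
            kill -0 "${pids[i]}" 2>/dev/null && continue   # still running
            wait "${pids[i]}"         # succeeds for commands started with &
            code=$?
            echo "$code $(<"$dir/$i.out")"
            unset "pids[i]"
        done
        sleep 0.1
    done
    rm -rf "$dir"
}

run_jobs 'sleep 0.4; echo slow; exit 3' 'echo fast; exit 7'
```

Newer bash versions also provide wait -n, which blocks until any background command ends and could replace the polling loop.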
2

This is what I came up with.

First, a dummy run script, which in your case will be something quite different:

#!/bin/bash

sleep $1;
exit $2

Next, a bg script that puts run jobs into the background, with appropriate redirections:

#!/bin/bash

echo $$

( ( touch $$.running; "$@" > $$.out 2>$$.err ; echo $? > $$.exitcode ) & )

Finally, a driver script that controls the whole thing. This is the script that you will actually run, not the other two of course. Comments within should help, but I have tested it and it seems to work fine.

#!/bin/bash

# first run all commands via "bg"
./bg ./run 10 0
./bg ./run 5 5
./bg ./run 2 2
./bg ./run 0 0
# ... and so on

while :
do
    shopt -s nullglob
    for i in *.exitcode
    do
        j=$(basename $i .exitcode)
        # now process $j.out, $j.err, $j.exitcode however you want; most
        # importantly, *move* at least the exitcode file out of this directory
        echo $j had exit code of `cat $i`
        rm $j.*
    done

    shopt -u nullglob
    ls *.running >/dev/null 2>&1 || exit
    sleep 1
done
  • Thanks for your answer. Your script works, but I'm trying to do this without creating temporary files. Also, here are a few tips: You could put your ls command in place of : in the while statement; doing so will make your loop end as soon as every task has finished. If you do so, put the shopt commands outside your loop to avoid running them at each iteration. You can use substring substitution instead of basename, which is an external command. And instead of using two redirections for your ls command, you can use &>/dev/null to redirect both file descriptors at once. – Informancien Sep 20 '19 at 12:45
  • The ls requires nullglob to be unset, while the for i requires it to be set. The only way to do what you suggested would be to leave nullglob unset, and in the for i ... check that i is literally *.exitcode, which I find extremely ugly. I do agree that basename is inefficient, but the rest of it has no bearing on performance. In particular, setting/unsetting a bash internal option is pretty much zero cost; I don't see the need to worry about it. –  Sep 21 '19 at 00:43
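
Two of the tips from the comments above can be sketched as follows: stripping the suffix with parameter expansion instead of basename, and using &>/dev/null for the ls check. The function names job_ids and any_running are made up for the example, and the directory argument is an addition so the sketch can be tried outside the answer's working directory.

```bash
#!/bin/bash
# Sketch only: job_ids and any_running are hypothetical helpers.

# Print the identifier of every "<id>.exitcode" file found in directory $1.
job_ids() {
    local f
    shopt -s nullglob                # an unmatched glob expands to nothing
    for f in "$1"/*.exitcode; do
        f=${f##*/}                   # drop the directory part
        echo "${f%.exitcode}"        # drop the suffix without calling basename
    done
    shopt -u nullglob
}

# Succeed if at least one "<id>.running" file exists in directory $1.
# With nullglob unset, an unmatched pattern is passed literally and ls fails.
any_running() {
    ls "$1"/*.running &>/dev/null    # &> redirects stdout and stderr at once
}
```

As noted in the reply, the ls check relies on nullglob being unset while the for loop relies on it being set, which is why job_ids switches the option on and off around its loop.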