2

I am trying to pass standard input into multiple commands and compare their outputs. My current attempt seems close, but doesn't quite work, and it relies on temporary files, which I feel should not be necessary.

An example of what I would want my script to do:

$ echo '
> Line 1
> Line B
> Line iii' | ./myscript.sh 'sed s/B/b/g' 'sed s/iii/III/' 'cat'
1:Line B     2:Line b
1:Line iii   3:Line III

So far I have this:

i=0
SOURCES=()
TARGETS=()

for c in "$@"; do
    SOURCES+=(">($c > tmp-$i)")
    TARGETS+=("tmp-$i")
    i=$((i+1))
done

eval tee ${SOURCES[@]} >/dev/null <&0
comm ${TARGETS[@]}

The issues are:

  • There seems to be a race condition: by the end of execution, comm tmp-0 tmp-1 has (more or less) the desired output, but when comm is run from within the script its output seems non-deterministic.
  • This is limited to just 2 inputs, but I need at least 3 (ideally any number).
  • This creates temporary files that I would have to keep track of and delete afterwards; an ideal solution would use only redirection.

The constraints are:

  • The input may never end. In particular, the input could be something like /dev/zero or /dev/urandom, so merely copying the input to a file won't work.
  • The commands may have spaces in them and be fairly complicated themselves
  • I want a line-by-line, in-order comparison.

Any idea how I could go about implementing this? I basically want something like echo $input | tee >(A >?) >(B >?) >(C >?) ?(compare-all-files) if only such a syntax existed.
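
Something like the following is the closest shape I can picture (a rough sketch only; the fifo names and paste as a stand-in for the real comparison are just for illustration, and it still needs fifos on disk, which is part of what I'd rather avoid):

#!/bin/bash
# rough sketch: three fixed commands, paste standing in for the real
# comparison step; fifo names are arbitrary
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
mkfifo "$dir/f0" "$dir/f1" "$dir/f2"

# tee duplicates stdin into each process substitution; each command
# writes its transformed stream into its own fifo
tee >(sed s/B/b/g > "$dir/f0") \
    >(sed s/iii/III/ > "$dir/f1") \
    >(cat > "$dir/f2") > /dev/null &

# read the three streams side by side (this is the part I want to
# replace with a proper line-by-line comparison)
paste "$dir/f0" "$dir/f1" "$dir/f2"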

Jeff Schaller
  • Note: one alternative that is I would say equally close to an answer is just to replace comm with vim -d in the last line. This seems to fix the 'race condition' but fails if the input never ends (vim gets into a weird state) and only takes up to 4 files (without recompiling vim) – LambdaBeta Oct 03 '18 at 19:54
  • Also, does the command have to be written in shell, or would another scripting language (e.g., Perl or Python) be OK? Also, if it does need to be shell, which shell? Bash? – derobert Oct 03 '18 at 20:11
  • It does not have to be a shell. I just used a shell because on the face of it, it seemed like a very shell-compatible task (process substitute tee to get input, process substitute commands to get output) the issue is that I lose the fd's created by the process substitutions, so I can't diff against them. – LambdaBeta Oct 03 '18 at 20:13
  • Sorry but your constraints seem to be mutually exclusive: either your input is infinite or you can have a line-by-line output. I would not know of any CLI that has multiple parallel stdouts. Alternatively, you could consider buffering your output and process it in finite bits (e.g. saving to a .temporary file). – FelixJN Oct 03 '18 at 20:37
  • Why should it be mutually exclusive. I don't need to sort the output, I just need to know which of the lines differ. By infinite input I mean that the script should act like tail -f in that it constantly prints whichever lines get transformed differently by the different programs. I don't mean having infinite input programs, those will always be easily enumerated (in fact I can't see a reason to have more than 5) – LambdaBeta Oct 03 '18 at 20:40
  • I see, you don't want the output to be printed in parallel but rather pick the odd one out (and just by which line, not the actual output). Misunderstood you there. – FelixJN Oct 03 '18 at 20:58

3 Answers

2

Since the accepted answer uses perl, you can just as well do the whole thing in perl, without other non-standard tools or non-standard shell features, and without loading unpredictably long chunks of data into memory, or other such horrible misfeatures.

The ytee script from the end of this answer, when used in this manner:

ytee command filter1 filter2 filter3 ...

will work just like

command <(filter1) <(filter2) <(filter3) ...

with its standard input piped to filter1, filter2, filter3, ... in parallel, as if it were with

tee >(filter1) >(filter2) >(filter3) ...

Example:

echo 'Line 1
Line B
Line iii' | ytee 'paste' 'sed s/B/b/g | nl' 'sed s/iii/III/ | nl'
     1  Line 1       1  Line 1
     2  Line b       2  Line B
     3  Line iii             3  Line III

This is also an answer for the two very similar questions: here and here.

ytee:

#! /usr/bin/perl
#   usage: ytee [-r irs] { command | - } [filter ..]
use strict;
if($ARGV[0] =~ /^-r(.+)?/){ shift; $/ = eval($1 // shift); die $@ if $@ }
elsif(! -t STDIN){ $/ = \0x8000 }
my $cmd = shift;
my @cl;
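# for each filter, open2 gives a handle reading its stdout ($from) and a
# handle writing its stdin ($to); keep both plus the pid in @cl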
for(@ARGV){
    use IPC::Open2;
    my $pid = open2 my $from, my $to, $_;
    push @cl, [$from, $to, $pid];
}
defined(my $pid = fork) or die "fork: $!";
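# parent: the writing ("tee") side -- copy every input record to each filter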
if($pid){
    delete $$_[0] for @cl;
    $SIG{PIPE} = 'IGNORE';
    my ($s, $n);
    while(<STDIN>){
        for my $c (@cl){
            next unless exists $$c[1];
            syswrite($$c[1], $_) ? $n++ : delete $$c[1]
        }
        last unless $n;
    }
    delete $$_[1] for @cl;
    while((my $p = wait) > 0){ $s += !!$? << ($p != $pid) }
    exit $s;
}
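# child: the reading side -- with '-' interleave the filters' output line by
# line, otherwise exec the command with /dev/fd/N paths to the filters' output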
delete $$_[1] for @cl;
if($cmd eq '-'){
    my $n; do {
        $n = 0; for my $c (@cl){
            next unless exists $$c[0];
            if(my $d = readline $$c[0]){ print $d; $n++ }
            else{ delete $$c[0] }
        }
    } while $n;
}else{
    exec join ' ', $cmd, map {
        use Fcntl;
        fcntl $$_[0], F_SETFD, fcntl($$_[0], F_GETFD, 0) & ~FD_CLOEXEC;
        '/dev/fd/'.fileno $$_[0]
    } @cl;
    die "exec $cmd: $!";
}

notes:

  1. code like delete $$_[1] for @cl will not only remove the file handles from the array, but will also close them immediately, because there is no other reference pointing to them; this is different from (properly) garbage-collected languages like JavaScript.

  2. the exit status of ytee will reflect the exit statuses of the command and filters; this could be changed/simplified.

  • Excellent utility! If you have the time some comments for those who aren't as familiar with perl would be useful. – LambdaBeta Oct 10 '18 at 14:46
1

This will fail if the lines are longer than RAM size.

#!/bin/bash

commands=('sed s/B/b/g' 'sed s/iii/III/' cat)

parallel 'rm -f fifo-{#};mkfifo fifo-{#}' ::: "${commands[@]}" 

cat input |
  parallel -j0 --tee --pipe 'eval {} > fifo-{#}' ::: "${commands[@]}" &

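# read one line from each fifo in lock-step, print the lines that differ
# from the first command's line, and stop when any fifo reaches EOF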
perl -e 'for(@ARGV){ open($in{$_},"<",$_) }
  do{
    @in = map { $f=$in{$_}; scalar <$f> } @ARGV;
    print grep { $in[0] ne $_ } @in;
  } while (not grep { eof($in{$_}) } @ARGV)' fifo-*
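
If the set of filters should come from the script's arguments rather than being hard-coded, the commands array can simply be filled from "$@" (untested sketch; the rest of the script stays the same):

#!/bin/bash
# untested variant: filters come from the command line, e.g.
#   ./myscript.sh 'sed s/B/b/g' 'sed s/iii/III/' cat < input
commands=("$@")
# ... the parallel and perl parts above are unchanged ...
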
Ole Tange
  • looks close... I don't know too much about perl though. Is there any way to have a variable number of 'input commands'? will adding "$@" after the ::: in parallel and tossing it all in a bash script work as intended? – LambdaBeta Oct 03 '18 at 20:51
  • Try now with commands in an array – Ole Tange Oct 03 '18 at 20:58
  • This seems to be what I want. My installation of parallel doesn't have the tee option, but I can understand the code enough to work around it. – LambdaBeta Oct 03 '18 at 21:05
1

This is simpler:

#!/bin/bash
if [[ -t 0 ]]; then
    echo "Error: you must pipe data into this script"
    exit 1
fi
input=$(cat)
commands=$( "$@" )
outputs=()

for cmd in "${commands[@]}"; do
    echo "calling: $cmd"
    outputs+=( "$( $cmd <<<"$input" )" )
done

# now, do stuff with "${outputs[0]}", "${outputs[1]}", etc

This is untested. The outputs+=... line is particularly fragile: see http://mywiki.wooledge.org/BashFAQ/050
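
As one (equally untested) illustration of that last step, you could report which commands disagree with the first one:

# untested illustration: flag the commands whose output differs from the first
for i in "${!outputs[@]}"; do
    if [[ ${outputs[$i]} != "${outputs[0]}" ]]; then
        echo "output of '${commands[$i]}' differs from '${commands[0]}'"
        diff <(printf '%s\n' "${outputs[0]}") <(printf '%s\n' "${outputs[$i]}")
    fi
done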

glenn jackman
  • Not necessary to store the commands in an array, but it seemed to fit... – glenn jackman Oct 03 '18 at 20:58
  • "The input may not be ending." Try your solution on input that is bigger than RAM. I am not sure it will work. – Ole Tange Oct 03 '18 at 21:03
  • Yeah, something like this was my first attempt, then I tried wrapping it in a loop reading input one line at a time, either way it was not ideal. It does a good job of expressing the general idea though. – LambdaBeta Oct 03 '18 at 21:04