2

I'd like to write a simple script for finding the intersection of multiple files (the lines common to all files). After reading some answers here (link), I tried to write a bash script, which unfortunately fails for me. What am I doing wrong?

RES=$(comm -12 ${1}  ${2})

for FILE in ${@:3}
do
    RES=$(comm -12 $FILE  ${RES})
done

Are there any other suggestions for how to implement this, perhaps with parallel or xargs?

Rui F Ribeiro

4 Answers

2

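A function allows for a recursive approach (the input files must already be sorted, as comm requires):

f() {
     if (($# == 1))
     then
         # Base case: a single file is its own intersection
         cat -- "$1"
         return
     fi
     # Intersect the first file with the intersection of the rest
     comm -12 -- "$1" <(f "${@:2}")
}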

f file1 file2 file3 file4 file5...
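
If the inputs are not already sorted, the same recursion can sort each file on the fly. The following is only a sketch under that assumption; intersect_sorted is a hypothetical name, and sort -u also drops duplicate lines, which may or may not be what you want.

```bash
#!/bin/bash
# Sketch: recursive intersection that sorts each input on the fly.
# intersect_sorted is a hypothetical name, not part of the answer above.
intersect_sorted() {
    if (($# == 1)); then
        sort -u -- "$1"          # base case: one file, sorted and de-duplicated
        return
    fi
    # Intersect the first file with the intersection of the remaining files.
    comm -12 <(sort -u -- "$1") <(intersect_sorted "${@:2}")
}

# Demo with throwaway files whose lines are deliberately out of order.
dir=$(mktemp -d) || exit 1
printf '%s\n' line4 line1 line3 line2 > "$dir/a"
printf '%s\n' line6 line2 line4       > "$dir/b"
printf '%s\n' line8 line4 line2       > "$dir/c"
intersect_sorted "$dir/a" "$dir/b" "$dir/c"   # prints line2, then line4
rm -r -- "$dir"
```

Each level of the recursion starts one more sort and one more comm, so this is convenient for a handful of files rather than for thousands.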
iruvar
1

When you dereference RES in:

comm -12 $FILE  ${RES}

the content of RES replaces ${RES}. But comm expects a file name as its argument, so if, for instance, $RES contains hello, comm tries to open a file named hello.
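
A small demonstration of that failure, as a sketch only: the file name sample and the value hello are made up for illustration, and the script runs in a fresh temporary directory so that no file named hello exists.

```bash
#!/bin/bash
# Hypothetical demonstration: RES holds text, not a file name.
cd "$(mktemp -d)" || exit 1
printf 'hello\n' > sample          # a real file to compare against
RES=hello                          # the text of a "common line"

# comm treats its second argument as a path, so it looks for a file called "hello".
if comm -12 sample "$RES" 2>/dev/null; then
    status=opened
else
    status=failed                  # no such file, so comm exits with an error
fi
echo "comm $status"
```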

Instead you could use a temporary file to store the common lines during the process:

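tmp=$(mktemp --tmpdir)
tmp2=$(mktemp --tmpdir)
# Start with the lines common to the first two files
comm -12 "$1" "$2" >"$tmp"

# Intersect the running result with each remaining file
for FILE in "${@:3}"
do
    comm -12 "$FILE" "$tmp" >"$tmp2"
    mv "$tmp2" "$tmp"
done

cat "$tmp"
# Remove both temporary files (tmp2 is left over when only two files are given)
rm -f "$tmp" "$tmp2"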
Erwan
1

Neither parallel, xargs, nor comm is necessary. Try a function (it assumes that no line is repeated within a single file):

$ intersection() {  sort "$@" | uniq -c | sed -n "s/^ *$# //p"; }
$ intersection file[1-3]
line2
line4
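
The counting trick works because a line that appears once in each of N files is counted exactly N times, and $# is N. If a file may repeat a line, the counts no longer mean "number of files". A possible variation, offered only as a sketch (intersection_dedup is a hypothetical name), de-duplicates each file before counting:

```bash
#!/bin/bash
# Sketch: count each line at most once per file, then keep lines seen in every file.
# intersection_dedup is a hypothetical name, not part of the answer above.
intersection_dedup() {
    local f
    for f in "$@"; do
        sort -u -- "$f"            # drop repeats within this file
    done | sort | uniq -c | sed -n "s/^ *$# //p"
}

# Demo: "x" occurs three times, but only in the first file.
dir=$(mktemp -d) || exit 1
printf '%s\n' x x x y > "$dir/a"
printf '%s\n' y       > "$dir/b"
printf '%s\n' y z     > "$dir/c"
intersection_dedup "$dir/a" "$dir/b" "$dir/c"   # prints y
rm -r -- "$dir"
```

Without the per-file sort -u, the three copies of x in the first file would be counted as if x appeared in all three files.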
RudiC
0

The problem is that comm wants two file names, and $RES holds the text of the common lines rather than the name of a file.

But we can cheat and make it look like a file by use of process substitution:

#!/bin/bash

RES=$(comm -12 "$1" "$2")

for FILE in "${@:3}"
do
    RES="$(comm -12 "$FILE" <(printf %s "${RES}"))"
done

printf %s "$RES"

You can see this is pretty much the same as your original, but we use a <(...) construct to run a command and pass its output to comm in place of a file name.

So if we have these three files (each line is shown prefixed with the name of the file it comes from):

a:line1
a:line2
a:line3
a:line4
b:line2
b:line4
b:line6
c:line2
c:line4
c:line8

We can compare them:

% ./allcomp a b c
line2
line4
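
To see the <(...) construct used in the script in isolation, here is a minimal sketch. The helper name path_seen is hypothetical, and the exact path the shell substitutes depends on the system (on Linux it is typically something under /dev/fd).

```bash
#!/bin/bash
# The shell replaces <(...) with a path that a command can open like a file.
path_seen() { printf '%s\n' "$1"; }       # hypothetical helper: print the argument it received
path_seen <(printf 'line2\nline4\n')      # prints the substituted path, not the lines

# comm reads such paths just as it would read regular files.
comm -12 <(printf 'line2\nline4\nline6\n') <(printf 'line2\nline4\nline8\n')
```

The second command prints line2 and line4, the lines common to both generated inputs.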