Using Raku (formerly known as Perl_6)
~$ raku -e 'my %h; for dir(test => / file \w /) { \
%h{$_}++ for .lines.unique }; \
.put if .value == 4 for %h;'
#OR
~$ raku -e 'my %h; my @a = dir(test => / file \w /);
for @a { %h{$_}++ for .lines.unique };
for %h { .put if .value == @a.elems };'
Above are answers written in Raku, a member of the Perl family of programming languages. Briefly, Raku's dir function is used to inspect the local directory and select the filenames that match a regex. Here we assume the files are named file followed by \w (a word character), but it's equally easy to match, say, \.txt files with the appropriate regex (a regex, not a globbing pattern).
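As a minimal sketch of the \.txt variant mentioned above: the example below builds a throwaway directory so it can run anywhere. The directory name and the file names (a.txt, b.txt, notes.md) are made up for illustration and are not part of the original answer.

```raku
# Build a scratch directory with three hypothetical files
my $tmp = $*TMPDIR.add("dir-demo-$*PID");
$tmp.mkdir;
$tmp.add($_).spurt("demo\n") for <a.txt b.txt notes.md>;

# test => is smartmatched against each entry's basename;
# the regex anchors a literal ".txt" at the end of the name
my @txt = dir($tmp, test => / \.txt $ /);
.put for @txt.map(*.basename).sort;

# Clean up the scratch directory
.unlink for dir($tmp);
$tmp.rmdir;
```

Because the test is a regex rather than a glob, the dot has to be escaped and the match anchored with $ to avoid picking up names that merely contain "txt".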
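For both answers above a %h hash is declared. Once the filenames are obtained (actually IO::Path objects), they are iterated through linewise using for, incrementing the hash value for each line every time that line is seen in another file. Calling .unique on the lines ensures that a line repeated within a single file is counted only once for that file.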
In the first answer (last statement) the lines are returned if their value equals 4, i.e. they occur in all four input files. In the second answer (last statement) the lines are returned if their value equals @a.elems, i.e. the number of input files (in other words, this value is set dynamically).
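The counting logic described above can also be sketched as a small script that works on in-memory data instead of files. The helper name common-lines and the sample data are hypothetical, chosen only to illustrate the idea; each inner list stands in for the lines of one file.

```raku
# Hypothetical helper: given a list of "files" (each a list of lines),
# return the lines that occur in every one of them.
sub common-lines(@files) {
    my %count;
    for @files -> @lines {
        # .unique makes sure a line repeated inside one file counts once
        %count{$_}++ for @lines.unique;
    }
    # Keep only the lines whose count equals the number of files
    %count.grep(*.value == @files.elems).map(*.key).sort;
}

# Made-up sample data; the first "file" contains a duplicated line
my @files =
    ['>TCONS_00001528', '>TCONS_00001922', '>TCONS_00001922'],
    ['>TCONS_00001922', '>TCONS_00001924'],
    ['>TCONS_00001922', '>TCONS_00001956'];

.put for common-lines(@files);
```

Comparing against @files.elems rather than a hard-coded number mirrors the second answer, so the same code works for any number of inputs.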
Sample Input (Note: files are named fileA, fileB, etc. Also, fileA has an extra line at the end, compared to OP's first file):
fileA
>TCONS_00000867
>TCONS_00001442
>TCONS_00001447
>TCONS_00001528
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922
fileB
>TCONS_00001528
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922
>TCONS_00001924
fileC
>TCONS_00001529
>TCONS_00001668
>TCONS_00001921
>TCONS_00001922
>TCONS_00001924
>TCONS_00001956
>TCONS_00002048
fileD
>TCONS_00001922
>TCONS_00001924
>TCONS_00001956
>TCONS_00002048
Sample Output (showing all key/value counts):
~$ raku -e 'my %h; for dir(test => / file \w /) { %h{$_}++ for .lines.unique }; .say for %h.sort: -*.value;'
>TCONS_00001922 => 4
>TCONS_00001668 => 3
>TCONS_00001921 => 3
>TCONS_00001924 => 3
>TCONS_00001529 => 3
>TCONS_00002048 => 2
>TCONS_00001528 => 2
>TCONS_00001956 => 2
>TCONS_00000867 => 1
>TCONS_00001442 => 1
>TCONS_00001447 => 1
Sample Output (showing only lines where .value == 4 or .value == @a.elems):
~$ raku -e 'my %h; my @a = dir(test => / file \w /); for @a { %h{$_}++ for .lines.unique }; for %h { .key.put if .value == @a.elems};'
>TCONS_00001922
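Finally, for those who prefer shell-globbing, Raku can do this as well. The key is remembering that the $*ARGFILES dynamic variable must be accessed through .handles so that each input file is read separately (reading lines from $*ARGFILES directly would run all the files together, and .unique could no longer be applied per file):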
~$ raku -e 'my ($n,%h); for $*ARGFILES.handles -> $fh { $n++; %h{$_}++ for $fh.lines.unique }; for %h { .key.put if .value == $n };' file?
>TCONS_00001922
NOTE: The OP's test input seems like a trick question because no lines are in common between all four files! Thus the first file (fileA) has been modified to provide a positive control: >TCONS_00001922.
https://stackoverflow.com/a/68774047/7270649
https://docs.raku.org/routine/dir
https://docs.raku.org
https://raku.org
sed, this is quite good for finding duplicate lines across many files: cat to sort to uniq -c. Somehow I didn't quite think of this, good answer! – smaslennikov May 21 '19 at 21:35
uniq -cd – mems Sep 30 '19 at 14:44