3

I have multiple files, let's say file1, file2 etc. Each file has one word in each line, like:

file1 file2 file3
one   four  six
two   five
three

What I want is to combine them into a new file, file4, that contains every possible pair of words as a permutation without repetition (both orders of each pair, but no word paired with itself). For example:

onetwo
onethree
onefour
onefive
...
twothree
...
onefour
...
fourone
...

How is this possible using Linux commands?

agc
  • 7,223
mpla_mpla
  • 143

5 Answers

2

Ruby is a nice, concise language for this kind of thing:

ruby -e '
  words = ARGV.collect {|fname| File.readlines(fname)}.flatten.map(&:chomp)
  words.combination(2).each {|pair| puts pair.join("")}
' file[123] > file4
onetwo
onethree
onefour
onefive
onesix
twothree
twofour
twofive
twosix
threefour
threefive
threesix
fourfive
foursix
fivesix

You're quite right: combination provides "onetwo" but misses "twoone". Good thing there's permutation:

ruby -e '
  words = ARGV.collect {|fname| File.readlines(fname)}.flatten.map(&:chomp)
  words.permutation(2).each {|pair| puts pair.join("")}
' file{1,2,3}
onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
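
To make the difference between the two calls concrete, here is a minimal Python sketch (not part of the Ruby answer) that counts the pairs each approach yields. It assumes the six sample words from the question are already merged into one list; the names `words`, `unordered` and `ordered` are illustrative only.

```python
from itertools import combinations, permutations

# Sample words from the question, assumed already merged from file1, file2 and file3.
words = ["one", "two", "three", "four", "five", "six"]

# combinations: each unordered pair appears exactly once.
unordered = ["".join(pair) for pair in combinations(words, 2)]

# permutations: both orders of every pair appear.
ordered = ["".join(pair) for pair in permutations(words, 2)]

print(len(unordered))         # 15
print(len(ordered))           # 30
print("twoone" in unordered)  # False
print("twoone" in ordered)    # True
```

For n words, combinations give n(n-1)/2 lines and permutations give n(n-1), which matches the 15 and 30 lines listed above.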
glenn jackman
  • 85,964
1

Assuming the total size of the input files is smaller than getconf ARG_MAX (i.e. the maximum command-line length), this should work:

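# Load every word from the files into the positional parameters.
# (Left unquoted on purpose: word splitting turns each line into one parameter.)
set -- $( cat file[123] )
for f in "$@" ; do
    for g in "$@" ; do
        # Skip pairing a word with itself.
        [ "$f" != "$g" ] && printf '%s%s\n' "$f" "$g"
    done
done > file4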

cat file4 outputs:

onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive

(As per OP clarification, the above is a revision for permutations without repetition. See previous draft for combinations without repetition.)
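
One detail of the loop above: it skips a pair when the two strings are equal, so a word that occurs in more than one file is never paired with its copy. Whether that matters depends on how duplicates should be treated, which the question does not say. As a rough Python sketch of the alternative, the hypothetical helper `ordered_pairs` below compares positions instead of values; the sample input with a repeated word is made up for illustration.

```python
def ordered_pairs(words):
    """Yield the concatenation of every ordered pair of distinct positions."""
    for i, first in enumerate(words):
        for j, second in enumerate(words):
            if i != j:  # compare positions, not string values
                yield first + second

# Hypothetical input in which "one" occurs twice.
words = ["one", "two", "one"]
pairs = list(ordered_pairs(words))
print(pairs)
# ['onetwo', 'oneone', 'twoone', 'twoone', 'oneone', 'onetwo']
```

With unique words, as in the question's sample files, both approaches produce the same 30 lines.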

agc
  • 7,223
1

A Python solution:

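import fileinput
from itertools import permutations
from contextlib import closing

# Read all three files as one stream of lines; closing() releases the handle.
with closing(fileinput.input(['file1', 'file2', 'file3'])) as f:
    # permutations(f, 2) yields every ordered pair of distinct lines.
    for x, y in permutations(f, 2):
        print('{}{}'.format(x.rstrip('\n'), y.rstrip('\n')))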

onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
iruvar
  • 16,725
  • @iruvar this is much faster than the bash solution, similar to @agc's, that I was using. – badner Jul 24 '17 at 15:29
  • @badner - nice - and the speed doesn't surprise me at all given that python file I/O and itertools are implemented in the C layer – iruvar Jul 24 '17 at 17:20
0

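Use this (note that it also pairs each word with itself, and the sample output below comes from input files containing seven words):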

cat FILE1 FILE2 FILE3 | \
    perl -lne 'BEGIN{@a}{push @a,$_}END{foreach $x(@a){foreach $y(@a){print $x.$y}}}'

Output:

oneone
onetwo
onethree
onefour
onefive
onesix
oneseven
twoone
twotwo
twothree
twofour
twofive
twosix
twoseven
threeone
threetwo
threethree
threefour
threefive
threesix
threeseven
fourone
fourtwo
fourthree
fourfour
fourfive
foursix
fourseven
fiveone
fivetwo
fivethree
fivefour
fivefive
fivesix
fiveseven
sixone
sixtwo
sixthree
sixfour
sixfive
sixsix
sixseven
sevenone
seventwo
seventhree
sevenfour
sevenfive
sevensix
sevenseven
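
The 49 lines above are the full Cartesian product of seven words with themselves, which includes self-pairs such as "oneone". As a hedged Python sketch (not the Perl answer's own method), the snippet below shows the same product and one way to drop the self-pairs; the word list is inferred from the output above, and the variable names are illustrative.

```python
from itertools import product

# Seven words, as implied by the output above.
words = ["one", "two", "three", "four", "five", "six", "seven"]

# Full Cartesian product: every word paired with every word, itself included.
with_self_pairs = [a + b for a, b in product(words, repeat=2)]

# Same product over positions, skipping pairs where both positions match.
without_self_pairs = [
    words[i] + words[j]
    for i, j in product(range(len(words)), repeat=2)
    if i != j
]

print(len(with_self_pairs))     # 49
print(len(without_self_pairs))  # 42
```

Dropping the seven self-pairs leaves 7 × 6 = 42 lines, which is the "without repetition" result the question asks for.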
agc
  • 7,223
Baba
  • 3,279
0

TXR Lisp:

Warmup: just get the data structure first:

$ txr -p '(comb (get-lines (open-files *args*)) 2)' file1 file2 file3
(("one" "two") ("one" "three") ("one" "four") ("one" "five") ("one" "six")
 ("two" "three") ("two" "four") ("two" "five") ("two" "six") ("three" "four")
 ("three" "five") ("three" "six") ("four" "five") ("four" "six")
 ("five" "six"))

Now it is just a matter of getting the right output format. If we catenate the pairs together and then use tprint (implicitly via the -t option), we are there.

First, the catenation via mapping through cat-str:

$ txr -p '[mapcar cat-str (comb (get-lines (open-files *args*)) 2)]' file1 file2 file3
("onetwo" "onethree" "onefour" "onefive" "onesix" "twothree" "twofour"
 "twofive" "twosix" "threefour" "threefive" "threesix" "fourfive"
 "foursix" "fivesix")

OK, we have the right data. Now just use the tprint function (-t) instead of prinl (-p):

$ txr -t '[mapcar cat-str (comb (get-lines (open-files *args*)) 2)]' file1 file2 file3
onetwo
onethree
onefour
onefive
onesix
twothree
twofour
twofive
twosix
threefour
threefive
threesix
fourfive
foursix
fivesix

Finally, we read the question again and do permutations instead of combinations with perm rather than comb, as required:

$ txr -t '[mapcar cat-str (perm (get-lines (open-files *args*)) 2)]' file1 file2 file3
onetwo
onethree
onefour
onefive
onesix
twoone
twothree
twofour
twofive
twosix
threeone
threetwo
threefour
threefive
threesix
fourone
fourtwo
fourthree
fourfive
foursix
fiveone
fivetwo
fivethree
fivefour
fivesix
sixone
sixtwo
sixthree
sixfour
sixfive
Kaz
  • 8,273