3

I have a dictionary for MySpell in file.dic. Let's say:

abc
aword
bword
cab
worda
wordzzz

and I'm looking for different words that are permutations (or anagrams) of each other.

If there were a command "letter-sort", I'd do it more or less like this:

cat file.dic | letter-sort | paste - file.dic | sort

That gives me:

abc abc
abc cab
adorw aword
adorw worda
bdorw bword
dorwzzz wordzzz

so now I can clearly see the anagrams in the file. Is there such a letter-sort command, or is there some other way to obtain this result?

sZpak
  • 481

4 Answers

1

To sort the letters of each line in a file, you could do something like this:

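# For each word: split into one character per line, sort, then rejoin.
# IFS= and -r keep read from trimming whitespace or mangling backslashes.
while IFS= read -r line; do
    grep -o . <<< "${line}" | sort | tr -d '\n'
    echo
done < file.dic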

Output:

abc
adorw
bdorw
abc
adorw
dorwzzz
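
For comparison, the whole pipeline from the question (sort the letters, pair each key with its word, sort the pairs) can be sketched in Python. This is only an illustration: letter_sort and keyed_lines are hypothetical names, not existing commands, and letters are ordered by code point rather than by locale collation.

```python
def letter_sort(word):
    """Return the letters of word in sorted (code point) order."""
    return "".join(sorted(word))


def keyed_lines(words):
    """Pair each word with its sorted-letter key and sort the pairs.

    Roughly mirrors `letter-sort | paste - file.dic | sort` from the
    question, with a space as the separator instead of a tab.
    """
    return sorted("{} {}".format(letter_sort(w), w) for w in words)


if __name__ == "__main__":
    # Sample words copied from the question; a real run would read file.dic.
    sample = ["abc", "aword", "bword", "cab", "worda", "wordzzz"]
    for line in keyed_lines(sample):
        print(line)
```

For the sample words this yields the same key/word pairs as the listing in the question, so anagrams end up on adjacent lines.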
pfnuesel
  • 5,837
1

You can use the fold command to break a string into an array of individual characters, as in the script below:

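#!/bin/bash

# Split the first argument into one character per line with fold, then
# read those lines into a real bash array (mapfile needs bash 4 or later).
mapfile -t CHARS < <(printf '%s\n' "$1" | fold -w1)

for i in "${CHARS[@]}"
do
    # do something with each character
    echo "$i"
done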

Assuming that you have saved the script above as test.sh, you can run it as follows:

$ ./test.sh abcde

and it will break the string "abcde" into an array of characters, which you can then use to find its anagrams.
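
Once a word is split into characters, checking whether two words are anagrams only requires that the sorted character lists match. A minimal sketch in Python 3, where a str is already a sequence of characters rather than bytes; split_chars and is_anagram are illustrative names and not part of the script above.

```python
def split_chars(word):
    """Split a string into a list of single characters (code points)."""
    return list(word)


def is_anagram(a, b):
    """True if a and b are different words made of the same characters."""
    return a != b and sorted(split_chars(a)) == sorted(split_chars(b))


if __name__ == "__main__":
    print(split_chars("abcde"))          # ['a', 'b', 'c', 'd', 'e']
    print(is_anagram("aword", "worda"))  # True
    print(is_anagram("abc", "wordzzz"))  # False
```

Because the split is per character, accented or non-Latin letters count as one element each, provided both words store them in the same composed or decomposed form.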

jsalatas
  • 389
  • I've tested it and fold is much faster than grep. Unfortunately, it doesn't work in a non-Latin locale. – sZpak Nov 18 '16 at 13:45
  • Hmm, it works for me (just tested with Greek input): echo δοκιμή | fold -w1 prints δ ο κ ι μ ή – jsalatas Nov 18 '16 at 18:54
  • It doesn't work with Polish though :/ echo "zażółć gęślą jaźń" | fold -w1 fails, at least on my system. – sZpak Nov 21 '16 at 10:12
  • Indeed! And I just figured out why: it seems that if all characters in the string are non-ASCII Unicode (like the Greek), it correctly uses two bytes for each character. In the case of Polish it fails to do so: it does not correctly interpret each character's length (either one or two bytes) and treats everything as one-byte characters. The same happens if I mix Greek with English, like this: echo δοκιμή test | fold -w1 :\ – jsalatas Nov 21 '16 at 10:33
1

You mentioned Python, so just stick with Python. Two words are anagrams of each other if (1) they contain the same letters and (2) their letter frequencies match. The built-in Counter class can compute letter frequencies in one pass, without any sorting:

from __future__ import print_function
from collections import Counter, defaultdict

with open('file.dic') as f:
    # Pair each word with its letter-frequency table.
    data = (l.rstrip('\n') for l in f)
    data = ((l, Counter(l)) for l in data)
    # Group words whose letter frequencies are identical.
    perms = defaultdict(list)
    for l, c in data:
        # items()/values() work on both Python 2 and Python 3.
        perms[frozenset(c.items())].append(l)
    for anagrams in perms.values():
        print(*anagrams)

Output (the order of the groups may vary between Python versions):

bword
aword worda
abc cab
wordzzz
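
The loop above prints every group, including words that have no anagram partner (bword, wordzzz). If only real anagram sets are wanted, the groups can be filtered by size. A sketch along the same lines, in which anagram_groups is a hypothetical helper name:

```python
from collections import Counter, defaultdict


def anagram_groups(words):
    """Group words by letter frequencies; keep groups of two or more."""
    groups = defaultdict(list)
    for word in words:
        # Counter items are hashable (letter, count) pairs, so a frozenset
        # of them identifies the multiset of letters in the word.
        groups[frozenset(Counter(word).items())].append(word)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Sample words copied from the question.
    sample = ["abc", "aword", "bword", "cab", "worda", "wordzzz"]
    for group in anagram_groups(sample):
        print(*group)
```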
iruvar
  • 16,725
0

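Perl, with its command-line flags, can be very succinct.

The following command sorts the letters of each word read from standard input (the newline is printed after the join so that it is not sorted along with the letters):

perl -CS -ne 'chomp; print(join("", sort(split("", $_))), "\n")'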

In practice, if you are working with anagrams, you might prefer the "an" utility. It can take a dictionary as an argument:

an -d /usr/share/dict/ngerman Anagramword
Att Righ
  • 1,196