2

My file is like this:

alice, bob
bob, cat
cat, dennis
cat, bob
dennis, alice

I want to remove lines where the same words are repeated in reverse order. In this example, bob, cat and cat, bob are the same pair in reverse order, so cat, bob should be removed and my output should be:

alice, bob
bob, cat
cat, dennis
dennis, alice

How can I do this?

  • Any restrictions regarding the other lines? I.e. can the fields be resorted and the lines be resorted, too? – FelixJN Aug 04 '19 at 16:18
  • No such restrictions. Sorting can be done any number of times. –  Aug 04 '19 at 16:28

3 Answers

3

You could use a hash that is keyed on the sorted elements:

$ perl -lne 'print unless $h{join ",", sort split /, /, $_}++' file
alice, bob
bob, cat
cat, dennis
dennis, alice

For exactly 2 fields, something like this might suffice:

$ awk -F', ' '!seen[$2 FS $1]; {seen[$0]++}' file
alice, bob
bob, cat
cat, dennis
dennis, alice
steeldriver
  • idk what the perl script does but that awk script will use a lot more memory than necessary, see https://unix.stackexchange.com/a/533876/133219 for the idiomatic awk approach. – Ed Morton Aug 04 '19 at 22:45
1

The idiomatic awk answer:

$ awk -F', ' '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++' file
alice, bob
bob, cat
cat, dennis
dennis, alice

The general approach for any number of fields is to sort them and use the sorted list as the index to seen[].
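A minimal sketch of that general approach in portable awk, assuming the same `, ` separator as the sample file. The function name `dedupe_unordered` is a hypothetical wrapper, not something from the answer above; it copies the fields of each line into an array, insertion-sorts them as strings, and joins them with `SUBSEP` to form the index into seen[].

```shell
# dedupe_unordered (hypothetical name): print only the first line seen for
# each unordered set of fields. Reads the files given as arguments, or stdin.
dedupe_unordered() {
  awk -F', ' '
  {
    n = NF
    # Copy the fields, forcing string type so the comparison is always textual.
    for (i = 1; i <= n; i++) f[i] = $i ""
    # Insertion sort of f[1..n].
    for (i = 2; i <= n; i++) {
      v = f[i]
      for (j = i - 1; j >= 1 && f[j] > v; j--) f[j + 1] = f[j]
      f[j + 1] = v
    }
    # Join the sorted fields into a single key.
    key = ""
    for (i = 1; i <= n; i++) key = key f[i] SUBSEP
    # Print the line only the first time its key appears.
    if (!seen[key]++) print
  }' "$@"
}
```

Called as `dedupe_unordered file`, this keeps each surviving line exactly as it appeared in the input, because only the key is sorted and not the line itself. With GNU awk the hand-written sort could probably be replaced by its built-in `asort()`, at the cost of portability.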

Ed Morton
  • 1
    Can you please explain the logic? – Death Metal Aug 08 '19 at 20:28
  • 1
    @DeathMetal It creates a common index out of each pair of key fields by putting them in greatest-first order, so A B and B A both become the index B A. Then it just tests whether the given index has been seen before: the first time either A B or B A is encountered in the input, seen["B A"]++ is 0, the 2nd time it's 1, and so on. The ! at the front ensures that the default action of printing the current input line only occurs when seen["B A"]++ is zero, i.e. the first time it's seen in the input. – Ed Morton Aug 08 '19 at 20:52
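To watch the logic from the comment above at work, here is a small tracing sketch. The function name `trace_seen` and the output layout are made up for illustration; the key expression is the one from the answer. For each input line it shows the greatest-first key, the value of seen[key] before the increment, and whether the one-liner would print or skip the line.

```shell
# trace_seen (hypothetical name): show the key and counter for every line.
trace_seen() {
  awk -F', ' '
  {
    # Same key as in the answer: the two fields in greatest-first order.
    key = ($1 > $2 ? $1 FS $2 : $2 FS $1)
    # Post-increment returns the old value: 0 on first sight, 1 on second, ...
    count = seen[key]++
    printf "%s | key=%s | seen=%d | %s\n", $0, key, count, (count ? "skipped" : "printed")
  }' "$@"
}
```

Feeding it the two lines `bob, cat` and `cat, bob` gives:

```
bob, cat | key=cat, bob | seen=0 | printed
cat, bob | key=cat, bob | seen=1 | skipped
```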
-1

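This sorts the fields within every line, then sorts the file and keeps only the unique lines. Note that the fields in the output are reordered, so dennis, alice comes out as alice,dennis: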

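while IFS= read -r line
  do
    # one field per line, sort the fields, rejoin them with commas
    printf '%s\n' "$line" |
    tr ' ,' '\n\n' |
    sort |
    tr '\n' ','
done < file |
# strip the stray outer commas; GNU sed: \n in the replacement inserts a newline
sed -e 's/^,//' -e 's/,$//' -e 's/,,/\n/g' |
sort -u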
FelixJN