
Simply put, I have a file with lines of text that are unknown to me, something like:

abaa
dddd
bbbb
cccc
abaa
aaaa
abaa

The result I'd like to get is:

dddd
bbbb
cccc
aaaa

where all the duplicates are completely removed: if one line of text is duplicated anywhere else, remove that line and every repetition of it. Is this possible to do? All of my searches show output with duplicates removed but one instance left behind, which I would also like removed.

dimm0k
    not quite a dup: that question shows how to print unique lines, not remove all duplicate lines. – glenn jackman Nov 07 '16 at 21:12
  • just to clarify, this means that you want to take a file, and for any line that has duplicates, remove all instances of that line, including the original instance? – HalosGhost Nov 07 '16 at 21:15

1 Answer


This approach takes two passes through the file: one to count the number of times each line occurs, and one to print only the lines that appear exactly once:

awk 'NR == FNR {count[$0]++; next}; count[$0] == 1' file file
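For a quick check, the command can be run against the sample input from the question (using `file` as the file name, as in the answer):

```shell
# Recreate the sample input from the question
printf '%s\n' abaa dddd bbbb cccc abaa aaaa abaa > file

# Pass 1 (NR == FNR): count each line, then skip to the next record.
# Pass 2: the pattern count[$0] == 1 prints only unseen-elsewhere lines,
# preserving their original order.
awk 'NR == FNR {count[$0]++; next}; count[$0] == 1' file file
# → dddd
#   bbbb
#   cccc
#   aaaa
```

Note the file name is given twice on the command line: `NR == FNR` is true only while awk is reading the first copy, which is what separates the counting pass from the printing pass.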
glenn jackman
  • sorry for my lack of details in my original question, but this is exactly what I was looking for! thanks! – dimm0k Nov 07 '16 at 21:19
    If the order of the lines in the output is not a concern, we don't need a 2nd pass through the file: in the END block, we can output the array keys where the value is one. – glenn jackman Nov 07 '16 at 21:20
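The single-pass variant described in that comment can be sketched as follows. It counts lines in the main block and prints the unique ones from the `END` block; since it iterates over awk's associative array, the output order is arbitrary rather than the file's order:

```shell
# Recreate the sample input from the question
printf '%s\n' abaa dddd bbbb cccc abaa aaaa abaa > file

# Single pass: count every line, then in END print the array keys
# whose count is exactly one. Order of output is unspecified.
awk '{count[$0]++} END {for (line in count) if (count[line] == 1) print line}' file
```

This reads the file only once, which matters for large inputs, at the cost of losing the original line order (piping through `sort` afterwards gives a deterministic order, if that is acceptable).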