Just change your awk command to key on the column you want to deduplicate by (in your case, the third column):
awk '!seen[$3]++' filename
This command tells awk which lines to print. The variable $3 holds the entire contents of column 3, and the square brackets are array access. So, for each line in filename, the element of the array named seen that is indexed by that line's third column is checked and then incremented, and the line is printed if that element was not (!) previously set (that is, it was still 0 before the increment). As a result, only the first line for each distinct value in the third column is kept, in the original input order.
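To make the explanation concrete, here is a small sketch that runs the one-liner on some made-up sample data (the input and the variable names dedup and expanded are purely illustrative) and compares it with an expanded form that spells out the same check-then-increment logic.

```shell
#!/bin/sh
# Made-up sample input: the value "x" appears twice in column 3.
input='a 1 x
b 2 y
c 3 x
d 4 z'

# The one-liner: seen[$3]++ yields the old count (0 the first time) and then
# increments it, so !seen[$3]++ is true only on the first occurrence of $3.
dedup=$(printf '%s\n' "$input" | awk '!seen[$3]++')

# Expanded equivalent of the same logic, written out step by step.
expanded=$(printf '%s\n' "$input" | awk '{ if (seen[$3] == 0) print; seen[$3]++ }')

printf '%s\n' "$dedup"
```

Both forms keep "a 1 x", "b 2 y" and "d 4 z" and drop "c 3 x", because its third column "x" was already seen on the first line.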
The above works if the columns in your input file are delimited by spaces/tabs. If they are delimited by something else, you need to tell awk with its -F option. For example, if the columns are delimited by commas (,) and you want to remove lines based on the third column, use:
awk -F',' '!seen[$3]++' filename
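A quick sketch of why -F matters, using a made-up comma-separated file whose fields contain spaces (the data and the variable names default_fs and comma_fs are hypothetical). With the default whitespace splitting, $3 is not the third comma-separated field, so the deduplication keys on the wrong thing.

```shell
#!/bin/sh
# Made-up CSV input; "Paris" appears twice in the third comma-separated field.
input='id,name,city
1,Ann Lee,Paris
2,Bob Ray,Rome
3,Cy Fox,Paris'

# Default FS splits on whitespace: "1,Ann Lee,Paris" becomes $1="1,Ann" and
# $2="Lee,Paris", so $3 is empty on every line and only the first line survives.
default_fs=$(printf '%s\n' "$input" | awk '!seen[$3]++')

# With -F',' the third field really is the city column.
comma_fs=$(printf '%s\n' "$input" | awk -F',' '!seen[$3]++')

printf 'default FS:\n%s\n\n-F,:\n%s\n' "$default_fs" "$comma_fs"
```

With -F',' only the "3,Cy Fox,Paris" line is dropped; without it, everything after the header line is dropped.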
-u would only remove duplicate lines, not duplicate keys... but I'm wrong. – Randoms Feb 15 '18 at 04:40
-k3 would sort using the string starting with the 3rd key rather than sort ONLY using the 3rd key. Try printf 'a b\na c\n' | sort -k1,1 -u vs printf 'a b\na c\n' | sort -k1 -u. – Ed Morton Sep 05 '21 at 11:51
-s option to handle that. – Ed Morton Sep 05 '21 at 11:52
-t' ' would be useful or even the right thing to do, it'd break if the input was tab-separated for example. – Ed Morton Sep 05 '21 at 11:53
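The comparison suggested in the comments can be run directly; this sketch just captures both outputs so the difference is visible (the variable names are illustrative). -k1,1 limits the key to field 1, whereas -k1 means "from field 1 to the end of the line", which is why -k3 alone does not deduplicate on the third column only.

```shell
#!/bin/sh
# Key is field 1 only: "a b" and "a c" compare equal, so -u keeps one line.
only_field1=$(printf 'a b\na c\n' | sort -k1,1 -u)

# Key runs from field 1 to end of line: the keys differ, so both lines are kept.
field1_to_eol=$(printf 'a b\na c\n' | sort -k1 -u)

printf '%s\n---\n%s\n' "$only_field1" "$field1_to_eol"
```

By analogy, deduplicating on just the third column with sort would need -k3,3, and unlike the awk one-liner, sort reorders the lines rather than keeping them in input order.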