
Suppose you have a file like this:

    NW_006521251.1  428 84134
    NW_006521251.1  511 84135
    NW_006521038.1  202 84155
    NW_006521038.1  1743 84153
    NW_006521038.1  1743 84154
    NW_006520495.1  198 84159
    NW_006520086.1  473 84178
    NW_006520086.1  511 84180

I want to keep the unique rows based on columns 1 and 2 together (i.e. not just column 2, since that number may repeat under a different label in column 1).

The desired output is the following (the second repeat of NW_006521038.1 1743 is removed):

    NW_006521251.1  428 84134
    NW_006521251.1  511 84135
    NW_006521038.1  202 84155
    NW_006521038.1  1743 84153
    NW_006520495.1  198 84159
    NW_006520086.1  473 84178
    NW_006520086.1  511 84180

Is there a way to do this with awk? Running uniq on the file doesn't work, since the repeated rows differ in the third column.

Age87

2 Answers


There is a "famous" awk idiom for exactly this. You want to do:

awk '!seen[$1,$2]++' file

That creates an associative array "seen" with the 2 columns as the key. Use the post-increment operator so that, for the first time you encounter that key, the value is zero. The use the negation operator for a "true" result the first time you see the key.
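
If the one-liner looks cryptic, here is a longer form that (as far as I can tell) does the same thing, with the implicit pieces spelled out:

    awk '{
        if (!seen[$1, $2]) {   # value is still 0 the first time this pair of columns appears
            print              # print the whole input line
        }
        seen[$1, $2]++         # remember that this pair has been seen
    }' file

The one-liner simply folds the test and the increment into a single expression and relies on awk's default action of printing the line when the condition is true.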

glenn jackman

If you don't mind that the output is sorted:

    sort -u -k1,2 file
  • -u - unique
  • -k1,2 - use fields 1 and 2 together as the key
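
As a quick sanity check, assuming the sample above is saved as file, both commands should leave seven rows. The awk version keeps the rows in their original order, while sort reorders them by the key, and which of the two duplicate NW_006521038.1 1743 rows survives sort -u is not guaranteed to be the first one from the input:

    awk '!seen[$1,$2]++' file | wc -l    # counts the rows the awk approach keeps; should print 7
    sort -u -k1,2 file | wc -l           # should also print 7; without wc -l the rows come out sorted by fields 1-2

If you want the second field compared numerically rather than lexically, you can split the key, e.g. sort -u -k1,1 -k2,2n file.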