
I have a file with the content below:

902461360       81636718        32863608        0       employee    permenant
902492248       81415224        32775337        0       employee    temporary
902495059       81686374        32881482        0       employee    permenant
902495059       81686374        32881482        0       employee    vendor
902504989       81675052        32877123        0       employee    vendor
902532086       81691300        32884527        0       employee    vendor
902723910       81690082        32882735        0       employee    permenant
902723910       81690082        32882735        0       employee    vendor

The first three values may be repeated on other lines. I want to keep one instance and remove the other duplicates.

The output should look like this:

902461360       81636718        32863608        0       employee    permenant
902492248       81415224        32775337        0       employee    temporary
902495059       81686374        32881482        0       employee    permenant
902504989       81675052        32877123        0       employee    vendor
902532086       81691300        32884527        0       employee    vendor
902723910       81690082        32882735        0       employee    permenant

Archemar
sravani
  • 5
    Please replace the images of the data with the actual data (as text), so that people are able to test their solutions. Don't post images of text – Kusalananda Sep 02 '20 at 11:54
  • Welcome to the site. How do you define "might be repeating"? Do you want to remove a line if the exact combination of value1, value2 and value3 has already occurred, or if any of value1 or value2 or value3 has already occurred in a previous line? – AdminBee Sep 09 '20 at 10:18

1 Answer


I would try

awk '!a[$1 $2 $3]++ { print ;}' file

where

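  • !a[$1 $2 $3]++ evaluates to true the first time that combination of values is seen, so the line is printed. On later occurrences the counter is already non-zero, so the expression is false and the line is skipped.

See How does awk '!a[$0]++' work? for more details.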
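As a quick check, the command above can be run against the sample data from the question. This is only a sketch: the file name sample.txt is an assumption, and the data is copied as-is from the question.

```shell
# Write the sample data from the question to a file.
# (sample.txt is a hypothetical name chosen for this example.)
cat > sample.txt <<'EOF'
902461360       81636718        32863608        0       employee    permenant
902492248       81415224        32775337        0       employee    temporary
902495059       81686374        32881482        0       employee    permenant
902495059       81686374        32881482        0       employee    vendor
902504989       81675052        32877123        0       employee    vendor
902532086       81691300        32884527        0       employee    vendor
902723910       81690082        32882735        0       employee    permenant
902723910       81690082        32882735        0       employee    vendor
EOF

# Print a line only the first time its first three fields are seen.
awk '!a[$1 $2 $3]++ { print ;}' sample.txt
```

With this input, the two "vendor" lines that repeat an earlier combination of the first three fields are dropped, leaving the six lines shown as the desired output in the question.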

Archemar
  • 2
    In the general case, it would be safer to use a[$1,$2,$3] (with commas), as that inserts the value of SUBSEP between the values that make up the key instead of just concatenating them. Otherwise, the set 1, 23, 4 would be indistinguishable from the set 12, 3, 4. Also, { print; } is not actually needed. – Kusalananda Sep 02 '20 at 12:27
  • That absolutely solved my issue. Thanks a ton. – sravani Sep 02 '20 at 13:05
  • 1
    @sravani Consider accepting a post that solves your problem. – Quasímodo Sep 02 '20 at 13:10
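To illustrate the comment above about SUBSEP, here is a small sketch comparing the two key styles. The two input lines are made up for the demonstration and are not from the question's data.

```shell
# Concatenated key: the fields "1" "23" "4" and "12" "3" "4" both
# become the key "1234", so the second line is wrongly treated as a duplicate.
printf '1 23 4 x\n12 3 4 y\n' | awk '!a[$1 $2 $3]++'

# Comma-separated subscript: awk joins the fields with SUBSEP,
# so the two keys differ and both lines are printed.
printf '1 23 4 x\n12 3 4 y\n' | awk '!a[$1,$2,$3]++'
```

The first command prints only the line ending in x, while the second prints both lines. The { print; } action is left out here because printing is awk's default action when a pattern matches.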