awk delete duplicate

Asked Dec 11 '18 at 20:00

I want to find records whose value in a specific field has already appeared in an earlier record, delete those duplicate records, and save the result to a new file. For example, given this input:

abc|123|def|456
abc|456|ghi|789
def|123|def|456

I want to save a new file in which every record whose field 1 repeats that of an earlier record has been removed:

abc|123|def|456
def|123|def|456

This awk code comes close, but actually does the opposite. It creates a new duplicate row and then saves it to the new file.

awk -F'|' 'myv=a[$1] !/^myv++/' file.txt > newFile.txt

edited Dec 11 '18 at 20:06

asked Dec 11 '18 at 20:00

user3439308

awk -F'|' '!a[$1]++' file.txt > newFile.txt – steeldriver Dec 11 '18 at 20:04
Oops, this is the code that comes close: awk -F'|' 'myv=a[$1]++ !/^myv/' file.txt > newFile.txt (with the ++ to catch duplicates). But I do not want to add the duplicate lines; I want to remove them. – user3439308 Dec 11 '18 at 20:04
SOLVED - thank you steeldriver -- why did you submit that as a comment when it is the answer? I want to give you credit. – user3439308 Dec 11 '18 at 20:07
Thanks - I posted it as a comment because I'm convinced it must be a duplicate of a previous question (although I can't quite find it) – steeldriver Dec 11 '18 at 20:36
@StephenKitt That one is slightly different; it doesn't print the whole line. – Sparhawk Dec 11 '18 at 21:10
@Sparhawk I reckoned the important part was the array incrementing technique; but your suggestion is better, thanks! – Stephen Kitt Dec 11 '18 at 21:11
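
For readers wondering why steeldriver's one-liner works, here is a minimal sketch run against the sample input from the question. The file names are the ones used in the question; the array name `a` is arbitrary. It assumes a POSIX-style awk and writes both files into the current directory.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'abc|123|def|456' 'abc|456|ghi|789' 'def|123|def|456' > file.txt

# a[$1]++ yields how many times this field-1 value was seen BEFORE the
# current record (0 on first sight) and then increments the counter.
# Negating it is true only for the first record with each field-1 value,
# and a true pattern with no action block prints the whole record.
awk -F'|' '!a[$1]++' file.txt > newFile.txt

cat newFile.txt
```

With the sample input this leaves `abc|123|def|456` and `def|123|def|456` in newFile.txt, which matches the output asked for. Note that the first record for each key is the one kept, and the original line order is preserved.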
