
I have a JSON-like text file of records with duplicate IDs that looks like this:

{"ID":"93" , "ST":[42,77,51]}
{"ID":"21" , "ST":[43,4]}
{"ID":"94" , "ST":[65,11,4]}
{"ID":"93" , "ST":[42,77,51,29,63]}
{"ID":"73" , "ST":[21,20]}
{"ID":"94" , "ST":[65,11,4]}
{"ID":"77" , "ST":[87]}

I am trying to filter out the duplicates and always keep the first occurrence of such a match. The ST field may be the same or different for records that share an ID.

The output would look like:

{"ID":"93" , "ST":[42,77,51]}
{"ID":"21" , "ST":[43,4]}
{"ID":"94" , "ST":[65,11,4]}
{"ID":"73" , "ST":[21,20]}
{"ID":"77" , "ST":[87]}

A similar question has already been asked here, but in that case the data file being edited was a comma-separated file. Here we are dealing with JSON data, and the goal is to find the lines that have the same ID values (perhaps via a regex match) and keep the latest one. Does anyone have an idea how to tackle that with awk, sed or pure command-line tools?

  • I just edited the question to note that the lines here are not CSV-formatted; that's why Stephen's answer didn't work. – HarryJason Jun 20 '15 at 20:23
  • Do you want to keep the first occurrence, or the last one? You ask for both in your latest edit ("keep the first occurrence of such a match", then "keep the latest one"). – Stephen Kitt Jun 20 '15 at 21:05

1 Answer


You can use the usual awk de-duplicating technique, on the first field only (fields are separated by spaces):

awk '!count[$1]++'
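
For example, assuming the records live in a file called records.txt (an illustrative name, not from the question), this prints only the first line seen for each value of the first field:

awk '!count[$1]++' records.txt

count[$1]++ evaluates to zero the first time a given first field appears, so !count[$1]++ is true only for that first occurrence, and awk's default action prints the line.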
Stephen Kitt
  • Thanks for your answer. That doesn't work for the above format. – HarryJason Jun 20 '15 at 20:25
  • @HarryJason I tried it with your example before posting and it worked for me... Note the $2 rather than $1 in the linked answer. – Stephen Kitt Jun 20 '15 at 20:38
  • Exactly, that works for the example, provided the fields are space-separated. But that might not always be the case. Can we consider a more general option that takes a regex into account? Thanks in advance. – HarryJason Jun 20 '15 at 20:50
  • What regex? Could you [edit] your question to explain exactly what you're looking for? If the example you give doesn't correspond to what you're trying to do, it's hard to show you something that works... – Stephen Kitt Jun 20 '15 at 20:52
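
For the more general case raised in the comments, where the fields might not be space-separated, one possible sketch (assuming each record carries its ID as "ID":"...", and again using the illustrative file name records.txt) extracts the ID with a regex via awk's match() and de-duplicates on that instead of on $1:

awk 'match($0, /"ID":"[^"]*"/) { id = substr($0, RSTART, RLENGTH); if (!count[id]++) print }' records.txt

This keeps the first occurrence of each ID regardless of the whitespace around the fields; match() sets RSTART and RLENGTH, which substr() uses to pull out the matched "ID":"..." text that serves as the de-duplication key.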