
I have a text file like the one below. I want to find all lines containing the string Validating Classification and then extract the unique errors reported. I do not know in advance which errors may appear.

Input file:

201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3

Output file

201600415 10:40 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3

Can I achieve this using grep, pipes and other commands?

  • using grep .... | sort --unique – Ahmed Nabil Aug 03 '22 at 07:53
  • I vote for reopening this question. The one marked as duplicate is different because it is not about grep. In case you are using git, the command git grep -h <pattern> | sort --unique will give unique occurrences of grep matches. – Paul Rougieux Nov 29 '22 at 15:58

3 Answers


You will need to discard the timestamps, but 'grep' and 'sort --unique' together can do it for you.

grep --only-matching 'Validating Classification.*' file | sort --unique

So grep -o shows only the parts of each line that match your regex (which is why you need the .* — it captures everything after the "Validating Classification" match). Once you have the bare list of errors, sort -u reduces it to the unique set.
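As a concrete sketch, here is the full pipeline run against the sample input from the question (the file name errors.log is an assumption, not taken from the question):

```shell
# Build the sample log from the question (file name assumed)
cat > errors.log <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# -o/--only-matching prints only the matched part of each line;
# sort -u/--unique then collapses duplicates
grep --only-matching 'Validating Classification.*' errors.log | sort --unique
```

Note that this prints the three distinct "Validating Classification: errorN" strings without the timestamps, so the output is deduplicated error messages rather than the original lines.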


You can use this command, assuming your data is in the file test:

uniq -f 2 <test

uniq -f 2 skips the first two whitespace-separated fields (the date and the time) when comparing lines. Be aware that uniq only collapses adjacent duplicates; it works on this input because identical errors appear on consecutive lines, so sort the data first if they might not.
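A quick sketch on the sample data (the file is created here just to make the example self-contained):

```shell
# Sample log from the question, written to the file name used above
cat > test <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# Skip the first two fields (date, time) when comparing adjacent lines
uniq -f 2 <test
```

Unlike the grep | sort approach, this keeps the full original lines, timestamps included, which matches the output the question asks for.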

I would go with awk

awk -F: '{ if (!a[$3]++ ) print ;}' file
  • -F: use : as separator
  • $3 is the third :-separated field, i.e. the error text after the last colon (the timestamp also contains a colon, so the date and time land in $1 and $2)
  • !a[$3]++ is true only on the first occurrence of each error, so each line is printed once
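A self-contained sketch of this answer on the sample data (the file name file is assumed here):

```shell
# Sample log from the question (file name assumed)
cat > file <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# Print each line only the first time its third :-separated field
# (the error text) has been seen
awk -F: '{ if (!a[$3]++) print }' file
```

This produces exactly the desired output file: the first line for each distinct error, with its timestamp preserved, and without requiring sorted input.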