9

This question/answer has some good solutions for deleting identical lines in a file, but they won't work in my case since the otherwise-duplicate lines have a timestamp.

Is it possible to tell awk to ignore the first 26 characters of a line in determining duplicates?

Example:

[Fri Oct 31 20:27:05 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:10 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:13 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:16 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:21 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:22 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:23 2014] The Brown Cow Jumped Over The Moon
[Fri Oct 31 20:27:24 2014] The Brown Cow Jumped Over The Moon

Would become

[Fri Oct 31 20:27:24 2014] The Brown Cow Jumped Over The Moon

(keeping the most recent timestamp)

a coder
  • 3,253
  • 4
    Yes. If you were to post some example input and output, then this might amount to a question. – jasonwryan Nov 03 '14 at 16:21
  • 3
    When asking this type of question, you need to include your input and your desired output. We can't help if we have to guess. – terdon Nov 03 '14 at 16:24
  • 1
    "yes" or "no" seems to be an acceptable answer, what are you going to do with that knowledge? In case of no, extend awk? – Anthon Nov 03 '14 at 16:32
  • 1
    Wow. 80,000 rep claim this was an unusable question (I would not call it a good one) but not a single close vote? – Hauke Laging Nov 03 '14 at 16:45
  • 6
    @HaukeLaging it seems reasonable to give the OP the chance to react to our comments. They have now done so and the question is greatly improved. – terdon Nov 03 '14 at 17:39

5 Answers

15

You can just use uniq with its -f option:

uniq -f 4 input.txt

From man uniq:

  -f, --skip-fields=N
       avoid comparing the first N fields

Note that this will keep the first line of each group of duplicates:

[Fri Oct 31 20:27:05 2014] The Brown Cow Jumped Over The Moon

If that is a problem, you can do:

tac input.txt | uniq -f 4

or if you don't have tac but your tail supports -r:

tail -r input.txt | uniq -f 4
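
For example, with the question's sample saved as input.txt, the reversed pipeline keeps only the most recent entry:

tac input.txt | uniq -f 4

prints

[Fri Oct 31 20:27:24 2014] The Brown Cow Jumped Over The Moon

With several distinct messages the result comes out in reverse chronological order; pipe through tac once more at the end if the original order matters.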
Whymarrh
  • 175
Anthon
  • 79,293
4
awk '!seen[substr($0,27)]++' file
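
For readers unfamiliar with the !seen[...]++ idiom, here is a functionally equivalent expanded form (same logic, just spelled out):

awk '{
    key = substr($0, 27)   # everything after the 26-character timestamp prefix
    if (!seen[key]++)      # true only the first time this key is encountered
        print              # so only the first (oldest) matching line is printed
}' file

If you want the most recent line for each message instead, you could reverse the input with tac before this command and reverse the output again afterwards, as in the uniq answer above.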
Hauke Laging
  • 90,279
  • This solution does not cover the timestamp part as that was not part of the question when this answer was written. – Hauke Laging Nov 03 '14 at 17:18
  • 2
    This is exactly why many of us work to close these until the Q's have been fully fleshed out. Otherwise these Q's are wasting your time and the OP's. – slm Nov 03 '14 at 18:30
3

Try this one:

awk -F ']' '{a[$2]=$1}END{for(i in a){print a[i]"]"i}}'
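
Here -F ']' makes $1 the timestamp part and $2 the message text, a[$2]=$1 lets later timestamps overwrite earlier ones for the same message, and the END block reassembles each surviving line. Assuming the sample input is in input.txt,

awk -F ']' '{a[$2]=$1}END{for(i in a){print a[i]"]"i}}' input.txt

prints

[Fri Oct 31 20:27:24 2014] The Brown Cow Jumped Over The Moon

Note that with more than one distinct message, for (i in a) iterates in an unspecified order, so the output order may differ from the input.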
jimmij
  • 47,140
0

A perl solution:

perl -F']' -anle '$h{$F[1]} = $_; END{print $h{$_} for keys %h}' file
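
This works much like the awk answer above: -F ']' together with -a autosplits each line on ] into @F, $F[1] is the message text, and the last line stored for each message wins. On the question's sample data it prints:

[Fri Oct 31 20:27:24 2014] The Brown Cow Jumped Over The Moon

As with the awk version, keys %h returns the keys in no particular order.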
cuonglm
  • 153,898
0

One can use the power of Vim:

:g/part of duplicate string/d

Very easy. If you have a couple more files (such as gzipped rotated logs), Vim will open them without any preliminary decompression on your side, and you can repeat the last command by pressing : followed by the up-arrow key, just like recalling the last command in a terminal. A rough sketch of that workflow is shown below.
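
As a sketch (the log file names here are only placeholders):

vim app.log app.log.1.gz app.log.2.gz
:g/part of duplicate string/d
:n

Vim opens the gzipped logs transparently via its standard gzip plugin, :g/.../d deletes every matching line in the current buffer, and :n moves on to the next file in the argument list, where : plus the up arrow recalls the previous command.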