
I often need to tail -f Apache access logs for websites to troubleshoot issues. One thing that makes it annoying is that anyone loading a page once may cause 12+ lines to be written to the log, and since they're long lines, each one wraps across several lines in my terminal.

tail -f seems to play nicely with piping to grep and awk, and I came up with a pretty simple solution to filter out duplicates when one IP address makes many requests in a particular second (as well as trim the output to the particular info I usually need):

tail -f log.file | awk ' { print $1 " " $4 " " $9}' | uniq

The problem is, this doesn't work. I just get no output at all, even when I know there should be tons of lines printed.

I've tried some troubleshooting, but haven't been able to get things to really work:

tail -f log.file | awk ' { print $1 " " $4 " " $9}' 

This works exactly as I think it should, and prints the lines as they happen (but with many duplicates) like so:

12.34.56.78 [10/May/2016:18:42:01 200
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304

tail log.file | awk ' { print $1 " " $4 " " $9}' | uniq

This also works exactly as I think it should, and filters out any duplicate lines. But for my troubleshooting I really need the real-time updates of tail -f.

How can I make tail -f filter out duplicate lines?

Yex
  • Try adding stdbuf, e.g. stdbuf -oL uniq. – Mikel May 10 '16 at 23:58
  • This doesn't work: just no output. I had already tried it before making this post.

    Edit: it turns out the stdbuf -oL needs to go before the awk, not the uniq.

    – Yex May 11 '16 at 00:14
  • tail -f log.file | stdbuf -oL awk ' { print $1 " " $4 " " $9}' | uniq

    This works exactly as I want. The filtering isn't perfect (sometimes you'll still get alternating pairs of duplicates, though no back-to-back duplicates; see the sketch after these comments), but it's good enough.

    – Yex May 11 '16 at 00:21
  • This question is not a duplicate of Turn off buffering in pipe, although stdbuf can be used here. – Manwe May 26 '16 at 06:28
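
A possible refinement of the pipeline above, sketched here rather than taken from the thread: let awk itself remember every trimmed line it has already printed. This also catches the alternating duplicates that slip past uniq, at the cost of the seen array growing for as long as the tail runs, and since awk writes straight to the terminal no stdbuf is needed:

tail -f log.file | awk '{ $0 = $1 " " $4 " " $9 } !seen[$0]++'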

1 Answer


As a pure awk solution, try:

tail -f log.file | awk ' $0!=last{ print $1 " " $4 " " $9} {last=$0}'

This one prints a new output line only if the input line is different from the previous input line.

As a slight variation, this one prints a new output line only if this output line differs from the previous output line:

tail -f log.file | awk '{$0=$1" "$4" "$9} last!=$0{print} {last=$0}'

Example

Let's try this test file:

$ cat logfile
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 11
1 2 3 4 5 6 7 8 19
1 2 3 4 5 6 7 8 19 12
1 2 3 4 5 6 7 8 19 13
1 2 3 4 5 6 7 8 19
1 2 3 4 5 6 7 8 29

awk filters out the duplicate output lines:

$ cat logfile | awk '{$0=$1" "$4" "$9} last!=$0{print} {last=$0}' 
1 4 9
1 4 19
1 4 29
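
For contrast (not shown in the original answer, but it follows directly from the first command above), the first variant prints a line for every record in this test file, because the columns it does not print make each input line unique:

$ cat logfile | awk '$0!=last{ print $1 " " $4 " " $9} {last=$0}'
1 4 9
1 4 9
1 4 9
1 4 19
1 4 19
1 4 19
1 4 19
1 4 29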
John1024
  • This doesn't seem to work; I just tested it and it spat out 12 duplicate lines.

    I expect it's because the parts it isn't printing aren't exact duplicates.

    – Yex May 11 '16 at 00:10
  • @Yex In that case, try the second version. It should ignore the non-printed part of the input lines. – John1024 May 11 '16 at 00:20
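
Putting the pieces together (a sketch combining the answer's second variant with the tail -f pipeline from the question): awk's output goes straight to the terminal here, so it is line-buffered and appears in real time, with no need for uniq or stdbuf:

tail -f log.file | awk '{$0=$1" "$4" "$9} last!=$0{print} {last=$0}'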