How to sort or uniq a live feed

Question

I'm looking to sort and isolate IP addresses from a live tcpdump feed.

tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"

works just fine, but when I try to add the uniq program it fails:

tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" |  uniq -u

returns nothing.

Same with sort -u.

Any idea how to fix this?

My first thought is that uniq and sort use some kind of buffering, but this isn't specified in the man page. For now, try | awk '!a[$0]++' (see this post: http://unix.stackexchange.com/questions/159695/how-does-awk-a0-work ) — Archemar, Jul 08 '16 at 10:41
Would you (theoretically) settle for uniq'd chunks of some size? — Jeff Schaller, Jul 08 '16 at 10:51
@JeffSchaller basically all I want is for it not to print IPs that have previously been printed in the live feed — ChiseledAbs, Jul 08 '16 at 11:43

score 1 · Accepted Answer · answered Jul 08 '16 at 22:05

You are running up against a theoretical problem. sort cannot print anything at all until it has processed all of its input, and a live feed never ends. uniq only squeezes adjacent repeated lines (which is why it is so often preceded by sort), so its output will differ from its input only if the input has the same line twice in a row. If your input is just a little random you probably won't have noticed a difference.
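To make the difference concrete, here is a small sketch on a finite sample. The `feed` function and its addresses are made up for illustration; they stand in for the grep output.

```shell
# Hypothetical sample: 10.0.0.1 appears three times, but only the last two are adjacent.
feed() { printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1; }

# uniq collapses adjacent repeats only, so 10.0.0.1 still shows up twice.
feed | uniq

# sort -u removes every repeat, but only because it can read to the end of the input.
feed | sort -u
```

On a stream that never ends, the second command would never reach the point where it can print.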

Your best bet is a simple Perl program that reads the input line by line and checks whether each line has already been seen. If not, it prints the line and adds it to the hash table of already-seen lines.

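#!/usr/bin/perl
use strict;
use warnings;

$| = 1;      # flush after every print so a live feed shows up immediately
my %seen;    # lines that have already been printed

while (my $line = <STDIN>) {
    # Print a line only the first time it appears, then remember it.
    next if $seen{$line}++;
    print $line;
}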
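The same seen-set idea can also be sketched in shell with the awk idiom mentioned in the comments. The function name `dedupe` is an assumption for this example. The explicit `fflush()` is there so each new line is written immediately, even when the output goes to a pipe.

```shell
# dedupe: print each distinct input line once, in order of first appearance.
# seen[$0]++ is 0 (false) the first time a line is read, so !seen[$0]++ is true only then.
dedupe() { awk '!seen[$0]++ { print; fflush() }'; }

# Small finite demonstration; prints a, b, c on separate lines.
printf '%s\n' a b a c b | dedupe
```

In the original pipeline this would go where `uniq -u` was. Because the earlier stages may buffer their output when writing to a pipe, it is probably worth also trying tcpdump's `-l` option (line-buffered stdout) and, assuming GNU grep, `grep --line-buffered`.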

Of course, your list of already-seen lines will keep growing, and so will the memory used by your program.

I'm not sure what you'd use this for, but I think I'd add the current date to each printed line, and maybe store it in the hash as well so that entries could be removed after n hours.
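One possible shape for that idea, as a rough sketch rather than a finished tool: the function name `dedupe_ttl` and its argument are assumptions, and it relies on `date +%s` printing the epoch time, which is common but not guaranteed everywhere. It calls `date` once per input line, which should be fine for a modest feed but is not efficient.

```shell
# dedupe_ttl SECONDS: print a line, prefixed with the epoch time, only if it
# has not been printed within the last SECONDS seconds.
dedupe_ttl() {
    awk -v ttl="$1" '
    {
        # Ask date(1) for the current epoch time; close() forces a fresh call per line.
        cmd = "date +%s"; cmd | getline now; close(cmd)

        # Every ttl seconds, forget expired entries so the table stays bounded.
        if (now - last_purge >= ttl) {
            for (k in seen) if (now - seen[k] >= ttl) delete seen[k]
            last_purge = now
        }

        # Print new or expired lines and record when they were printed.
        if (!($0 in seen) || now - seen[$0] >= ttl) {
            seen[$0] = now
            print now, $0
            fflush()
        }
    }'
}
```

For example, `... | dedupe_ttl 3600` would let an address be reported again once an hour has passed since it was last printed.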
