
Given input:

144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603

How can I output:

144.252.36.69
afrloop=32235330165603 3 times
222.252.36.69
afrloop=31135330165603 4 times
Wildcard

2 Answers

paste - - < file | sort | uniq -c
n.caillou
  • Nice! Took me a little while to understand what paste does in this case :-) – NickD Dec 14 '17 at 03:27
  • @NickD yes, it is not very easy to understand. paste takes multiple files as input, and each of them can be replaced with -, which is the signal to read from stdin. So here we read from stdin twice. – Martin Mucha Jan 30 '21 at 09:21
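To make the comments above concrete, here is a small sketch of what each stage of the pipeline produces on a shortened version of the question's input. The inline sample and the variable names `input`, `joined`, and `counted` are only for illustration and are not part of the original answer.

```shell
# Shortened sample: two identical pairs for one address, one pair for another.
input='144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603'

# "paste - -" reads stdin twice per output line, so consecutive input lines
# are joined in pairs, separated by a tab.
joined=$(printf '%s\n' "$input" | paste - -)
printf '%s\n' "$joined"

# sort brings identical joined lines together; uniq -c prefixes each group
# with the number of times it occurred.
counted=$(printf '%s\n' "$input" | paste - - | sort | uniq -c)
printf '%s\n' "$counted"
```

Note that the count appears at the start of each line rather than as a trailing "N times", so this matches the requested output only up to formatting.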

Here is a solution with awk, in case you want a customized output format:

NR%2==1 {ip=$0; next}
NR%2==0 {a[ip"\n"$0]++}
END {
    for(i in a)
        printf "%s %d times\n", i, a[i]
}

The script can be executed as:

awk -f main.awk file

Explanation

  • First, NR%2==1 matches odd-numbered lines, since an odd number modulo 2 equals 1. For each such line we save the whole line $0 in a variable called ip, then use next to skip the remaining rules and move straight on to the next input line.

  • Second, NR%2==0 matches even-numbered lines. For each such line we use ip"\n"$0 as an index into an array a and increment the count stored at that index. For the first pair of input lines, for example, this is equivalent to

    a["144.252.36.69 afrloop=32235330165603"] += 1
    

    The newline \n is written as a space in this example just for simplicity.

  • Finally, in the END block, after every line has been processed, a for loop prints each index of array a followed by its value, which in our case is the count for that unique index.
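One detail worth knowing: awk does not specify the order in which `for (i in a)` visits the indexes, so the groups may not be printed in input order. Below is a minimal sketch of a variant that remembers the order of first appearance, assuming that matters for your use case. The `order` array, the counter `n`, and the inline sample input are names and data introduced here for illustration; they are not part of the script above.

```shell
result=$(printf '%s\n' \
    144.252.36.69 afrloop=32235330165603 \
    144.252.36.69 afrloop=32235330165603 \
    222.252.36.69 afrloop=31135330165603 |
awk '
NR%2==1 {ip=$0; next}                    # odd line: remember the address
{
    key = ip "\n" $0                     # even line: address plus afrloop line
    if (!(key in a)) order[++n] = key    # note the first time a pair is seen
    a[key]++                             # count this pair
}
END {
    # Walk the pairs in first-seen order instead of using for (i in a)
    for (j = 1; j <= n; j++)
        printf "%s %d times\n", order[j], a[order[j]]
}')
printf '%s\n' "$result"
```

For the three pairs above this prints the 144.252.36.69 group with "2 times" first, followed by the 222.252.36.69 group with "1 times".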

Fun Benchmark

  • Test file generation (just under 10 million records, going by the loop bound below)

    awk '
        BEGIN{for(i=1;i<10000000;i++)
        printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)}
    ' > test
    
    $ head test
    23
    afrLoop=2
    84
    afrLoop=1
    58
    
  • @n.caillou paste solution

    $ time paste - - < test | sort | uniq -c > /dev/null
    real    0m11.250s
    user    0m11.352s
    sys     0m0.272s
    
  • awk solution

    $ time awk -f main.awk test > /dev/null
    real    0m5.673s
    user    0m5.636s
    sys     0m0.036s
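
Before comparing timings, it may be worth checking that both commands actually count the same thing. The following is a rough sketch of such a check on a much smaller generated file. The temporary file, the record count of 1000, and the variable names `via_paste` and `via_awk` are arbitrary choices made here, and the awk one-liner joins the pair with a space instead of a newline so that the two outputs can be compared line by line.

```shell
tmp=$(mktemp)

# Same generator as the benchmark above, but with only 1000 records
awk 'BEGIN{for(i=1;i<=1000;i++)
    printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)}' > "$tmp"

# paste pipeline, normalized from "count id afrLoop" to "id afrLoop count"
via_paste=$(paste - - < "$tmp" | sort | uniq -c | awk '{print $2, $3, $1}' | sort)

# awk counting, printed in the same "id afrLoop count" form
via_awk=$(awk 'NR%2==1 {ip=$0; next}
               {a[ip" "$0]++}
               END {for (i in a) print i, a[i]}' "$tmp" | sort)

rm -f "$tmp"

if [ "$via_paste" = "$via_awk" ]; then
    echo "counts match"
else
    echo "counts differ"
fi
```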
    
  • I have a 1.4 GB log file, and when I use awk it takes a very long time. But thank you. – Đặng Thắng Dec 14 '17 at 06:55
  • @ĐặngThắng Thanks for the feedback! It seems a bit strange to me that you would find the awk solution to be slower. From experience, it should be faster since it doesn't go through any additional pipes. I added a benchmark section to my original answer in case you want to try it out :) – etopylight Dec 14 '17 at 07:40
  • Can you explain your script to me? Thanks. – Đặng Thắng Dec 15 '17 at 01:42
  • @ĐặngThắng Sure, glad to. Just updated the answer. Let me know if there is still anything unclear to you. – etopylight Dec 15 '17 at 03:26
  • much more elegant than my awk '!(NR%2){print$0" " p}{p=$0}' | uniq -c | awk '{print $3"\n"$2" "$1" times"}' – Tim Kennedy Dec 21 '17 at 04:26
  • @TimKennedy Thanks for the alternative solution, I found your condition statement to be more concise. – etopylight Dec 21 '17 at 06:46