Count uniq instances of blocks of 2 lines

Question

Given input:

144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603

How can I output:

144.252.36.69
afrloop=32235330165603 3 times
222.252.36.69
afrloop=31135330165603 4 times

PSA: Please don't post images of text – Wildcard Dec 15 '17 at 03:36

score 3 · Accepted Answer · answered Dec 14 '17 at 03:20

paste - - < file | sort | uniq -c
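To see what each stage contributes, here is a minimal sketch that wraps the same pipeline in a function and then reshapes the uniq -c output into the two-line layout asked for in the question. The function name pair_counts and the trailing awk step are illustrative additions, not part of the one-liner above, and the reshaping assumes that neither line of a block contains whitespace.

```shell
# pair_counts is a hypothetical helper name used only for this illustration.
pair_counts() {
    # paste - - reads two lines of stdin per output row and joins them with a tab,
    # sort brings identical rows together, and uniq -c prefixes each with its count.
    # The final awk turns "COUNT FIRST<TAB>SECOND" back into the two-line layout.
    paste - - | sort | uniq -c |
        awk '{ print $2; print $3 " " $1 " times" }'
}

printf '%s\n' \
    144.252.36.69 afrloop=32235330165603 \
    144.252.36.69 afrloop=32235330165603 \
    222.252.36.69 afrloop=31135330165603 | pair_counts
```

On a real file it can be called as pair_counts < file.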

n.caillou


Nice! Took me a little while to understand what paste does in this case :-) – NickD Dec 14 '17 at 03:27
@NickD yes, it's not very easy to understand. paste takes multiple files as input, and any of them can be replaced with -, which is the signal to read from stdin. So here we read from stdin twice. – Martin Mucha Jan 30 '21 at 09:21

etopylight · Answer 2 · 2017-12-15T04:52:34.617

3

Here is a solution with awk, in case you want a customized output format:

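# odd-numbered line: remember the first line of the block (the IP)
NR%2==1 {ip=$0; next}
# even-numbered line: count the two-line block, keyed by both lines
NR%2==0 {a[ip"\n"$0]++}
END {
    # print each distinct block followed by its count (order is unspecified)
    for(i in a)
        printf "%s %d times\n", i, a[i]
}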

Save it as main.awk and run it with:

awk -f main.awk file

Explanation

First, NR%2==1 matches odd-numbered lines, since an odd number modulo 2 equals 1. When a line matches, we save the whole line $0 in a variable called ip, then use next to skip any further processing and move straight on to the next line.
Second, NR%2==0 matches even-numbered lines. When a line matches, we use ip"\n"$0 as an index into the array a and increment the count stored at that index. For example, an equivalent expansion would be
```
a["144.252.36.69 afrloop=32235330165603"] += 1
```
(I left out the newline \n in this example just for simplicity.)
Finally, in the END block, after every line has been processed, a for loop prints each index of the array a together with its value, which in our case is the count for each unique two-line block. Note that for (i in a) visits the indices in an unspecified order.
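If the output order matters, one possible variation is to remember the order in which each block first appears, because for (i in a) makes no promise about iteration order. This is a sketch of that idea, not the original script; the names count_pairs, count, order and n are made up for illustration.

```shell
# count_pairs is a hypothetical wrapper; it reads the named files, or stdin if none.
count_pairs() {
    awk '
        NR % 2 == 1 { ip = $0; next }      # odd line: remember the first line of the block
        {
            key = ip "\n" $0               # even line: two-line block used as the array index
            if (!(key in count))           # first time this block is seen:
                order[++n] = key           #   record its position
            count[key]++
        }
        END {
            # walk the blocks in first-seen order instead of using for (k in count)
            for (i = 1; i <= n; i++)
                printf "%s %d times\n", order[i], count[order[i]]
        }
    ' "$@"
}
```

It can be run as count_pairs file. For the input in the question this prints the 144.252.36.69 block first (3 times), followed by the 222.252.36.69 block (4 times).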

Fun Benchmark

Test file generation (about 10 million records; the loop below runs 9,999,999 times)

awk '
    BEGIN{for(i=1;i<10000000;i++)
    printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)}
' > test

$ head test
23
afrLoop=2
84
afrLoop=1
58

@n.caillou paste solution

$ time paste - - < test | sort | uniq -c > /dev/null
real    0m11.250s
user    0m11.352s
sys     0m0.272s

awk solution

$ time awk -f main.awk test > /dev/null
real    0m5.673s
user    0m5.636s
sys     0m0.036s
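Before comparing timings, it can be worth checking that both approaches agree on the counts. The following is a rough sketch on a scaled-down file, under a few assumptions: mktemp is available, the fixed seed srand(1) and the 10,000-record size are arbitrary choices, and the variable names via_paste and via_awk are made up. Both outputs are normalized to one "first second count" line per block so that they can be compared directly.

```shell
tmp=$(mktemp)

# Scaled-down version of the generator above: 10,000 two-line records.
awk 'BEGIN {
    srand(1)
    for (i = 1; i <= 10000; i++)
        printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)
}' > "$tmp"

# paste pipeline, with the count moved to the end of each line
via_paste=$(paste - - < "$tmp" | sort | uniq -c | awk '{ print $2, $3, $1 }' | sort)

# awk counting, keyed on "first second" so each block fits on one line
via_awk=$(awk 'NR%2==1 { ip = $0; next }
               { a[ip " " $0]++ }
               END { for (k in a) print k, a[k] }' "$tmp" | sort)

if [ "$via_paste" = "$via_awk" ]; then
    echo "counts match"
else
    echo "counts differ"
fi

rm -f "$tmp"
```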

edited Dec 15 '17 at 04:52

answered Dec 14 '17 at 03:54


I have a 1.4 GB log file, and when I use awk on it, it takes a very long time, but thank you. – Đặng Thắng Dec 14 '17 at 06:55
@ĐặngThắng Thanks for the feedback! It seems a bit strange to me that you would find the awk solution to be slower. From experience, it should be faster since it doesn't go through any additional pipes. I added a benchmark section to my original answer in case you want to try it out :) – etopylight Dec 14 '17 at 07:40
Can you explain your script to me? Thanks. – Đặng Thắng Dec 15 '17 at 01:42
@ĐặngThắng Sure, glad to. Just updated the answer. Let me know if there is still anything unclear to you. – etopylight Dec 15 '17 at 03:26
much more elegant than my awk '!(NR%2){print$0" " p}{p=$0}' | uniq -c | awk '{print $3"\n"$2" "$1" times"}' – Tim Kennedy Dec 21 '17 at 04:26
@TimKennedy Thanks for the alternative solution, I found your condition statement to be more concise. – etopylight Dec 21 '17 at 06:46
