I have a simple awk program, ip.awk, that finds the most frequently occurring IP address in a log file. The IP addresses are in the first column:
$ cat ip.awk
{ ip[$1]++ }
END {
    for (i in ip)
        if ( max < ip[i] ) {
            max = ip[i]
            maxnumber = i
        }
    print maxnumber, " has accessed ", max, " times.", " $1 is: ", $1
}
I am using it to parse a file, access.log; a few sample entries are shown below:
173.13.151.14 - - [11/Sep/2014:23:57:53 +0100] "GET /wp/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1 HTTP/1.1" 200 7404 "http://theurbanpenguin.com/wp/?p=2407" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
173.13.151.14 - - [11/Sep/2014:23:57:53 +0100] "GET /wp/wp-content/themes/twentytwelve/js/navigation.js?ver=20140711 HTTP/1.1" 200 1720 "http://theurbanpenguin.com/wp/?p=2407" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
173.13.151.14 - - [11/Sep/2014:23:57:53 +0100] "GET /wp/wp-content/uploads/2013/11/tailshadow.png HTTP/1.1" 200 11433 "http://theurbanpenguin.com/wp/?p=2407" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
173.13.151.14 - - [11/Sep/2014:23:57:53 +0100] "GET /wp/wp-content/uploads/2014/05/cropped-wp3.png HTTP/1.1" 200 65326 "http://theurbanpenguin.com/wp/?p=2407" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
173.13.151.14 - - [11/Sep/2014:23:57:53 +0100] "GET /wp/?p=2407 HTTP/1.1" 200 21717 "https://www.google.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
The awk script gives what I believe is the right answer:
$ awk -f ip.awk access.log
68.107.81.110 has accessed 311 times. $1 is: 70.168.57.66
My confusion is about the value of $1, which, as I understand it, should change line by line to the value in the first column of each line as awk moves through access.log.
The check I added at the end of the program (" $1 is: ", $1) seems to confirm this, as it prints the IP address from the last line. Since the log file is 30000+ lines, I made a small test file to check that the script was actually working:
$ cat testfile.log
1 apple
2 banana
2 banana
3
3
3
4
4
4
4
5
5 flerb
5 flerb
5 flerb
5 flerb
5 flerb , green - tea
6
7
8 grapes 0 and some more filler to make a long line
9
But when I run the script on this file, I get the right answer but don't get "9" for the value of $1 when I print it. What am I missing?
$ awk -f ip.awk testfile.log
5 has accessed 6 times. $1 is:
Attempting to eliminate another variable, I extracted the first column of IP addresses into a new file and ran ip.awk on it; as expected, I got exactly the same results as on the full log file. I also feel like I'm missing something fundamental: how can a dotted-decimal IP address be used as an array index? Also, if I use 1.0, 2.0, ... in place of 1, 2, ..., I still get the correct answer but still no $1 value.
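On the array question: awk arrays are associative, so a subscript is stored as a string rather than as a numeric position, and a dotted-decimal address is as valid a key as "5" or "banana". A minimal sketch, assuming a POSIX shell and any POSIX awk; the function name count_keys and the sample addresses are made up for illustration:

```shell
# Hypothetical helper: count each first-column value, then look two keys up
# by their exact string form.
count_keys() {
    awk '{ count[$1]++ }
         END { print count["10.0.0.1"], count["10.0.0.2"] }'
}

# Three made-up records: one address appears twice, the other once.
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' | count_keys
```

The same mechanism is what lets ip[$1]++ in ip.awk work without declaring the array or its size first.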
Answer: As thecarpy suggested, the problem was that when entering values in my test file I hit Enter after the last value, adding a superfluous newline (an empty last line); when awk parsed that line, it set $1 to an empty string.
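One way to make the script robust against stray blank lines is to skip records that have no fields and to save the last value in a variable instead of reading $1 inside END. This is only a sketch under those assumptions, not the original ip.awk; the names most_frequent, seen, last and top are hypothetical. Saving the value also avoids depending on $1 still being set in END, which POSIX awk guarantees but some older implementations reportedly did not.

```shell
# Hypothetical helper: report the most frequent first field and the last
# non-empty first field, ignoring blank lines.
most_frequent() {
    awk 'NF {                  # NF is 0 on a blank line, so such records are skipped
             seen[$1]++
             last = $1         # remember the value rather than relying on $1 in END
         }
         END {
             for (key in seen)
                 if (seen[key] > max) { max = seen[key]; top = key }
             print top " has accessed " max " times. last $1 is: " last
         }'
}

# Made-up input that ends with an extra empty line, as in the test file.
printf '5\n5\n9\n\n' | most_frequent
```

With the blank line skipped, the empty string is no longer counted as a key either, so it cannot show up as the most frequent "address" in a file with many empty lines.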
awk script and input file, my output is: 5 has accessed 6 times. $1 is: 9. – DopeGhoti Jul 10 '17 at 22:55