
I am trying to filter a log for entries that are later than a specific time on the current date and that contain specific text. I have successfully filtered by text and current date. Here is the command:

grep "$(date +"%d/%b/%Y")" test.log | grep -i "failed login"

Here is the sample log:

[04/Dec/2019 02:05:13 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser

[04/Dec/2019 02:05:15 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser

[04/Dec/2019 02:04:59 -0800] access       INFO     10.126.49.92 ahmed.rao - "POST /notebook/api/check_status HTTP/1.1" returned in 759ms

[04/Dec/2019 02:05:00 -0800] base         INFO     Selected cluster 0e83a448-26c9-459b-a0f2-3478ecb119af {u'interface': u'impala', u'namespace': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'type': u'direct', u'id': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'name': u'0e83a448-26c9-459b-a0f2-3478ecb119af'} interface hiveserver2

[04/Dec/2019 03:05:00 -0800] access       INFO     10.126.49.92 ahmed.rao - "POST /notebook/api/close_statement HTTP/1.1" returned in 1345ms

[04/Dec/2019 03:05:00 -0800] base         INFO     Selected cluster 0e83a448-26c9-459b-a0f2-3478ecb119af {u'interface': u'impala', u'namespace': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'type': u'direct', u'id': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'name': u'0e83a448-26c9-459b-a0f2-3478ecb119af'} interface hiveserver2

[04/Dec/2019 03:05:18 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser

However, I can't figure out how to enforce the "later than a particular time" condition.

Jeff Schaller
  • Do I understand you correctly that you have a system or service log with timestamped entries and want to grep for a specific text, but only on those lines after a given timestamp? Could you please provide a sample excerpt of your logfile so that we can better understand the file syntax structure? – AdminBee Dec 04 '19 at 10:08
  • yes, you understood correctly. I have added sample in my question – user1584253 Dec 04 '19 at 10:12
  • You are trying to grep for the current time in a log, which may not be present there ... can you confirm? – Siva Dec 04 '19 at 10:59
  • Yes, the log contains entries from past dates. That's why I filtered by current time – user1584253 Dec 04 '19 at 11:01

4 Answers


With ts from moreutils, you can easily convert those timestamps to a more useful format:

ts -r %FT%T%z < file.log |
  awk '$0 > "[2019-12-04T02:50" && tolower($0) ~ /failed login/'

Which on your input (and in the America/Los_Angeles timezone) gives:

[2019-12-04T03:05:18-0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser

ts with -r parses that timestamp and converts it to that specified %FT%T%z strftime format (in your timezone).

As that YYYY-MM-DDTHH:MM:SS format sorts the same lexically and chronologically¹, it's then just a matter of doing a string comparison in awk to find the entries later than a specified date. awk can also do grep -i's job, here using the standard tolower() approach for case-insensitive matching. With GNU awk you could also do:

gawk -v IGNORECASE=1 '$0 > "[2019-12-04T02:50" && /failed login/'
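
To see why a plain string comparison is enough, here is a minimal sketch on two made-up lines that are already in the converted format. The `early`/`late` labels and the `result` variable are only illustrative, and LC_ALL=C is set on the assumption that you want byte-wise ordering regardless of locale:

```shell
# Two hypothetical lines in the format that ts -r %FT%T%z would produce.
result=$(printf '%s\n' \
    '[2019-12-04T02:05:13-0800] early' \
    '[2019-12-04T03:05:18-0800] late' |
  LC_ALL=C awk '$0 > "[2019-12-04T02:50"')

# Only the line whose timestamp sorts after the cut-off survives.
printf '%s\n' "$result"
```

Because every field is zero-padded and ordered from most to least significant, the first differing character decides the comparison the same way the clock would.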

If you don't have moreutils, you could do the parsing with perl's Time::Piece for instance (ts is a perl script that uses Date::Parse but contrary to Time::Piece, that's not one of perl's core modules, so may not be installed on your system):

CUT=2019-12-04T02:50:00-0800 perl -MTime::Piece -F'[][]' -ale '
  BEGIN{$cut = Time::Piece->strptime($ENV{CUT}, "%FT%T%z")}
  print if /failed login/i &&
           Time::Piece->strptime($F[1], "%d/%b/%Y %T %z") >= $cut' < file.log

¹ if we ignore the blips at winter/summer clock change times in timezones that do DST
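
If you would rather stay in awk without ts, the same sortable-key idea could be sketched as below. This is not what ts does, only an approximation under stated assumptions: month names are English abbreviations, and every entry uses the same UTC offset as the cut-off, because the offset is simply ignored. `filter_after` is a hypothetical helper name.

```shell
# Hypothetical helper: print "failed login" lines at or after a cut-off.
# usage: filter_after 'YYYY-MM-DDTHH:MM:SS' < logfile
filter_after() {
  LC_ALL=C awk -v cut="$1" '
    BEGIN {
      # Map English month abbreviations to two-digit month numbers.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i)
    }
    # Only consider lines that start with "[DD/Mon/YYYY ".
    /^\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9] / {
      # Rebuild the timestamp as YYYY-MM-DDTHH:MM:SS (UTC offset ignored).
      key = substr($0, 9, 4) "-" mon[substr($0, 5, 3)] "-" substr($0, 2, 2) \
            "T" substr($0, 14, 8)
      if (key >= cut && tolower($0) ~ /failed login/) print
    }'
}
```

You would call it as `filter_after '2019-12-04T02:50:00' < file.log`.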

  • Could you explain how this ts works? Also, what is the > doing exactly? Is a simple arithmetic comparison of the two date strings really enough? You seem to be comparing the entire line to [2019-12-04T02:50, how does that work? Finally, I get three lines from this, not one, are you sure about your output? And why would any of the lines in the example be excluded? They're all from Dec 4, why would any be skipped? – terdon Dec 04 '19 at 11:58
  • 1
    @terdon, that's a string comparison. The standard YYYY-MM-DDTHH:MM:SS format is useful for comparison as it sorts lexically the same as chronologically. You get a different set of lines because you're in a different timezone. That's why I mention a Los Angeles timezone which matches the UTC offset of the OP. ts outputs the timestamps in your timezone and compares against 2019-12-04T02:50 meant to be in your local time. The lines that are skipped don't contain failed login. – Stéphane Chazelas Dec 04 '19 at 12:03
  • That makes perfect sense, thanks. Sigh for a second there I was hoping I might have caught you making a mistake! One thing though, how does ts find the timestamp? You don't seem to be helping it identify it in any way. – terdon Dec 04 '19 at 12:12
  • I tried ' gawk -v IGNORECASE=1 '$0 > "2019-12-04 03:05:15" && /failed login/' test.log' ... date condition not working with respect to timestamp – user1584253 Dec 04 '19 at 12:29
  • @user1584253, that gawk line is intended as a replacement for the awk one above. I'm just showing a different way to do case insensitive matching if you have access to the GNU implementation of awk. Of course, you still need to convert the timestamps in the right format with the ts command. – Stéphane Chazelas Dec 04 '19 at 13:28
  • Oh, my bad. I do not have the moreutils package installed, and the yum repositories are blocked in my organization, so I can't install moreutils directly. When I try to install it from an rpm file, it lists dependencies that have to be installed manually first. Any suggestion on how to install moreutils? – user1584253 Dec 05 '19 at 05:20
  • @user1584253, ts is a short perl script which uses Date::Parse to parse timestamps. Provided you have Date::Parse installed, you could just copy the ts script. – Stéphane Chazelas Dec 05 '19 at 06:45

I'm leaving this in case anyone finds it useful, but just use this answer instead. It's much simpler and more efficient.


Here's a perl way:

$ perl -lne 'if(/^\[([^]]+)/){$d=$1; chomp($dateThreshold=`date -d "04 Dec 2019" +%s`); $d=~s|/| |g; chomp($d=`date -d "$d" +%s`); print if $d >= $dateThreshold;} ' test.log 
[04/Dec/2019 02:05:13 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser
[04/Dec/2019 02:05:15 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser
[04/Dec/2019 02:04:59 -0800] access       INFO     10.126.49.92 ahmed.rao - "POST /notebook/api/check_status HTTP/1.1" returned in 759ms
[04/Dec/2019 02:05:00 -0800] base         INFO     Selected cluster 0e83a448-26c9-459b-a0f2-3478ecb119af {u'interface': u'impala', u'namespace': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'type': u'direct', u'id': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'name': u'0e83a448-26c9-459b-a0f2-3478ecb119af'} interface hiveserver2
[04/Dec/2019 03:05:00 -0800] access       INFO     10.126.49.92 ahmed.rao - "POST /notebook/api/close_statement HTTP/1.1" returned in 1345ms
[04/Dec/2019 03:05:00 -0800] base         INFO     Selected cluster 0e83a448-26c9-459b-a0f2-3478ecb119af {u'interface': u'impala', u'namespace': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'type': u'direct', u'id': u'0e83a448-26c9-459b-a0f2-3478ecb119af', u'name': u'0e83a448-26c9-459b-a0f2-3478ecb119af'} interface hiveserver2
[04/Dec/2019 03:05:18 -0800] access       WARNING  10.126.49.92 -anon- - "POST /hue/accounts/login HTTP/1.1"-- Failed login for user: testuser

And, a little clearer:

perl -lne 'if(/^\[([^]]+)/){ ## skip lines that do not match
            ## Save the date of the current line as $d
            $d=$1; 
            ## Replace all slashes with spaces so the 'date' command
            ## can read this as a date.
            $d=~s|/| |g; 
            ## Now, translate $d into seconds since the epoch
            chomp($d=`date -d "$d" +%s`); 
            ## Set the threshold date in seconds since the epoch.
            chomp($dateThreshold=`date -d "04 Dec 2019" +%s`); 
            ## Print this line if its date is greater than or equal to the threshold
            print if $d >= $dateThreshold;
           } ' test.log 

Finally, you could make it a bit more efficient by moving the step that sets the threshold into a BEGIN block so it is only run once, when the script starts:

perl -lne 'BEGIN{chomp($dateThreshold=`date -d "04 Dec 2019" +%s`); } if(/^\[([^]]+)/){$d=$1; $d=~s|/| |g; chomp($d=`date -d "$d" +%s`); print if $d >= $dateThreshold;} ' test.log 
terdon

using sed:

sed -n "/$(date +'%d\/%b\/%Y')/,/*/p" test.log | grep -i "failed login"
  • prints all after the match(date).

NOTE: The current date must be available in the log file.

Siva
  • How do I incorporate time ? – user1584253 Dec 04 '19 at 12:46
  • modify the date part in the command as per the requirement. – Siva Dec 04 '19 at 13:37
  • That matches from an occurrence of that date to the next line that contains a *. If you want to select the lines until the end of the file, use $ as the second address. You can also use a different separator than / to avoid having to escape the /s in the date: sed -n "\|$(date +'%d\/%b\/%Y')|,\$p" – Stéphane Chazelas Dec 05 '19 at 18:21

Here is yet another answer using GNU awk, which resorts to calling the GNU date command.

The awk program (let's call it find_after_timestamp.awk) looks like this:

BEGIN{
    gsub("/"," ",start_datetime)
    extcmd=sprintf("date -d \"%s\" +\"%%Y %%m %%d %%H %%M %%S\"",start_datetime)
    extcmd | getline startstring
    close(extcmd)
    start_ts=mktime(startstring)
    print "Lines will be matched starting with timestamp",start_ts
    printf("Will look for: \"%s\"\n",searchpat)
}

{
    # Skip lines that do not start with a bracketed timestamp
    if (match($0,/^\[([[:print:]]+)\][[:print:]]*$/,line_datetime)==0) next
    gsub("/"," ",line_datetime[1])
    extcmd=sprintf("date -d \"%s\" +\"%%Y %%m %%d %%H %%M %%S\"",line_datetime[1])
    extcmd | getline line_dtstring
    close(extcmd)
    line_ts=mktime(line_dtstring)
    if (line_ts > start_ts && $0 ~ searchpat) print
}

You would call it as

awk -v start_datetime="04/Dec/2019 02:05:21 -0800" -v searchpat="[Ff]ailed login" -f find_after_timestamp.awk test.log

Where the variable start_datetime would be the start of your search range, i.e. all entries with a date/time strictly after this point in time will be considered (the script compares with >). The value of start_datetime must have the same format as it would in your logfile, but apart from that is arbitrary and does not need to be a value actually present in the file. The variable searchpat would contain the pattern you are looking for.

Explanation

  • The construct revolves around converting your (rather "non-standard") date/time specification DD/MONTH/YYYY HH:MM:SS TIMEZONE into something that GNU date understands, by replacing the / in the date part with spaces using gsub.

  • It then calls the external date command by executing the string extcmd in a shell and reading the result into a string variable (startstring in the setup phase, line_dtstring in the file-parsing phase), which is now formatted so that awk's builtin mktime function can parse it.

  • The mktime command converts the human-readable date/time specification into a purely-numeric UNIX time which can be compared using arithmetic comparison.

  • In the BEGIN phase this is done to convert your start date specification, in the main body this is done to convert the timestamp associated with the current line. Lines which don't have a timestamp will be ignored (if (match(...)==0) next).

  • If the timestamp of the current line is larger (=later) than the reference start timestamp, AND the searchpat is found on the line, the line will be printed.
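
The conversion step can be tried on its own in the shell. The following is only a sketch of the idea, assuming GNU date: it goes straight to epoch seconds with +%s instead of passing through mktime as the awk script does, and `to_epoch` is a hypothetical helper name.

```shell
# to_epoch is a hypothetical helper, not part of the awk script above.
to_epoch() {
  # Replace the slashes so GNU date accepts e.g. "04 Dec 2019 02:05:21 -0800".
  date -d "$(printf '%s' "$1" | tr '/' ' ')" +%s
}

start_ts=$(to_epoch '04/Dec/2019 02:05:21 -0800')
line_ts=$(to_epoch '04/Dec/2019 03:05:18 -0800')

# Epoch seconds are plain integers, so an arithmetic comparison is enough.
[ "$line_ts" -gt "$start_ts" ] && echo "line is later than the start timestamp"
```

Since the UTC offset is part of the string handed to date, the resulting numbers do not depend on the timezone of the machine running the command.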

I am aware that resorting to external programs in an awk program is somewhat frowned upon, but this will do the job with basic tools available on virtually any installation.

AdminBee