
I am trying to filter a log file using awk. The filtering is based on time: log entries that are outside the time range are dropped and entries within the range are kept. Once I encounter an entry that is within the time range, I know that all following entries will also be within the time range.

Thus there is no need for any more checking, so is there a way in awk to do this cleanly? I could use a flag variable denoting that no more checking is required and print each line, but is there a way to say something like "just process all remaining lines"?

Jeff Schaller
ng.newbie
  • Edit your question to include concise, testable sample input and expected output so we can best help you. – Ed Morton Jan 23 '20 at 04:16

4 Answers

awk 'flag == 0 && some_test { flag = 1 } flag == 1 { processing }'

This would use a boolean/binary "flag" for keeping track of when processing can proceed to the end of the file.

The first block tests for the point in the data where processing can start; some_test should be your existing test. It is evaluated only as long as flag == 0. As soon as your test is true, the flag is set to 1, which disables your test and enables the processing block.

The last block will run for all lines from the first line that triggers your some_test to the end of the file.
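A runnable sketch of the flag approach, with the start time passed in via -v instead of hard-coded. It assumes a hypothetical log format in which field 1 is an ISO-8601 timestamp such as 2015-08-12T10:00:00, so plain string comparison orders lines by time; the function name filter_from is illustrative only. Because && short-circuits, the time test is no longer evaluated once the flag is set.

```shell
#!/bin/sh
# filter_from START: copy stdin to stdout, starting at the first line whose
# timestamp is >= START. Hypothetical log format: field 1 is an ISO-8601
# timestamp, e.g. "2015-08-12T10:00:00 some message", so string comparison
# orders the lines by time.
filter_from() {
    awk -v start="$1" '
        # && short-circuits, so the time test is skipped once flag is set.
        !flag && $1 >= start { flag = 1 }
        # A pattern with no action prints the line.
        flag
    '
}

printf '%s\n' \
    '2015-08-11T23:59:59 too early' \
    '2015-08-12T00:00:01 first line kept' \
    '2015-08-13T08:00:00 also kept' |
filter_from 2015-08-12
```

With these three sample lines it prints the second and third lines.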

Kusalananda
  • Any way without the flag ? – ng.newbie Jan 22 '20 at 15:46
  • @ng.newbie Sure. Write your test in such a way that it's true for all lines that you want to process. – Kusalananda Jan 22 '20 at 15:48
  • @ng.newbie why do you care if there's an implementation that uses a flag printing the remaining lines or not? Are you trying to solve some perceived (but actually non-existent) performance problem with f{print}? – Ed Morton Jan 23 '20 at 04:20

I shall assume that the times on your log are sorted ascending.

Your condition, "once I encounter an entry which is within the time range I know that all following entries will also be within the time range", could be written as:

1.

time >= start_of_range , 0 { print }

Where:

  • time is a field or expression that extracts the time from the line being processed.

  • start_of_range is the smallest value of the range of time to process.

  • , expresses a range in the sense that awk understands ranges: it starts at the first line for which the left side of the , is true and ends at the line for which the right side is true. Here the right side is 0, which is never true, so the action on the right (print in this case) is applied to every following line until the end of the input.

Make that the first line of your awk script:

awk '$7 >= "2015-08-12" , 0 { print }'

The print can even be dropped, since printing the line is the default action when a pattern (here, the range) matches:

awk '$7 >= "2015-08-12" , 0'

2.

The alternative would be to swap the test and do:

awk '$7 < "2015-08-12" {next}
     {print}
    ' file

Which could be written simply as:

awk '$7 < "2015-08-12" {next} 1' file

But that will keep evaluating the test for all lines.
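To make that difference visible, this sketch counts how often the start test runs in each variant. in_range() is a hypothetical stand-in for the real time test: it counts its own calls and is true from the third line of seq 1 6 onward.

```shell
#!/bin/sh
# Count how often the start test runs under each variant.
# in_range() is a hypothetical stand-in for the real time test: it counts
# its own calls and is true from the third input line onward.
count_range() {
    seq 1 6 | awk '
        function in_range() { tests++; return $1 >= 3 }
        # Range with a never-true end: in_range() stops being called
        # once the range has started.
        in_range(), 0 { }
        END { print tests }'
}

count_next() {
    seq 1 6 | awk '
        function in_range() { tests++; return $1 >= 3 }
        # Inverted test with next: in_range() is called on every line.
        !in_range() { next }
        END { print tests }'
}

echo "range: $(count_range) tests"
echo "next:  $(count_next) tests"
```

The range version reports 3 tests (lines 1 to 3) and the next version 6 (every line).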


1. Yes, use a range with a 0 or "" (= false, never match) end condition:

awk '<is_within_the_range>, 0'

where <is_within_the_range> is your condition, which can be any expression, except for another range.

The start condition will not be evaluated again after the first match:

$ seq 1 6 | awk '
   function check(){ print "checking", $0; return $1 == 3 }
   check(), 0
'
checking 1
checking 2
checking 3
3
4
5
6

2. If you don't like ranges, you can of course just do the whole thing C-like non-awkwardly, by printing all the lines explicitly once the condition was matched:

seq 1 6 | awk '$1==3 { do print; while (getline > 0) }'

3. Another solution, which according to the POSIX standard should work with regular, seekable files (not with pipes!) but does not actually work with most awk implementations, relies on awk leaving the file offset just past the last record it processed when it exits, as all POSIX utilities are required to do:

seq 1 6 > file
{ awk '$1 == 3 { print; exit }'; cat; } < file

In my limited experience, this only works with the awk/nawk from Solaris, not with gawk, mawk or the "one true awk" from *BSD.


4. Finally, you can write your own state machine (e.g. by setting a flag and then checking it) in a slow, high-level language that already provides a nice streamlined interface for it, but that's too dumb to be worth dwelling upon.
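As a portable alternative to the file-offset trick in point 3 (a different technique, not one the answer describes), awk can report the number of the first matching line and exit, and tail -n +N can then print the file from that line on. This sketch assumes a regular file, since the file is opened twice; the function name print_from_match and the test $1 == 3 are placeholders.

```shell
#!/bin/sh
# print_from_match FILE: print FILE starting at the first line where field 1
# equals 3 (a placeholder for the real test). awk stops at the first match
# and reports its line number; tail then prints from that line to the end.
print_from_match() {
    file=$1
    n=$(awk '$1 == 3 { print NR; exit }' "$file")
    # No matching line: print nothing and return a failure status.
    [ -n "$n" ] || return 1
    tail -n "+$n" "$file"
}

seq 1 6 > file
print_from_match file
```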


Once a line contains PATTERN, that line and all following lines are printed:

awk 'flag || /PATTERN/{flag=1} flag{print $0}' file

You can replace "print $0" with different code if more processing is needed.
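To avoid editing the script for every new PATTERN, the regular expression can be passed in with -v and matched dynamically with ~. A sketch; the function name from_pattern is illustrative. One caveat: -v values undergo escape-sequence processing, so a backslash in the pattern has to be doubled on the command line.

```shell
#!/bin/sh
# from_pattern REGEX: print stdin starting at the first line matching REGEX.
# REGEX is matched dynamically with ~, so it can come from the shell.
from_pattern() {
    awk -v pat="$1" '
        flag || $0 ~ pat { flag = 1 }   # set the flag at the first match
        flag { print $0 }               # print that line and all later ones
    '
}

printf '%s\n' 'noise' 'START of section' 'more' 'START again' |
from_pattern '^START'
```

This prints the last three sample lines.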

Yurko