4

I want to grep some line from a log file with an input from another file. I'm using this small command to do that:

while read line; do 
    grep "$line" service.log; 
done < input_strings.txt > result.txt

input_strings.txt has around 50 000 strings (one per line). For each of these strings I'm currently searching the huge service.log file (around 2 000 000 lines).

So let's say the 1st string of input_strings.txt is found in service.log at line 10 000; that line gets written to my result.txt. After that, the 2nd string of input_strings.txt will be searched for in service.log, BUT starting again at line 1 of service.log.

How can I remember the line at which the 1st entry was found in service.log, so that I can start the 2nd search run there?

ilkkachu
  • 138,973
xMaNuu
  • 71
  • You should realize that relying on the last match position/offset will not give you all possible matches from input_strings.txt – RomanPerekhrest Dec 19 '17 at 10:06
  • In this case it would, because I've got the right order in both files. I am sure that the 2nd string from input_strings.txt is NOT before the 1st match. – xMaNuu Dec 19 '17 at 10:08
  • You are contradicting yourself: first you said *the 2nd string of input_strings.txt will be searched in service.log, BUT starting at line 1 of service.log*; then you added *I am sure that the 2nd string from input_strings.txt is NOT before the 1st match*. Absolutely contradictory. – RomanPerekhrest Dec 19 '17 at 10:13
  • Yes, that's the current status of my problem. *The 2nd string of input_strings.txt will be searched in service.log, BUT starting at line 1 of service.log* << that's the problem of my `while read line; do grep ...` loop.

    I just want to get the position of the last matched line, so I can start at that point in the next run of my while loop.

    – xMaNuu Dec 19 '17 at 10:14
  • @xMaNuu I think that the approach that you're taking here to solve the problem seems suboptimal and it seems to me that you're likely to get a lot of resistance unless you give a clear explanation for why you want to go about things this way, e.g. why use a loop at all? – igal Dec 19 '17 at 10:21
  • @igal, well, this would be a good opportunity to show them a better way of doing that, instead of using the shell, right? I don't think grep by itself can look for matches in a particular order. – ilkkachu Dec 19 '17 at 10:25
  • @ilkkachu It's not clear to me exactly what they're asking for. – igal Dec 19 '17 at 10:26
  • @xMaNuu You might also want to add small example files to demonstrate the desired output for some given input. – igal Dec 19 '17 at 10:27
  • @xMaNuu, About using shell loops to process text, see: Why is using a shell loop to process text considered bad practice?. In brief: it's slow, and there are some gotchas, like the fact that you probably want to use read -r instead of plain read approximately every time. – ilkkachu Dec 19 '17 at 10:47

2 Answers

3

If you want to get the matches then you don't need to be using a loop at all. It would be much faster to just use a single grep command:

grep -Ff input_strings service.log > results.txt
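
For instance, with tiny stand-in files (the contents here are hypothetical, just to illustrate how `-F` and `-f` combine):

```shell
# -f FILE reads one pattern per line from FILE;
# -F treats each pattern as a fixed string rather than a regex
printf '%s\n' 'foo' 'b.r' > input_strings.txt
printf '%s\n' 'foo 1' 'bar 1' 'b.r 2' > service.log

grep -Ff input_strings.txt service.log
# prints:
# foo 1
# b.r 2
```

Because of `-F`, the pattern `b.r` matches only the literal line `b.r 2`; drop `-F` and the dot would act as a regex wildcard, so `bar 1` would match too.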

That said, if you want to do literally what you stated in your question, then you could use a variable to keep track of the line on which the last match was found:

LINE_NUMBER=0
while IFS= read -r LINE; do

    # Search for the next match, starting just after the previous match
    MATCH="$(tail -n "+$((LINE_NUMBER + 1))" service.log | grep -n -- "${LINE}" | head -n1)";

    # Skip keywords that are never found
    [ -n "${MATCH}" ] || continue;

    # grep -n numbers lines relative to where tail started, so add the
    # relative position to get the absolute line number of this match
    LINE_NUMBER=$((LINE_NUMBER + ${MATCH%%:*}));

    # Extract the matching string from the match result
    STRING="${MATCH#*:}";

    # Output the matching string
    echo "${STRING}";

done < input_strings.txt > result.txt
igal
  • 9,886
  • grep -Ff input_strings service.log > results.txt

    That's exactly what I was looking for! Thank you very much!

    – xMaNuu Jan 25 '18 at 08:30
1

I gather you want to search for the first keyword, then continue on the line after that match to search for the next keyword etc., printing the matches as you go.

Given keywords:

foo
bar

And data:

bar 0
foo 1
bar 1
foo 2

The awk script here should do just that (tested with GNU awk):

$ awk 'BEGIN {i = j = 0} NR==FNR { k[i++] = $0; next} 
       $0 ~ k[j] {j++; print $0} j >= i {exit}' keywords data 
foo 1
bar 1

i and j start at 0, and while reading the first file (NR==FNR compares the record/line number within the current file to the total number of lines seen, so it's true only for the first file), we collect the keywords into an array. After that, we try to match the j:th keyword, printing the line and incrementing j on each match. The script exits once all keywords have been found.

As with grep, the keywords here are actually regex patterns, though obviously awk regexes here. If you want to search for fixed strings, use index($0, key) instead of $0 ~ key.


Alternatively, without loading the keywords at the start:

$ awk -vkeyfile=keywords 'BEGIN {getline key < keyfile } 
      $0 ~ key {print $0; if (!getline key < keyfile) exit;}' data
foo 1 
bar 1

This should be straightforward: `getline key < keyfile` reads the next keyword and returns zero at end of file, so the script exits once the last keyword has been matched.

ilkkachu
  • 138,973