It turns out I was using grep incorrectly yesterday. I checked my bash history and saw what I was actually executing:
grep search-string-here -f large-file-with-long-lines.txt
And that was what was causing the memory exhaustion.
Executing:
grep search-string-here large-file-with-long-lines.txt
...has the desired behavior.
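For context: -f tells grep to read its search patterns from the named file, one pattern per line, so the broken command was presumably asking grep to load every (very long) line of the 100 GB file as a pattern. The intended use of -f looks more like this sketch, where patterns.txt is a hypothetical file containing one search string per line:

# search the data file for any of the patterns listed in patterns.txt
grep -f patterns.txt large-file-with-long-lines.txt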
Thanks to @αғsнιη for pointing to the question with a similar mistake, and to @EdMorton and @ilkkachu for correcting my assumptions about the length of the lines and how grep and awk use memory.
Below is the original question (though it seems I was wrong that the long lines could not fit in 8 GB of RAM) and @EdMorton's accepted answer.
I have a very large file (over 100 GB) with very long lines (they can't even fit in 8 GB of RAM) and I want to search it for a string. I know grep can't do it, because grep tries to put entire lines into memory.
So far the best solution I've come up with is:
awk '/search-string-here/{print "Found."}' large-file-with-long-lines.txt
I'm actually happy with this solution, but I'm just wondering if there is some more intuitive way to do it. Maybe some other implementation of grep?
Comments:

- Do you want to print "Found" for every matched line, or is one match enough, so you could exit without processing the rest of the file? What is the line break, a newline? Is there some other unique delimiter that we could use as the record separator, which awk lets us specify? – αғsнιη Nov 16 '21 at 10:59
- Change print "Found" to print "Found"; exit so you don't need to continue reading the file for no further benefit. – Chris Davies Nov 16 '21 at 12:17
- @ilkkachu Yeah, I thought that might be the case. – hilltothesouth Nov 17 '21 at 14:37
- grep foo -f largefile would use the lines of largefile as patterns; it'd probably try to load them all into memory at the same time. – ilkkachu Nov 17 '21 at 15:31