52

I was doing a very simple search:

grep -R Milledgeville ~/Documents

And after some time this error appeared:

grep: memory exhausted

How can I avoid this?

I have 10GB of RAM on my system and only a few applications running, so I am really surprised that a simple grep runs out of memory. ~/Documents is about 100GB and contains all kinds of files.

grep -RI might not have this problem, but I want to search in binary files too.

4 Answers

50

Two potential problems:

  • grep -R (except for the modified GNU grep found on OS X 10.8 and above) follows symlinks, so even if there are only 100GB of files in ~/Documents, there might still be a symlink to /, for instance, and you'll end up scanning the whole file system, including files like /dev/zero. Use grep -r with newer GNU grep, or use the standard syntax:

    find ~/Documents -type f -exec grep Milledgeville /dev/null {} +
    

    (note, however, that the exit status won't reflect whether the pattern was matched or not; a sketch of a workaround follows this list).

  • grep finds the lines that match the pattern. For that, it has to load one line at a time into memory. GNU grep, as opposed to many other grep implementations, doesn't have a limit on the size of the lines it reads and supports searching in binary files. So, if you've got a file with a very big line (that is, with two newline characters very far apart), bigger than the available memory, it will fail.

    That would typically happen with a sparse file. You can reproduce it with:

    truncate -s200G some-file
    grep foo some-file
    

    That one is difficult to work around. One way to do it (still with GNU grep):

    find ~/Documents -type f -exec sh -c 'for i do
      tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
      done' Milledgeville {} +
    

    That converts each sequence of NUL characters into a single newline character before feeding the input to grep. That covers the cases where the problem is due to sparse files.

    You could optimise it by doing it only for large files:

    find ~/Documents -type f \( -size -100M -exec \
      grep -He Milledgeville {} + -o -exec sh -c 'for i do
      tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
      done' Milledgeville {} + \)
    

    If the files are not sparse and you have a version of GNU grep prior to 2.6, you can use the --mmap option. The lines will be mmapped into memory rather than copied there, which means the system can always reclaim the memory by paging the pages back out to the file. That option was removed in GNU grep 2.6.
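If you do need an exit status that reflects whether the pattern was found anywhere (the caveat noted in the first bullet), one possible sketch, not part of the answer above, is to let find print the first matching file name and test whether anything came out. -quit is a GNU/BSD find extension that simply stops the traversal early:

    # exits 0 if Milledgeville occurs in at least one regular file under ~/Documents
    find ~/Documents -type f -exec grep -q Milledgeville {} \; -print -quit | grep -q .

This runs one grep per file, so it is slower than the batched {} + form, but the pipeline's exit status is now meaningful.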

  • Actually, GNU grep doesn't care about reading in one line at a time; it reads a large portion of the file into a single buffer. "Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES." Source: http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html – Godric Seer Sep 10 '13 at 12:32
  • 4
    @GodricSeer, it may still read a large portion of the file into a single buffer, but if it hasn't find the string in there and hasn't found a newline character either, my bet is that it keeps that single buffer in memory and reads the next buffer in, as it will have to display it if a match is found. So, the problem is still the same. In practice, a grep on a 200GB sparse file does fail with OOM. – Stéphane Chazelas Sep 10 '13 at 12:44
  • Yes, I wasn't arguing with your logic regarding the failure of grep. I just wanted to point out that GNU grep doesn't care about newline characters until after it has found the search string. You could have a file with a single character on each line, and it would still fail with OOM. – Godric Seer Sep 10 '13 at 12:48
  • 1
    @GodricSeer, well no. If lines are all small, grep can discard the buffers it has processed so far. You can grep the output of yes indefinitely without using more than a few kilobytes of memory. The problem is the size of the lines. – Stéphane Chazelas Sep 10 '13 at 12:51
  • Or said otherwise, regardless of how it does it, grep has to hold the full current line in memory. – Stéphane Chazelas Sep 10 '13 at 12:56
  • I see now. Actually, it would only have to hold the current line if it found the search string, for display, but yes, otherwise the buffers could be kept as small as you wanted (assuming the search string fits in them). – Godric Seer Sep 10 '13 at 12:59
  • @StephaneChazelas nice catch on the "long line" buffering problem! I didn't know about it, but it does make sense. Your "find" solution works around it nicely (maybe change "." to "~/Documents" in the find solution, though ^^) – Olivier Dulac Sep 10 '13 at 15:31
  • @StephaneChazelas thanks for the clarifications – reto Sep 10 '13 at 17:05
  • 3
    The GNU grep --null-data option may also be useful here. It forces the use of NUL instead of newline as an input line terminator. – iruvar Sep 16 '13 at 15:27
  • 1
    @1_CR, good point, though that also sets the output line terminator to NUL. – Stéphane Chazelas Sep 16 '13 at 15:35
  • 1
    Would the fold command help in those situations? For example, think of dd if=/dev/sda | fold -b $((4096*1024*1024)) | grep -a "some string" to limit the amount of memory required to 4GB – poinu Mar 02 '18 at 16:35
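To sketch the --null-data idea from the comments above (GNU grep only; this is not part of the answer itself): -z makes NUL the line terminator on both input and output, so the long runs of NULs in a sparse file no longer pile up into one huge line, and a final tr turns the NUL-terminated output records back into readable lines:

    # assumes GNU grep; -z (--null-data) treats NUL as the line terminator
    find ~/Documents -type f -exec grep -zHe Milledgeville {} + | tr '\0' '\n'

Note that this only shifts the problem: a huge file containing no NULs at all now forms a single huge record, so it mainly helps with sparse or NUL-padded binary files.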
6

I usually do

find ~/Documents | xargs grep -ne 'expression'

I tried a bunch of methods and found this to be the fastest. Note that this doesn't handle files with spaces in the file name very well. If you know this is the case and have GNU versions of find and xargs, you can use:

find ~/Documents -print0 | xargs -0 grep -ne 'expression'

If not you can use:

 find ~/Documents -exec grep -ne 'expression' "{}" \;

This will exec one grep per file.
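A variant along the same lines (a sketch, not from the original answer): if your find supports the POSIX -exec ... {} + syntax, it batches file names much like xargs, so you avoid one grep per file, and passing /dev/null as an extra argument keeps grep printing file names even when a batch happens to contain a single file:

    find ~/Documents -type f -exec grep -ne 'expression' /dev/null {} +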

Drav Sloan
Kotte
5

I can think of a few ways to get around this:

  • Instead of grepping all files at once, do one file at a time. Example:

      find ~/Documents -type f -exec grep -H Milledgeville "{}" \;
    
  • If you only need to know which files contain the word, use grep -l instead. Since grep will then stop searching a file after the first hit, it won't have to keep reading any huge files.

  • If you do want the actual text as well, you could string two separate greps together (a more robust variant is sketched after the comments below):

      for file in $( grep -Rl Milledgeville ~/Documents ); do
          grep -H Milledgeville "$file"; done
    
Alexis Wilke
Jenny D
  • The last example is not valid syntax -- you'd need to perform a command substitution (and you shouldn't do that, since grep outputs using a delimiter that is legal in file names). You also need to quote $file. – Chris Down Sep 10 '13 at 11:05
  • The latter example suffers from the issue of file names having newlines or whitespace in them (it will cause for to process the file name as two arguments). – Drav Sloan Sep 10 '13 at 11:12
  • @DravSloan Your edit, while an improvement, still breaks on legal file names. – Chris Down Sep 10 '13 at 11:19
  • 1
    Yeah I left it in because it was part of her answer, I just tried to improve it so it would run (for the cases where there is no spaces/newlines etc in files). – Drav Sloan Sep 10 '13 at 11:34
  • Corrections of his -> her, my apologies Jenny :/ – Drav Sloan Sep 10 '13 at 11:38
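To address the file-name issues raised in these comments, a more robust way to string the two greps together is sketched below (assuming GNU grep; not part of the original answer): -Z makes grep -l NUL-terminate the file names, and xargs -0 reads them back safely regardless of spaces or newlines in the names:

    # -RlZ prints NUL-terminated names of matching files; xargs -0 consumes them safely
    grep -RlZ Milledgeville ~/Documents | xargs -0 grep -H Milledgeville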
4

I was grepping a 6TB disk to search for lost data and got the "memory exhausted" error. This should work for other files too.

The solution we came up with was to read the disk in chunks using dd and grep each chunk. This is the code (big-grep.sh):

#problem: grep gives "memory exhausted" error on 6TB disks
#solution: read it in parts
if [ -z "$2" ] || ! [ -e "$1" ]; then echo "$0 file string|less -S # greps in chunks"; exit 1; fi

FILE="$1"
MATCH="$2"

SIZE=$(ls -l "$FILE" | cut -d' ' -f5)
CHUNKSIZE=$(( 1024 * 1024 * 1 ))
CHUNKS=100 # greps in (100 + 1) x 1MB = 101MB chunks
COUNT=$(( SIZE / (CHUNKSIZE * CHUNKS) ))   # number of 100MB steps needed to cover the file

for I in $(seq 0 $COUNT); do
  # read one extra block so consecutive chunks overlap by 1MB (see the comments below)
  dd bs=$CHUNKSIZE skip=$(( I * CHUNKS )) count=$(( CHUNKS + 1 )) if="$FILE" status=none | grep -UF -a --context 6 "$MATCH"
done
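Invocation follows the script's own usage line; for example (the image name is only an illustration):

    # search a large disk image in overlapping 101MB chunks; less -S avoids wrapping long lines
    ./big-grep.sh disk.img Milledgeville | less -S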
Dagelf
  • 3
    Unless you read overlapping chunks, you would possibly miss matches on the chunk boundaries. The overlap must be at least as big as the string that you are expecting to match. – Kusalananda Jan 28 '19 at 19:59
  • Updated to search 1MB extra in each 100MB chunk... cheap hack – Dagelf May 01 '19 at 10:46