
I need to find the files that don't contain certain strings in a large folder: ≃ 10M files, about 22 GB of data.

I tried this command locally (macOS):

egrep -r -L -Z 'string1|string2' * | wc -l

This works well (probably due to the small number of files I have locally, ≃ 500), but on my server I get no output and I can't stop the execution with Ctrl+C.

So my question is:

Is there a way to make this command work on a large folder? Or is there another way to count the number of files that do not contain 'string1' or 'string2'?

  • It may have tripped on a FIFO; try with egrep -D skip -r ... (see the sketch after these comments) – mosvy Feb 27 '20 at 12:03
  • @mosvy thanks for your reply; locally this works, but on the server it's the same: when I try to execute the command in the directory I get this error: -bash: /bin/egrep: Argument list too long – Louis Brahmi Feb 27 '20 at 12:15
  • Run with . instead of *. – mosvy Feb 27 '20 at 12:16
  • @El-Burritos congratulations, you just hit ARG_MAX; see http://www.in-ulm.de/~mascheck/various/argmax/ – Jetchisel Feb 27 '20 at 13:10
  • @mosvy After some minutes I get a 0 result with . – Louis Brahmi Feb 27 '20 at 13:13
  • Don't use the GNU grep -r/-R options to find files as it just creates Frankenstein calls to grep. Keep your code simple and robust and just use find to find files and grep to g/re/p within the files. There are big clues in the command names to their function! – Ed Morton Feb 27 '20 at 14:51
  • Thanks for the tips @EdMorton – Louis Brahmi Feb 27 '20 at 15:55
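Putting the comments together, a rough sketch of mosvy's suggestion (assuming a grep with the -D option, such as GNU grep; -D skip makes grep skip FIFOs and device files, and searching . instead of expanding * keeps the 10M file names off the command line):

getconf ARG_MAX                                   # the system's limit on exec() argument size that '*' exceeded
egrep -D skip -r -L 'string1|string2' . | wc -l   # -L prints names of files with no match; wc -l counts them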

1 Answer


Try this:

find . -type f -exec egrep -L 'string1|string2' {} + | wc -l
  • . starts the search from the current directory.
  • -type f restricts the search to regular files (so FIFOs and devices are skipped).
  • + makes find pass as many file names as fit to each egrep invocation, keeping every command line under the ARG_MAX limit.
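One hedged caveat: wc -l counts newlines, so a file name that itself contains a newline would be counted twice. If that matters, a sketch using grep's -Z/--null option (present in GNU grep, and assumed available here) counts NUL-terminated names instead:

# -Z ends each reported file name with a NUL byte instead of a newline;
# tr -cd '\0' deletes everything except those NULs, and wc -c counts them.
find . -type f -exec egrep -L -Z 'string1|string2' {} + | tr -cd '\0' | wc -c

Each NUL corresponds to exactly one non-matching file, so the count stays correct no matter what characters the paths contain.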
– Siva