3

I have a set of txt files whose names may contain spaces or special characters like #.

I use grep -L "cannot have" $(grep -l "must have" *.txt) to list all the files that contain must have but not cannot have.

For instance, there is a file abc defg.txt which contains only 1 line: must have.

So the grep solution should list abc defg.txt, but instead it prints:

grep: abc: No such file or directory
grep: defg.txt: No such file or directory

I think the grep solution also fails for filenames containing #.

Could anyone help me amend the grep solution?

SoftTimur
  • 687

3 Answers

2

Since you're already using GNU-specific options (-L), you could do:

grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have"

The idea is to use -Z to print the list of file names NUL-delimited, and xargs -r0 to pass that list as arguments to the second grep.
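To see the NUL-delimited pipeline at work, here is a small sketch that can be run in a scratch directory. The file names and contents are made up for illustration (only abc defg.txt comes from the question), and it assumes GNU grep and GNU xargs.

```shell
# Scratch directory with hypothetical sample files (names are illustrative).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'must have\n' > 'abc defg.txt'
printf 'must have\n' > 'notes #1.txt'
printf 'must have\ncannot have\n' > 'both.txt'
printf 'unrelated\n' > 'neither.txt'

# The first grep prints NUL-terminated names; xargs -0 turns each one back
# into a single intact argument for the second grep.
result=$(grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have")
printf '%s\n' "$result"
```

With these four files, the names that come out are abc defg.txt and notes #1.txt, one per line; the space and the # need no special handling.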

Command substitution, by default, splits on space, tab and newline (and NUL in zsh). Bourne-like shells other than zsh also perform globbing upon each word resulting from that splitting.
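Both effects are easy to observe. The following is a rough sketch for bash or a POSIX sh (not zsh, which skips the globbing step); count_args is a hypothetical helper that only reports how many arguments it received, and the two files in the scratch directory are made up.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Splitting: one file name containing a space, as `grep -l` would print it,
# arrives as two separate words.
split_count=$(count_args $(printf '%s\n' 'abc defg.txt'))

# Globbing: in a scratch directory holding two files, an output word of `*`
# is expanded to both file names.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"
glob_count=$(cd "$dir" && count_args $(printf '%s\n' '*'))

echo "space-containing name became $split_count words; '*' became $glob_count words"
```

In both cases the count is 2 rather than 1, which is why the unquoted $(grep -l ...) hands the second grep the wrong file names.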

You could do:

IFS='
' # split on newline only
set -f # disable globbing
grep -L -- "cannot have" $(
    set +f # we need globbing for *.txt in this subshell though
    grep -l -- "must have" *.txt
  )

But that would still break on filenames containing newline characters.

In zsh (and zsh only), you can do:

IFS=$'\0'
grep -L -- "cannot have" $(grep -lZ -- "must have" *.txt)

Or:

grep -L -- "cannot have" ${(ps:\0:)"$(grep -lZ -- "must have" *.txt)"}

2

If you're willing to go further afield, awk can do it in one pass:

awk 'function s(){if(a&&!b){print f}} FNR==1{s();f=FILENAME;a=b=0} 
  /must have/{a=1} /cannot have/{b=1} END{s()}' filepattern

For recentish gawk you can simplify with BEGINFILE and ENDFILE. (Like all awk answers you can put the awk commands in a file with -f, and like most you can easily convert to perl if you prefer.)
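As a sketch of what the BEGINFILE/ENDFILE variant might look like, assuming GNU awk is installed as gawk (the wrapper function name list_must_without_cannot is made up for this example):

```shell
# Assumes GNU awk is available as `gawk`; BEGINFILE and ENDFILE are gawk
# extensions and are not available in POSIX awk.
list_must_without_cannot() {
    gawk '
        BEGINFILE     { must = cannot = 0 }       # reset the flags for each file
        /cannot have/ { cannot = 1; nextfile }    # disqualified: skip the rest of this file
        /must have/   { must = 1 }
        ENDFILE       { if (must && !cannot) print FILENAME }
    ' "$@"
}
```

It could be called as list_must_without_cannot ./*.txt. The ./ prefix keeps awk from treating a file name that contains = as a variable assignment. The file names are passed as ordinary arguments, so spaces and # need no special handling.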

  • 1
    Note however that grep -l/L stops reading at the first match so is likely to be more efficient (also because of the general awk code interpretation overhead). With GNU awk, you could use nextfile to avoid reading the whole file when it can be avoided (when cannot have is found). – Stéphane Chazelas May 08 '14 at 11:24
-1

Consider using find instead, and running grep through a shell command:

find . -name '*.txt' -print0 | xargs -0 -I{} sh -c 'grep -q -- "must have" "$1" && grep -L -- "cannot have" "$1"' sh {}
devnull
  • 10,691