2

I'd like to find a file using multiple patterns.

This is my original command, but it's long to type and the repeated xargs zgrep is redundant. Imagine if I had 10 or more patterns to input:

find -mtime -$a -type f ! -name "*.bak*" | xargs zgrep -il "$b" | xargs zgrep -il "$c" | xargs zgrep -il "$d" | xargs zgrep -il 'ST.997' | sort -u

I'd like something with fewer characters to type, for example:

find -mtime -$a -type f ! -name "*.bak*" | xargs zgrep -il "$b && $c && $d" | sort -u

EDIT: You may notice that the patterns are prefixed with $. That's because the command is inside a script and those variables have string/numeric values.

I will use this to improve my script especially its run time.

Jeff Schaller

3 Answers

5

If you want to avoid having to decompress the file again and again for each pattern, you could do:

PATTERNS='foo
bar
baz' find . -mtime -"$a" -type f ! -name "*.bak*" -exec awk -v q=\' '
  function shquote(s) {
    gsub(q, q "\\" q q, s)
    return q s q
  }
  BEGIN {
    n = split(ENVIRON["PATTERNS"], pats, "\n")
    for (arg = 1; arg < ARGC; arg++) {
      file = ARGV[arg]
      cmd = "gzip -dcf < " shquote(file)
      for (i = 1; i <= n; i++) notfound[pats[i]]
      left = n
      while (left && (cmd | getline line) > 0) {
        for (pat in notfound) {
          if (line ~ pat) {
            if (!--left) {
              print file
              break
            }
            delete notfound[pat]
          }
        }
      }
      close(cmd)
    }
    exit
  }' {} +

Note that the patterns are taken as awk regular expressions, which are similar to the extended regular expressions supported by grep -E/egrep. For case-insensitive matching, you can add -v IGNORECASE=1 if using GNU awk, or portably change to:

PATTERNS='foo
bar
baz' find . -mtime -"$a" -type f ! -name "*.bak*" -exec awk -v q=\' '
  function shquote(s) {
    gsub(q, q "\\" q q, s)
    return q s q
  }
  BEGIN {
    n = split(tolower(ENVIRON["PATTERNS"]), pats, "\n")
    for (arg = 1; arg < ARGC; arg++) {
      file = ARGV[arg]
      cmd = "gzip -dcf < " shquote(file)
      for (i = 1; i <= n; i++) notfound[pats[i]]
      left = n
      while (left && (cmd | getline line) > 0) {
        line = tolower(line)
        for (pat in notfound) {
          if (line ~ pat) {
            if (!--left) {
              print file
              break
            }
            delete notfound[pat]
          }
        }
      }
      close(cmd)
    }
    exit
  }' {} +

(assuming the patterns don't have non-standard ERE extensions like \S, which would be converted to \s).
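That caveat arises because tolower() is applied to the pattern text itself, backslash escapes included. A quick check, assuming only a POSIX printf and awk; the sample pattern \S+ is made up for illustration:

```shell
# tolower() also lowercases the letter after a backslash, so the
# GNU extension \S (non-whitespace) turns into \s (whitespace).
printf '%s\n' '\S+' | awk '{ print tolower($0) }'   # prints \s+
```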

You could put that awk command in a zgrep-many script to make it easier to use. Something like:

#! /bin/sh -

usage() {
  cat >&2 << EOF
Usage: $0 [-e <pattern>] [-f <file>] [-i] [pattern] files

List the files for which all the given patterns are matched.
EOF
  exit 1
}

ignorecase= 
PATTERNS=
export PATTERNS
NL='
'
sep=

while getopts e:f:i opt; do
  case $opt in
    (e) PATTERNS=$PATTERNS$sep$OPTARG; sep=$NL;;
    (f) PATTERNS=$PATTERNS$sep$(cat < "$OPTARG") || exit; sep=$NL;;
    (i) ignorecase='tolower(';;
    (*) usage;;
  esac
done
shift "$((OPTIND - 1))"
if [ -z "$PATTERNS" ]; then
  [ "$#" -gt 0 ] || usage
  PATTERNS=$1; shift
fi

[ "$#" -eq 0 ] && exit

exec awk -v q=\' '
  function shquote(s) {
    gsub(q, q "\\" q q, s)
    return q s q
  }
  BEGIN {
    n = split('"$ignorecase"'ENVIRON["PATTERNS"]'"${ignorecase:+)}"', pats, "\n")
    for (arg = 1; arg < ARGC; arg++) {
      file = ARGV[arg]
      cmd = "gzip -dcf < " shquote(file)
      for (i = 1; i <= n; i++) notfound[pats[i]]
      left = n
      while (left && (cmd | getline line) > 0) {
        '"${ignorecase:+line = tolower(line)}"'
        for (pat in notfound) {
          if (line ~ pat) {
            if (!--left) {
              print file
              break
            }
            delete notfound[pat]
          }
        }
      }
      close(cmd)
    }
    exit
  }' "$@"

To be used as:

find ... -exec zgrep-many -ie foo -e bar -e baz {} +

for instance.

2

grep doesn't have an AND option for matching multiple patterns, but you can OR patterns together using |. With extended syntax, you could express AND by listing every ordering of the patterns:

a.*b.*c|a.*c.*b|b.*a.*c|b.*c.*a|c.*a.*b|c.*b.*a

But it's probably not a good idea if you have more than two patterns, since the number of orderings grows quickly (n! for n patterns).
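A further caveat worth checking before relying on this: the alternation is tested line by line, so it only matches when all patterns occur on the same line, whereas chained zgrep -l calls accept matches anywhere in the file. A minimal sketch with two made-up patterns, foo and bar (same_line_and is a hypothetical helper name, not an existing command):

```shell
# Hypothetical helper: succeeds only if some single input line
# contains both foo and bar, in either order.
same_line_and() {
  grep -Eq 'foo.*bar|bar.*foo'
}

printf 'foo bar\n'  | same_line_and && echo 'same line: match'
printf 'foo\nbar\n' | same_line_and || echo 'separate lines: no match'
```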

You could combine your zgrep commands using -exec. Use the quiet option -q for every zgrep except the last one, which uses -l to print the filename if it and all previous greps found a match.

find -mtime -$a -type f ! -name "*.bak*"      \
        -exec zgrep -iq "$b" {} \;            \
        -exec zgrep -iq "$c" {} \;            \
        -exec zgrep -il "$d" {} \; | sort
sebasth
  • How can I capture that command's output in a variable? My command is inside a script, and I use it as the value of a variable, like output=$(command in here), which I then pass to another function in my script. And does it speed up the run time of the command? – WashichawbachaW Aug 24 '17 at 08:47
  • It outputs the right file, but it takes longer than my original command. Do you have a better, shorter-to-type command? My original command takes only 16 seconds to finish; yours takes 39-45 seconds. That's more than twice as slow as mine. – WashichawbachaW Aug 24 '17 at 08:58
  • No, I don't think there is any way this is going to be faster, since zgrep is executed separately for each match, and since each -exec depends on the return value of the previous -exec, passing more filenames at once from find isn't likely to work. – sebasth Aug 24 '17 at 09:23
1

You might use find to run three zgreps, with the pattern given before the file name and a final -print to output the files that passed every test:

  find -mtime -$a -type f ! -name "*.bak*"      \
       -exec zgrep -iq "$b" {} \; \
       -a   -exec zgrep -iq "$c" {} \; \
       -a   -exec zgrep -iq "$d" {} \; \
       -print \
    | sort

You could also first collect the names of the files to grep, e.g.

 find -mtime -$a -type f ! -name "*.bak*" > /tmp/file-list

(assuming your file names are well behaved, without spaces or newlines)

then loop over every line in /tmp/file-list.
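A minimal sketch of such a loop, assuming one file name per line in the list (names containing newlines would break it). narrow_list is a hypothetical helper name, and it uses gzip -dcf piped into grep rather than zgrep so that plain and compressed files are handled alike:

```shell
# Hypothetical helper: narrow_list LISTFILE PATTERN
# Rewrites LISTFILE, keeping only the files whose (possibly gzipped)
# content matches PATTERN case-insensitively.
narrow_list() {
  list=$1 pat=$2
  tmp=$(mktemp) || return
  while IFS= read -r file; do
    # gzip -dcf passes uncompressed input through unchanged
    if gzip -dcf < "$file" | grep -iq -e "$pat"; then
      printf '%s\n' "$file"
    fi
  done < "$list" > "$tmp"
  mv "$tmp" "$list"
}

# Possible usage, one call per pattern, so later patterns only see survivors:
#   for p in "$b" "$c" "$d" 'ST.997'; do narrow_list /tmp/file-list "$p"; done
#   sort -u /tmp/file-list
```

Like the original pipeline, this still decompresses each surviving file once per pattern.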

Finally, you could write a script in another language (awk, Python, ...)

and, to avoid typing, you might define a shell function.
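One possible shape for such a function, as a sketch only: zgrep_all is a made-up name, it reads file names from standard input (one per line, so no newlines in names) and takes the patterns as arguments.

```shell
# Hypothetical helper: zgrep_all PATTERN...
# Reads file names on stdin and prints those whose (possibly gzipped)
# content matches every PATTERN, case-insensitively.
zgrep_all() {
  while IFS= read -r file; do
    for pat do
      # Skip to the next file as soon as one pattern is missing
      gzip -dcf < "$file" | grep -iq -e "$pat" || continue 2
    done
    printf '%s\n' "$file"
  done
}

# Possible usage:
#   find . -mtime -"$a" -type f ! -name '*.bak*' | zgrep_all "$b" "$c" "$d" 'ST.997' | sort -u
```

This saves typing rather than run time: each file is still decompressed once per pattern until one fails to match.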