Use `/usr/xpg4/bin/grep` on Solaris to be able to read patterns from a file with `-f` and to do string comparisons with `-F`. Then:

```
find /export/home/testing -type f -name apache_logs.txt -exec tail -n 50 {} \; |
/usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```

... where `avoid.txt` is a text file with one string per line:

```
akamai/sureroute
/wp7/wp-login.php
HTTP/1.0" 200
HTTP/1.1" 200
```
This would look for regular files called `apache_logs.txt` in or under the directory `/export/home/testing`. For each such file, `tail -n 50` is called to get the last 50 lines (as in your code; use `cat` in place of `tail -n 50` to get the whole contents of each file). The resulting lines of text are piped through `/usr/xpg4/bin/grep`, which filters out (removes) every line that contains any of the substrings listed in the `avoid.txt` file.
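To illustrate with a couple of made-up log lines (these are not from your data):

```
printf '%s\n' \
    '10.0.0.1 - - "GET / HTTP/1.1" 200 512' \
    '10.0.0.1 - - "GET /nope HTTP/1.1" 404 0' |
/usr/xpg4/bin/grep -vF -f avoid.txt
# Only the 404 line is printed; the first line contains the
# substring 'HTTP/1.1" 200' and is therefore removed.
```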
The options used with `grep` are:

- `-v` to invert the sense of the match (return the lines *not* matching any pattern).
- `-F` to treat each pattern as a fixed string and do string comparisons rather than regular expression matches. This allows the patterns in the file to contain characters that would otherwise be special in regular expressions, without escaping them.
- `-f avoid.txt` to read the patterns from the file `avoid.txt`.

The remaining lines of text are written to `result.txt`.
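To see why `-F` matters here, compare (with a made-up input line):

```
# Without -F, the dot is a regex metacharacter and matches any character:
printf '%s\n' '/wp7/wp-loginXphp' | /usr/xpg4/bin/grep '/wp7/wp-login.php'
# ... prints the line.
# With -F, the pattern is a fixed string and nothing is printed:
printf '%s\n' '/wp7/wp-loginXphp' | /usr/xpg4/bin/grep -F '/wp7/wp-login.php'
```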
Without the `-F` option, you would have to be a bit careful with the patterns in `avoid.txt` and make them proper regular expressions, maybe something like

```
akamai/sureroute
/wp7/wp-login\.php
HTTP/1\.[01]" 200
```
If you only ever expect your `find` to find a single file, the code could be simplified to

```
tail -n 50 /path/to/apache_logs.txt |
/usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```
There are a few issues with your code:

- You don't quote variable expansions. See *When is double-quoting necessary?*
- You needlessly store the result of a pipeline in a variable and then use `echo` to output the result to a file.
- Your first `tail` + `grep` pipeline uses `$file` on both sides of the pipe. This will cause `grep` to ignore the input from `tail` (see the sketch after this list).
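A minimal sketch of that first pipeline with these issues fixed, assuming (as in the question) a variable named `file` holding one pathname:

```
# Quote "$file", let grep read from the pipe rather than from "$file"
# a second time, and redirect the output directly instead of echoing
# a captured variable.
tail -n 50 "$file" | /usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```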
Your second (longer) pipeline will use `result1.txt` only for the last `grep`; the earlier `grep` commands will wait to read data from standard input (there will be none) and will eventually be killed when that last `grep` is done.

A pipeline of this type usually looks like

```
command inputfile | command | command | command
```

i.e., you start with a command that reads data from some input file and writes to standard output. The output is read by the next command, its output by the next, and so forth.
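Applied to your case, the longer pipeline would take roughly this shape (a guess at its intent based on the patterns above, since the original code isn't shown):

```
# Only the first command reads a file; only the last one writes one.
tail -n 50 /path/to/apache_logs.txt |
grep -v 'akamai/sureroute' |
grep -v '/wp7/wp-login\.php' |
grep -v 'HTTP/1\.[01]" 200' >result1.txt
```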
The output file, `result.txt`, is rewritten from scratch for each found `apache_logs.txt` file, since you write to it using `>` in a loop. This may be OK if you only ever expect `find` to find a single file (in which case it would be better not to use `find` at all, as the file would presumably not move around in the filesystem).
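If more than one matching file is expected, one fix is to redirect once, after the whole loop, rather than with `>` on every iteration. A minimal sketch (a glob stands in for `find` here, and only reaches one directory level):

```
# The redirection after "done" truncates result.txt exactly once,
# then collects the output of every iteration.
for f in /export/home/testing/*/apache_logs.txt; do
    tail -n 50 "$f" | /usr/xpg4/bin/grep -vF -f avoid.txt
done >result.txt
```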
You parse the output of `find` (the pathnames of the found files) using `read`. This is generally a bad idea since pathnames on Unix may contain any character, including newlines and backslashes, except for the nul character (`\0`), which is a string terminator in the C programming language. See *Why is looping over find's output bad practice?*
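A common alternative is to let `find` execute the per-file pipeline itself, so the pathnames are passed as arguments and never parsed as text. A sketch:

```
# {} + hands the found pathnames to the inline script as "$@".
find /export/home/testing -type f -name apache_logs.txt -exec sh -c '
    for f in "$@"; do
        tail -n 50 "$f" | /usr/xpg4/bin/grep -vF -f avoid.txt
    done' sh {} + >result.txt
```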