Use `/usr/xpg4/bin/grep` on Solaris to be able to read patterns from a file with `-f` and to do string comparisons with `-F`. Then:

```
find /export/home/testing -type f -name apache_logs.txt -exec tail -n 50 {} \; |
/usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```

... where `avoid.txt` is a text file with one string per line:

```
akamai/sureroute
/wp7/wp-login.php
HTTP/1.0" 200
HTTP/1.1" 200
```
This would look for regular files called `apache_logs.txt` in or under the directory `/export/home/testing`. For each such file, `tail -n 50` is called to get the last 50 lines (as in your code; use `cat` in place of `tail -n 50` to get the whole contents of each file). The resulting lines of text are piped through `/usr/xpg4/bin/grep`, which filters out (removes) every line that contains any of the substrings listed in the `avoid.txt` file.
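To illustrate with a couple of made-up log lines (these are not from your data):

```
printf '%s\n' \
    '10.0.0.1 - - "GET / HTTP/1.1" 200 512' \
    '10.0.0.1 - - "GET /nope HTTP/1.1" 404 0' |
/usr/xpg4/bin/grep -vF -f avoid.txt
# Only the 404 line is printed; the first line contains the
# substring 'HTTP/1.1" 200' and is therefore removed.
```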
The options used with `grep` are:

- `-v` to invert the sense of the match (return the lines *not* matching any pattern).
- `-F` to treat each pattern as a fixed string and do string comparisons rather than regular expression matches. This allows the patterns in the file to contain characters that would otherwise be special in regular expressions, without escaping them.
- `-f avoid.txt` to read the patterns from the file `avoid.txt`.

The remaining lines of text are written to `result.txt`.
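To see why `-F` matters here, compare (with a made-up input line):

```
# Without -F, the dot is a regex metacharacter and matches any character:
printf '%s\n' '/wp7/wp-loginXphp' | /usr/xpg4/bin/grep '/wp7/wp-login.php'
# ... prints the line.
# With -F, the pattern is a fixed string and nothing is printed:
printf '%s\n' '/wp7/wp-loginXphp' | /usr/xpg4/bin/grep -F '/wp7/wp-login.php'
```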
Without the `-F` option, you would have to be a bit careful with the patterns in `avoid.txt` and make them proper regular expressions, maybe something like

```
akamai/sureroute
/wp7/wp-login\.php
HTTP/1\.[01]" 200
```
If you only ever expect your `find` to find a single file, the code could be simplified to

```
tail -n 50 /path/to/apache_logs.txt |
/usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```
There are a few issues with your code:

- You don't quote variable expansions. See *When is double-quoting necessary?*
- You needlessly store the result of a pipeline in a variable and then use `echo` to output the result to a file.
- Your first `tail` + `grep` pipeline uses `$file` on both sides of the pipe. This will cause `grep` to ignore the input from `tail` (see the sketch after this list).
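A minimal sketch of that first pipeline with these issues fixed, assuming (as in the question) a variable named `file` holding one pathname:

```
# Quote "$file", let grep read from the pipe rather than from "$file"
# a second time, and redirect the output directly instead of echoing
# a captured variable.
tail -n 50 "$file" | /usr/xpg4/bin/grep -vF -f avoid.txt >result.txt
```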
Your second (longer) pipeline will use `result1.txt` only for the last `grep`; the earlier `grep` commands will wait to read data from standard input (there will be none) and will eventually be killed when that last `grep` is done.

A pipeline of this type usually looks like

```
command inputfile | command | command | command
```

i.e., you start with a command that reads data from some input file and writes to standard output. The output is read by the next command, its output by the next, and so forth.
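Applied to your case, the longer pipeline would take roughly this shape (a guess at its intent based on the patterns above, since the original code isn't shown):

```
# Only the first command reads a file; only the last one writes one.
tail -n 50 /path/to/apache_logs.txt |
grep -v 'akamai/sureroute' |
grep -v '/wp7/wp-login\.php' |
grep -v 'HTTP/1\.[01]" 200' >result1.txt
```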
The output file, `result.txt`, is rewritten from scratch for each found `apache_logs.txt` file, since you write to it using `>` in a loop. This may be OK if you only ever expect `find` to find a single file (in which case it would be better not to use `find` at all, as the file would presumably not move around in the filesystem).
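If more than one matching file is expected, one fix is to redirect once, after the whole loop, rather than with `>` on every iteration. A minimal sketch (a glob stands in for `find` here, and only reaches one directory level):

```
# The redirection after "done" truncates result.txt exactly once,
# then collects the output of every iteration.
for f in /export/home/testing/*/apache_logs.txt; do
    tail -n 50 "$f" | /usr/xpg4/bin/grep -vF -f avoid.txt
done >result.txt
```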
You parse the output of `find` (the pathnames of the found files) using `read`. This is generally a bad idea since pathnames on Unix may contain any character, including newlines and backslashes, except for the nul character (`\0`), which is a string terminator in the C programming language. See *Why is looping over find's output bad practice?*
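A common alternative is to let `find` execute the per-file pipeline itself, so the pathnames are passed as arguments and never parsed as text. A sketch:

```
# {} + hands the found pathnames to the inline script as "$@".
find /export/home/testing -type f -name apache_logs.txt -exec sh -c '
    for f in "$@"; do
        tail -n 50 "$f" | /usr/xpg4/bin/grep -vF -f avoid.txt
    done' sh {} + >result.txt
```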