#!/bin/bash
LIST=/errors_exception.txt
cd /test
for PATTERN in `cat $LIST`
do
        for FILE in $(ls)
        do
                if zcat $FILE | grep -Fxq "$PATTERN"; then
                        echo "$PATTERN found pattern in $FILE" >> output
                fi
        done
done

I'm trying to scan a lot of compressed log files (.gz) and check whether the patterns I'm looking for still exist in those logs.

For example, in my code above, let's say errors_exception.txt contains the following:

one 
one two three
four five
six

/test is the directory that contains the log files.

Why, when I run the script, doesn't it read the second line, "one two three", as a single line?

When I run bash -x test.sh (test.sh is the name of the script), it treats the second line as if it were three separate lines, even though the text file shows "one two three" on a single line.

HalosGhost

Answer

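list=/errors_exception.txt
cd /test || exit 1
while IFS= read -r pattern ; do        # read the pattern file one whole line at a time
    for file in *.gz ; do              # only the compressed logs (this also skips "output")
        if zcat < "$file" | grep -Fxq "$pattern"; then
            echo "$pattern found pattern in $file"
        fi
    done
done <"$list" > output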
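A side note that goes beyond the original answer: with this structure every log file is decompressed once per pattern. A minimal sketch of a different technique, assuming (as in the question) that patterns are fixed strings that must match a whole line, decompresses each file once and hands the whole pattern list to grep with -f. The helper name scan_gz_once and the demo file names are hypothetical.

```shell
#!/bin/bash
# Sketch of an alternative (not the answer's method): decompress each file once and let
# grep match every pattern in one pass with -f. Like the question, it treats patterns as
# fixed strings (-F) that must match a whole line (-x).

# Usage: scan_gz_once PATTERN_FILE FILE...   (scan_gz_once is a hypothetical helper name)
scan_gz_once() {
    local list=$1 file
    shift
    for file in "$@"; do
        # Each matched line equals some pattern; sort -u reports each pattern once per file.
        zcat < "$file" | grep -Fx -f "$list" | sort -u |
            while IFS= read -r pattern; do
                echo "$pattern found pattern in $file"
            done
    done
}

# Demo with throwaway files standing in for /errors_exception.txt and /test/*.gz.
demo_dir=$(mktemp -d)
printf '%s\n' 'one' 'one two three' 'four five' 'six' > "$demo_dir/patterns.txt"
printf '%s\n' 'startup' 'one two three' 'six' 'six' | gzip > "$demo_dir/app.log.gz"
scan_gz_once "$demo_dir/patterns.txt" "$demo_dir"/*.gz
rm -rf "$demo_dir"
```

The output is grouped by file rather than by pattern, and sort -u reorders the matches. A blank line in the pattern file would match blank log lines, so it is worth keeping the list free of empty lines.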

Notes:

  • Neither of the two lines below will do what you expect:

    for PATTERN in `cat $LIST`
    
    for FILE in $(ls)
    

    In both cases, the shell performs word splitting, which you don't want here. The suggested code above avoids this.

  • Is the file errors_exception.txt really in the root directory?

  • I converted the variables to lowercase. This is the convention for user-created variables: uppercase names are used by the shell and the environment (PATH, HOME, IFS, and so on), so sticking to lowercase prevents you from accidentally overwriting a critical shell parameter.

More on word splitting

When the shell executes:

for PATTERN in `cat $LIST`

it runs cat $LIST and then splits the output into words. With the default IFS, spaces, tabs, and newlines are all treated the same way: as word breaks. So, effectively, after word splitting, this line becomes:

for PATTERN in one one two three four five six

and, as the for loop executes, PATTERN is set, in turn, to one, one, two, three, four, five, and six.

What you really want is for each line to be treated as a line. This is why the while read ... done <"$list" construct is used instead: on each iteration, it reads one whole line.
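The difference can be reproduced with a small sketch. The helper names words_from_cat and lines_from_read and the temporary pattern file are made up for illustration; the file holds the same four lines as errors_exception.txt in the question.

```shell
#!/bin/bash
# Sketch: contrast word splitting with line-by-line reading.

# Mimics the question's loop: the unquoted $(cat ...) is split on spaces, tabs,
# and newlines (and glob-expanded).
words_from_cat() {
    local item
    for item in $(cat "$1"); do
        printf '[%s]\n' "$item"
    done
}

# Mimics the answer's loop: each iteration reads one whole line.
lines_from_read() {
    local item
    while IFS= read -r item; do
        printf '[%s]\n' "$item"
    done < "$1"
}

# Demo on a temporary copy of the example pattern list.
demo_list=$(mktemp)
printf '%s\n' 'one' 'one two three' 'four five' 'six' > "$demo_list"
words_from_cat "$demo_list"    # 7 lines: [one] [one] [two] [three] [four] [five] [six]
lines_from_read "$demo_list"   # 4 lines: [one] [one two three] [four five] [six]
rm -f "$demo_list"
```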

The same issue occurs with this line if any file names have spaces in them:

for FILE in $(ls)

The output of ls is substituted into the line and, if any file names contain spaces, tabs, or newlines (all of which are legal in file names), those names are split into parts. For example, in an otherwise empty directory, create one file:

$ touch "a b c"

Now, run a for loop:

$ for file in $(ls); do echo $file; done
a
b
c

The for loop runs three times even though there is only one file. That is because the file name contains spaces and, after word splitting, the for loop gets three arguments: a, b, and c.

This is easily avoided. Use instead:

for file in *

With a glob, the shell keeps each file name intact, regardless of what characters it contains, as long as you quote "$file" wherever you use it, as the code above does.
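A sketch that counts loop iterations both ways, using a scratch directory from mktemp -d instead of a real log directory (the function names are hypothetical):

```shell
#!/bin/bash
# Sketch: count loop iterations over a directory's entries, with $(ls) and with a glob.

count_with_ls() {
    local n=0 f
    for f in $(ls "$1"); do        # word splitting breaks "a b c" into three words
        n=$((n + 1))
    done
    echo "$n"
}

count_with_glob() {
    local n=0 f
    for f in "$1"/*; do            # each match stays one word
        [ -e "$f" ] || continue    # skip the literal pattern if the directory is empty
        n=$((n + 1))
    done
    echo "$n"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a b c"
count_with_ls "$demo_dir"     # prints 3
count_with_glob "$demo_dir"   # prints 1
rm -rf "$demo_dir"
```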

Recursive Searching

If we also want to search subdirectories for gzipped files, then we can use bash's globstar feature as follows:

list=/errors_exception.txt
cd /test || exit 1
shopt -s globstar   # let ** match files at any depth of subdirectories
while IFS= read -r pattern ; do
    for file in **/*.gz ; do
        if zcat < "$file" | grep -Fxq "$pattern"; then
            echo "$pattern found pattern in $file"
        fi
    done
done <"$list" > output

This requires bash 4.0 or later, since the globstar option was introduced in bash 4.0.
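In older bash, shopt -s globstar reports an invalid option name and ** behaves like a plain *, so subdirectories are not searched recursively. A sketch of a swapped-in technique that does not need globstar uses find instead of the glob; -print0 together with read -d '' keeps file names with spaces or newlines intact. -print0 is supported by GNU and BSD find. The function name scan_tree and the demo paths are hypothetical.

```shell
#!/bin/bash
# Sketch: recursive search without globstar, using find in place of **/*.gz.

# Usage: scan_tree PATTERN_FILE DIRECTORY   (scan_tree is a hypothetical helper name)
scan_tree() {
    local list=$1 dir=$2 pattern file
    while IFS= read -r pattern; do
        # find emits NUL-terminated names; read -d '' splits on NUL only.
        while IFS= read -r -d '' file; do
            # -- stops grep from treating a pattern that starts with '-' as an option.
            if zcat < "$file" | grep -Fxq -- "$pattern"; then
                echo "$pattern found pattern in $file"
            fi
        done < <(find "$dir" -type f -name '*.gz' -print0)
    done < "$list"
}

# Demo with a throwaway tree, including a subdirectory whose name has a space.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/logs/sub dir"
printf '%s\n' 'one two three' 'six' > "$demo_dir/patterns.txt"
printf '%s\n' 'one two three' | gzip > "$demo_dir/logs/top.gz"
printf '%s\n' 'six' | gzip > "$demo_dir/logs/sub dir/nested.gz"
scan_tree "$demo_dir/patterns.txt" "$demo_dir/logs"
rm -rf "$demo_dir"
```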

John1024
  • <$list not <list – jimmij Sep 03 '14 at 04:00
  • Yeah, thanks, I actually noticed that and corrected it. – nolram16 Sep 03 '14 at 04:02
  • Can you explain what's wrong with my code? – nolram16 Sep 03 '14 at 04:08
  • @nolram16 I updated the answer to talk more about that. – John1024 Sep 03 '14 at 04:54
  • Another question regarding this: what if the log file directory has subfolders that also contain log files (.gz)? – nolram16 Sep 03 '14 at 05:45
  • @nolram16 As it stands, files in subdirectories are not included. Do you want them included? All subdirectories or just specific ones? – John1024 Sep 03 '14 at 06:43
  • Yes, I want to include them: all subdirectories that also contain log files. – nolram16 Sep 03 '14 at 07:32
  • I tried to replace * with ls -R, but it shows

    zcat:

    which causes a "No such file or directory" error.

    – nolram16 Sep 03 '14 at 07:35
  • Can you add an explanation of IFS= read -r?

    Thanks a lot! :)

    – nolram16 Sep 03 '14 at 07:38
  • I added a solution that looks in subdirectories. IFS= read -r helps ensure that each line is read in unmangled. See Gilles's answer to this question for more details: http://unix.stackexchange.com/a/18936/53604 – John1024 Sep 03 '14 at 07:48
  • It still can't read files inside the subdirectories. Any thoughts? I don't have enough reputation yet, so I can't bring this up in chat. – nolram16 Sep 03 '14 at 08:28
  • "cant read files inside the subdirectories" Does that mean permission was denied or what? If you run shopt -s globstar; echo **/*.gz, do you see all the files listed? (The system wouldn't let me move you to chat either.) – John1024 Sep 03 '14 at 16:22
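The effect of IFS= read -r, asked about in the comments, can be seen in a short sketch (the sample line and function names are invented for illustration): plain read strips leading blanks and treats backslashes as escape characters, while IFS= read -r stores the line unchanged.

```shell
#!/bin/bash
# Sketch: how IFS= and -r change what read stores (function names are made up).

# Default read: trims leading/trailing blanks and treats backslash as an escape.
read_plain() { local line; read line <<< "$1"; printf '[%s]\n' "$line"; }

# IFS= disables the trimming; -r keeps backslashes literal.
read_raw() { local line; IFS= read -r line <<< "$1"; printf '[%s]\n' "$line"; }

sample='   C:\temp\one two'
read_plain "$sample"   # prints [C:tempone two]
read_raw "$sample"     # prints [   C:\temp\one two]
```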