I am trying to process a large set of files, appending specific lines to the file "test_result.txt". I achieved it, not very elegantly, with the following code.

for i in *merged; do
        while read -r lo; do
                if [[ $lo == *"ID"* ]]; then
                echo $lo >> test_result.txt
                fi
                if [[ $lo == *"Instance"* ]]; then
                echo $lo >> test_result.txt
                fi
                if [[ $lo == *"NOT"* ]]; then
                echo $lo >> test_result.txt
                fi
                if [[ $lo == *"AI"* ]]; then
                echo $lo >> test_result.txt
                fi
                if [[ $lo == *"Sitting"* ]]; then
                echo $lo >> test_result.txt
                fi
        done < $i
done

However, I am trying to size it down using an array, which has so far resulted in quite an unsuccessful attempt.

KEYWORDS=("ID" "Instance" "NOT" "AI" "Sitting" )
KEY_COUNT=0

for i in *merged; do
        while read -r lo; do
                if [[$lo == ${KEYWORDS[@]} ]]; then
                echo $lo >> ~/Desktop/test_result.txt && KEY_COUNT="`expr $KEY_COUNT + 1`"
                fi
        done < $i
done
Rui F Ribeiro
madArch

2 Answers

It looks like you want to get all the lines that contain at least one of a set of words, from a set of files.

Assuming that you don't have many thousands of files, you could do that with a single grep command:

grep -wE '(ID|Instance|NOT|AI|Sitting)' ./*merged >outputfile

This would extract the lines matching any of the words listed in the pattern from the files whose names match *merged.

The -w option to grep ensures that the given strings are not matched as substrings (e.g. NOT will not be matched in NOTICE). The -E option enables the alternation with | in the pattern.

Add the -h option to the command if you don't want the names of the files containing matching lines in the output.
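For example, with two hypothetical files a.merged and b.merged that each contain one matching line, grep prefixes every hit with the name of the file it came from when more than one file is searched, and -h suppresses that prefix:

$ grep -wE '(ID|Instance|NOT|AI|Sitting)' ./*merged
./a.merged:Instance 42 started
./b.merged:NOT ready
$ grep -hwE '(ID|Instance|NOT|AI|Sitting)' ./*merged
Instance 42 started
NOT ready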

If you do have many thousands of files, the above command may fail because the expanded command line becomes too long. In that case, you may want to do something like

for file in ./*merged; do
    grep -wE '(ID|Instance|NOT|AI|Sitting)' "$file"
done >outputfile

which would run the grep command once on each file, or,

find . -maxdepth 1 -type f -name '*merged' \
    -exec grep -wE '(ID|Instance|NOT|AI|Sitting)' {} + >outputfile

which would invoke grep as few times as possible, passing as many files as possible to each invocation.

Kusalananda

Adding an array doesn't particularly help: you would still need to loop over the elements of the array (see How do I test if an item is in a bash array?):

while read -r lo; do
    for keyword in "${KEYWORDS[@]}"; do
        if [[ $lo == *"$keyword"* ]]; then
            echo "$lo" >> ~/Desktop/test_result.txt && KEY_COUNT=$((KEY_COUNT + 1))
        fi
    done
done < "$i"

It might be better to use a case statement (the grouped pattern below uses bash's extended globbing, which must be enabled with shopt -s extglob):

shopt -s extglob    # enable extended globbing for the @( ) pattern group

while read -r lo; do
    case $lo in
    *@(ID|Instance|NOT|AI|Sitting)*)
        echo "$lo" >> ~/Desktop/test_result.txt && KEY_COUNT=$((KEY_COUNT + 1))
        ;;
    esac
done < "$i"

(I assume you do further processing of these lines within the loop. If not, grep or awk could do this more efficiently.)
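If no per-line processing is needed, a minimal awk sketch could replace the whole loop; note that it matches the words as substrings (like the original [[ $lo == *...* ]] tests, unlike grep -w) and keeps a KEY_COUNT-style tally. The output file name is taken from the question:

awk '/ID|Instance|NOT|AI|Sitting/ {
    print >> "test_result.txt"    # append each matching line
    count++                       # running tally, like KEY_COUNT
}
END { print count+0, "matching lines" }' ./*merged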

muru