
Big fan of Stack Overflow. I'm a beginner and have found a lot of help on this site, but I've now run into a problem.

Today I have a function like the following.

I read a text file (data.txt) line by line as new lines are written to it. If a line contains any word from the array "pets", the script writes that line to another text file (pets.txt) and ignores the other lines.

How do I invert that function?

I want to be able to block bad words with an array (badword) so that lines containing them are not written to the file petslist.log.

pets.filter contains

pets=(
'Dog'
'Cat'
'Mouse'
'Horse'
)

badword.filter contains

badword=(
'Stupid'
'Dumb'
'Bad'
)

script.sh contains

#!/bin/bash
source /home/pi/pets.filter
source /home/pi/badword.filter

while IFS='' read -r line
do
    while [ "${pets[count]}" != "" ]
    do
        if [ "${line/${pets[count]}}" != "$line" ] ; then
            echo "$line" >> /logs/petslist.log
        fi
        count=$(( $count + 1 ))
    done

2 Answers


If badword is an array of actual words, then you might want to use grep -w:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.
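To make the effect of -w concrete, here is a minimal sketch. The two sample lines are made up for illustration; "badge" contains "bad" only as a substring.

```bash
#!/bin/bash
# Hypothetical sample input: one line has "bad" as a whole word,
# the other only as part of "badge".
sample_lines() {
    printf '%s\n' 'a bad dog' 'a badge for the cat'
}

# Without -w, grep matches substrings, so both lines count.
substring_hits=$(sample_lines | grep -ci 'bad')

# With -w, only the line where "bad" stands alone counts.
word_hits=$(sample_lines | grep -ciw 'bad')

echo "substring matches: $substring_hits"
echo "whole-word matches: $word_hits"
```

This prints 2 for the substring count and 1 for the whole-word count.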

So in your case

# Declare some constants
readonly bad_words_list="stupid dumb bad" \
         out_file="out_file" \
         in_file="in_file"

# The function you want
function filter_bad_words() {
    # Loop for reading line-by-line
    while read -r line
    do
        # Loop through the list
        # Notice that there are no quotes
        for bad_word in ${bad_words_list[@]}
        do
            # Check if there is a bad word
            # Options in grep: quiet, ignore case, word
            if grep -qiw "$bad_word" <<< "$line"
            then
                # Print the line with bad word to stderr
                echo "Line contains bad word: $line" 1>&2

                # Exit from this loop, continue the main one
                continue 2
            fi
        done

        # Save line into the out file
        # This will not be called if line contains bad word
        echo "$line" >> "$out_file"

    # Read from file
    done < "$in_file"
}

Not sure if this is the most efficient solution (it might also be possible with sed or awk), but at least it works and uses only Bash and grep.

Edit: if you just want to filter out these words without any other kind of processing, you can also use grep -v, as here:

# Read file into a variable
filtered="$(< "$in_file")"

# Go through each bad word
for word in ${bad_words_list[@]}
do
    # Filter the word
    filtered="$(grep -iv "$word" <<< "$filtered")"
done

# Save final result
echo "$filtered" > "$out_file"
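The loop above still runs grep once per word. As a rough sketch of a single-invocation variant, the word list can be written to a pattern file and passed with -f. The temporary files and sample lines below are assumptions for illustration, not part of the original setup, and -w is added so that "badge" is not treated as "bad".

```bash
#!/bin/bash
# Hypothetical sample data, for illustration only.
bad_words_list="stupid dumb bad"
in_file=$(mktemp)
out_file=$(mktemp)
patterns=$(mktemp)
printf '%s\n' 'Good dog' 'Stupid cat' 'A badge for the horse' > "$in_file"

# Write one bad word per line into the pattern file.
# $bad_words_list is intentionally unquoted so it splits on spaces.
printf '%s\n' $bad_words_list > "$patterns"

# -v: invert match, -i: ignore case, -w: whole words, -f: patterns from file
grep -viwf "$patterns" "$in_file" > "$out_file"

kept=$(cat "$out_file")
printf '%s\n' "$kept"
rm -f "$in_file" "$out_file" "$patterns"
```

With this sample input, "Stupid cat" is dropped and the other two lines are kept.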

  • given that grep is exactly a tool that operates a test condition on each line of a stream, why by God do you kill that capability and make one call for one line ? – Thibault LE PAUL May 27 '23 at 16:43
  • Somehow I probably forgot the easiest grep -o solution and concentrated on the line-by-line processing as in the original code (maybe OP wants to do something more than just filtering?). I'll edit the answer – xezo360hye May 28 '23 at 08:10
0

You're overcomplicating things (and should really not use a shell loop to process text)

pets='Dog
Cat
Mouse
Horse'

badword='Stupid
Dumb
Bad'

grep -Fe "$pets" < input.txt > pets.txt
grep -vFe "$badword" < input.txt > input-without-badword.txt

Or combining the two:

grep -Fe "$pets" < input.txt |
  grep -vFe "$badword" > pets-without-badword.txt

grep accepts multiple lines as the pattern (or fixed strings with -F), in which case it looks for any of those lines in the input.
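The newlines matter here: each line of the pattern is a separate string, whereas a space-separated list is one long literal string. A small sketch with made-up input lines shows the difference:

```bash
#!/bin/bash
# Hypothetical input, for illustration only.
sample_input() {
    printf '%s\n' 'Stupid cat' 'Dumb dog' 'Good horse'
}

# One line means one fixed string: no input line contains the literal
# text "Stupid Dumb Bad". grep exits non-zero when nothing matches,
# hence the "|| true".
one_line=$(sample_input | grep -cF -e 'Stupid Dumb Bad') || true

# One pattern per line: a line matches if it contains any of them.
multi_line=$(sample_input | grep -cF -e 'Stupid
Dumb
Bad')

echo "single-line pattern matched: $one_line"
echo "multi-line pattern matched:  $multi_line"
```

The single-line pattern matches 0 lines, the multi-line one matches 2.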

If you have to use an array instead of a multi-line string, you can do:

# fish / rc / zsh -o rcexpandparam
grep -F -e$array < input > output

# zsh
grep -F -e$^array < input > output

# mksh / bash / zsh
grep -F "${array[@]/#/-e}" < input > output

# ksh93
grep -F "${array[@]/*/-e\0}" < input > output
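To see what the bash form expands to, here is a short sketch. The array contents come from the question; the input lines and the -v option (to drop rather than keep matches) are assumptions for illustration.

```bash
#!/bin/bash
array=('Stupid' 'Dumb' 'Bad')

# Prefix every element with -e, so each one becomes its own grep pattern.
args=("${array[@]/#/-e}")
expanded=$(printf '%s\n' "${args[@]}")
printf '%s\n' "$expanded"

# Drop every line containing any of the fixed strings.
kept=$(printf '%s\n' 'Stupid cat' 'Good dog' | grep -vF "${args[@]}")
printf '%s\n' "$kept"
```

The first printf shows -eStupid, -eDumb and -eBad on separate lines; the grep call then keeps only "Good dog".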

Though in mksh / ksh93 / zsh / bash, you can also join the elements of the array with newline with:

IFS=$'\n'
grep -Fe "${array[*]}" < input > output

Or in zsh:

grep -Fe ${(pj[\n])array} < input > output
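Putting the pieces together for bash with the arrays from the question, something along these lines should work. It is only a sketch: the join_lines helper is a hypothetical name, and the input lines are made up. Setting IFS as a local inside the function avoids changing it for the rest of the script.

```bash
#!/bin/bash
# Arrays as in the question; the sample input lines are made up.
pets=('Dog' 'Cat' 'Mouse' 'Horse')
badword=('Stupid' 'Dumb' 'Bad')

# Hypothetical helper: join all arguments with newlines.
# IFS is local, so the caller's IFS is left untouched.
join_lines() {
    local IFS=$'\n'
    printf '%s\n' "$*"
}

input=$(mktemp)
printf '%s\n' 'Dog barks' 'Stupid Cat' 'Nothing here' 'Horse runs' > "$input"

# Keep lines mentioning a pet, then drop those containing a bad word.
result=$(grep -Fe "$(join_lines "${pets[@]}")" < "$input" |
         grep -vFe "$(join_lines "${badword[@]}")")

printf '%s\n' "$result"
rm -f "$input"
```

With this input, only "Dog barks" and "Horse runs" are left.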