
I'm trying to create a script that will check a website for a word. I have a few to check, so I'm feeding them in from another file.

The file is called "testurls". In it, I list the keyword and then the URL, separated by a semicolon.

Example Domains;www.example.com
Google;www.google.com

Here is the script:

#!/bin/bash
clear

# Call list of keywords and urls
DATA=`cat testurls`

for keyurl in $DATA
do
    keyword=`awk -F ";" '{print $1}' $keyurl`
    url=`awk -F ";" '{print $2}' $keyurl`
    curl -silent $url | grep '$keyword' > /dev/null
 if [ $? != 0 ]; then
    # Fail
        echo "Did not find $keyword on $url"
    else
    # Pass
        echo $url "Okay"
fi
done

The output is:

awk: cannot open Example (No such file or directory)
awk: cannot open Example (No such file or directory)
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
Did not find  on
awk: cannot open Domains;www.example.com (No such file or directory)
awk: cannot open Domains;www.example.com (No such file or directory)
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
Did not find  on
awk: cannot open Google;www.google.com (No such file or directory)
awk: cannot open Google;www.google.com (No such file or directory)
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
Did not find  on

I've hacked away at this for ages now. Any help is very welcome.

3 Answers


There are several problems with your script. I've listed the ones I found, but I haven't tested it, so there may be others.

for keyurl in $DATA; do … splits $DATA at whitespace, not at newlines. So on the first iteration, $keyurl will be Example, then Domains;www.example.com, and so on. Furthermore, each value undergoes wildcard expansion, so if there is a * in a keyword, you might see funky results depending on the files present in the current directory.
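You can see this splitting behavior with a quick sketch (using an inline variable instead of your testurls file):

```shell
#!/bin/bash
# Unquoted $DATA is split on spaces, tabs, and newlines alike.
DATA='Example Domains;www.example.com
Google;www.google.com'

for keyurl in $DATA; do
  echo "item: [$keyurl]"
done
# The loop runs three times: [Example], [Domains;www.example.com],
# and [Google;www.google.com] -- not twice, as intended.
```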

You're trying to process newline-separated data. A simple way is

while read -r keyurl; do
  …
done <testurls

This strips the indentation from each line, which is probably not a bad thing here. (Use IFS= read -r keyurl if you want keyurl to contain each line exactly.)
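To see the difference between the two forms, here is a small sketch feeding a single indented line into each:

```shell
#!/bin/bash
# Plain read strips leading/trailing IFS whitespace from the line.
printf '  indented line\n' | { read -r line; echo "[$line]"; }
# -> [indented line]

# IFS= read preserves the line exactly as it appears in the file.
printf '  indented line\n' | { IFS= read -r line; echo "[$line]"; }
# -> [  indented line]
```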

Your calls to awk aren't working because you're passing $keyurl as a file name. You need to pass it as input instead. While you're at it, always use double quotes around variable substitutions (otherwise the shell performs word splitting and wildcard expansion on their value). I also recommend using $(…) instead of `…`; they're equivalent, except that `…` is difficult to use when you want to quote or nest things inside, whereas the syntax of $(…) is intuitive.

keyword=$(echo "$keyurl" | awk -F ";" '{print $1}')
url=$(echo "$keyurl" | awk -F ";" '{print $2}')

There's a better way to split a variable at the first semicolon: use the shell's built-in constructs to strip a prefix or suffix from a string.

keyword=${keyurl%%;*} url=${keyurl#*;}
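For example, with the first data line from testurls:

```shell
#!/bin/bash
keyurl='Example Domains;www.example.com'

keyword=${keyurl%%;*}   # strip the longest suffix matching ';*'
url=${keyurl#*;}        # strip the shortest prefix matching '*;'

echo "$keyword"   # -> Example Domains
echo "$url"       # -> www.example.com
```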

But since your data comes from the read built-in and the separator is a single character, you can take advantage of the IFS feature and directly split your input as you read it.

while IFS=';' read -r keyword url; do …

Coming to your curl and grep calls: you're currently looking for the literal text $keyword, because you used single quotes. Use double quotes instead; note that the keyword will then be interpreted as a basic regular expression. If you want the keyword treated as a literal string, pass the -F option to grep. You should also put -e before the pattern, in case a keyword begins with the character - (otherwise it would be interpreted as an option to grep). While we're at it, grep's -q option is equivalent to >/dev/null. Also note that curl's option is spelled --silent (or -s); single-dash -silent is not the long option. And remember the double quotes around $url.

curl --silent "$url" | grep -Fqe "$keyword"
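Here is a quick illustration of why -F matters, using grep alone (no network needed; the input string is made up for the demo):

```shell
#!/bin/bash
# As a regex, '.' matches any character, so a near-miss still matches:
echo 'examplexcom' | grep -c 'example.com'
# -> 1 (false positive)

# With -F the pattern is taken as a literal string:
echo 'examplexcom' | grep -cF 'example.com'
# -> 0 (no match)
```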

You can shorten the if [ $? != 0 ]; then part by putting the command directly in there.

if curl --silent "$url" | grep -Fqe "$keyword"; then

In summary:

while IFS=';' read -r keyword url; do
  if curl --silent "$url" | grep -Fqe "$keyword"; then
    echo "$url Okay"
  else
    echo "Did not find $keyword on $url"
  fi
done <testurls
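If you want to dry-run the loop logic without network access, you can substitute a stub function for the curl call (fetch here is a hypothetical stand-in that just prints a fixed page body):

```shell
#!/bin/bash
# Stub standing in for: curl --silent "$url"
fetch() { echo "This domain is for use in Example Domains."; }

while IFS=';' read -r keyword url; do
  if fetch "$url" | grep -Fqe "$keyword"; then
    echo "$url Okay"
  else
    echo "Did not find $keyword on $url"
  fi
done <<'EOF'
Example Domains;www.example.com
Google;www.google.com
EOF
# The stub's output contains "Example Domains" but not "Google",
# so the first line passes and the second fails.
```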
  • You really should consider collecting some of your answers and publishing them in the one place. You could call it Unix: breakfast of champignons – jasonwryan Nov 25 '11 at 00:02
  • I absolutely agree. It would be an amazing guide for beginners like me. :) +1 (Too bad I can't vote it as THE best answer) – jaypal singh Nov 25 '11 at 00:09
  • +1 I was not expecting an answer like this! Thank you for taking the time to show me what I was doing wrong and explaining how to do it right (and in a good level of detail!). – jetgerbil Nov 25 '11 at 10:40

awk is treating the value of $keyurl as a data file to be processed. You need to feed the value of $keyurl to awk on its standard input, like

keyword=$(echo "$keyurl" | awk -F ";" '{print $1}')

This will solve one of your many problems.
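A quick check of the difference (using one of your data lines as the input string):

```shell
#!/bin/bash
keyurl='Google;www.google.com'

# Wrong: awk opens the value as a file name:
#   awk -F ';' '{print $1}' $keyurl
#   -> awk: cannot open Google;www.google.com (No such file or directory)

# Right: pipe the value to awk's standard input:
keyword=$(echo "$keyurl" | awk -F ';' '{print $1}')
echo "$keyword"   # -> Google
```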


If the format of testurls is consistent, you could use a simpler approach:

#!/bin/bash
while read -r line; do
    keyword="${line%;*}"
    url="${line#*;}"
    curl --silent "$url" | grep "$keyword" >/dev/null
    [ $? = 0 ] && echo "${keyword} found" || echo "Fail..."
done < testurls