I am connecting to a MySQL database from a shell script. After connecting, I execute a query that returns 300,000 URLs.

For each of the 300,000 URLs, I need to check whether the URL actually exists and then update the table to record that the URL has been checked.

I plan to use the curl command, as shown below.

curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23].."

If I run the command directly in my shell, I get the response (301, 200, etc.). However, I need it in a variable so that I can use it for further processing, for example:

$var = curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23].."
echo $var;
if ($var == "some value")
{ 
    do something;
}
else
{
    do some other thing;
}
Ramesh
  • 39,297
  • Also maybe http://unix.stackexchange.com/questions/16024/how-can-i-assign-the-result-of-a-command-to-a-shell-variable. Complete the whole set. :) – Mikel Jan 29 '14 at 21:50

2 Answers

4

Extracting all 300,000 URLs in one go isn't optimal. You may find it more useful to extract a small number of URLs, check them, and then extract the next batch.

Let's say you assign status 0 to unchecked URLs. You want to change this status to the HTTP result code: 200, 301, 404, 403, 401, and so on.

Let's also say each row has a unique ID, so that each update can target a single row quickly.

#!/bin/sh

NUM=10
# mktemp creates the file, so it exists even if no rows are selected
RESULTS=$(mktemp /tmp/results.XXXXXX)

# Select some rows, selecting ID and URL

# The URL may contain spaces or single quotes.
# I wouldn't trust double quotes; a  ` grep -v '"'  ` piped between
# the mysql --silent and the while could be advisable in this case.

echo "SELECT id, url FROM mytable WHERE status = 0 LIMIT $NUM;" \
    | mysql --silent --skip-column-names mydatabase \
    | while read -r ID URL; do
    if [ -n "$ID" ]; then
        # Extract HTTP result code (also matches "HTTP/2 200" status lines)
        CODE=$(curl -s --head "$URL" | head -n 1 \
                   | tr -d '\r' \
                   | grep "^HTTP/[0-9.]* [1-9][0-9]*" \
                   | cut -f2 -d" ")

        # If there is a code
        if [ -n "$CODE" ]; then
            if [ "$CODE" = "200" ]; then
                echo "200"
            else
                echo "Not 200"
            fi
            # Prepare update
            echo "UPDATE mytable SET status=$CODE WHERE id=$ID;" >> $RESULTS
        else
            # We might update this ID to a new status in order not to
            # extract it again.
            echo "UPDATE mytable SET status=666 WHERE id=$ID;" >> $RESULTS
        fi
    fi
done

# update database
mysql mydatabase < $RESULTS
# Remove temporary file
rm $RESULTS
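
The status-line parsing used in the script can be tried on its own, without touching the network or the database. The following is only a sketch: `status_code` is a hypothetical helper name, and the sample status lines are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# numeric status code from the first line (nothing if it is not a status line).
status_code() {
    head -n 1 \
        | tr -d '\r' \
        | grep '^HTTP/[0-9.]* [1-9][0-9]*' \
        | cut -f2 -d' '
}

# Made-up sample status lines, including the trailing CR that servers send.
printf 'HTTP/1.1 301 Moved Permanently\r\n' | status_code   # prints 301
printf 'HTTP/2 200 \r\n' | status_code                      # prints 200
```

In the real script, the `printf` calls would be replaced by `curl -s --head "$URL"`.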
LSerni
  • 4,560
3

To capture the result of a command in a variable, you can use backticks (``) or, better, $(command). In your case, that would be:

var=$(curl -s --head http://myurl/ | head -n 1 | grep "HTTP/1.[01] [23]..")
echo "$var"
if [ "$var" = "some value" ]; then
    do something;
else
    do some other thing;
fi
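
Note that the assignment must have no spaces around `=`; with spaces, the shell treats `var` as a command name and reports "command not found". Here is a minimal sketch that can be run without network access, assuming a made-up status line, with `printf` standing in for the curl call:

```shell
#!/bin/sh
# printf stands in for: curl -s --head http://myurl/
var=$(printf 'HTTP/1.1 200 OK\r\n' | head -n 1 | grep "HTTP/1.[01] [23]..")

# HTTP header lines end in a carriage return; strip it before comparing.
var=$(printf '%s' "$var" | tr -d '\r')

if [ "$var" = "HTTP/1.1 200 OK" ]; then
    echo "URL exists"
else
    echo "URL missing or error"
fi
```

Without the `tr -d '\r'` step, the string comparison can fail even though the echoed value looks correct, because of the invisible trailing carriage return.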
terdon
  • 242,166
  • It is not working. I use the above code in a script and when I tried to echo, I am getting "command not found" error in the first line. – Ramesh Jan 29 '14 at 20:56
  • Alright, I was having spaces in my code. Sorry for the confusion. Thanks again for the answer. – Ramesh Jan 29 '14 at 21:00
  • It is working. I had spaces in my bash script. Thanks again. I cannot accept the answer until 4 minutes. So, after that I will accept the answer. – Ramesh Jan 29 '14 at 21:03
  • @Ramesh no worries :). i had posted my previous comment before seeing your reply. – terdon Jan 29 '14 at 21:06