
For my school project I have to create a script that lets you search the contents of files packed in a zip file. You pass the script a "search string", followed by one or more zip files, as follows:

./searchZip.sh -s Tom ztest1.zip ztest2.zip
 Found the word 'Tom' in the following files:
  ztest1.zip : script1_q0638730_04-18-23-04-41.txt
  ztest2.zip : script2_q0638730-04-25-19-52-07.txt

I tried, but I don't know how to pass a second parameter. Can anyone help me, please? Thank you! Here is my code:

function unzipFile()
{
    unzip ztest1.zip -d zipFiles
    unzip ztest2.zip -d zipFiles
    unzip ztest3.zip -d zipFiles
}

if test -z "$1"
then
    echo "Enter a name please "
    exit
else
    unzipFile
    echo "Found the word '$1' in the following files:"
    grep -ilR "$1" zipFiles/
fi
rm -r zipFiles/
Jeff Schaller
carlos
  • Welcome to Unix & Linux.  If you and Carlos Chu are the same person, it would appear that you have accidentally created two accounts.  You should use the contact form and select “I need to merge user profiles” to have your accounts merged.  To merge them, you will need to provide links to the two accounts.  For your information, these are http://unix.stackexchange.com/users/172526/carlos and http://unix.stackexchange.com/users/172567/carlos-chu.  You’ll then be able to edit, comment on, and accept answers to this question (and I believe that you’ll be able to comment on answers). – G-Man Says 'Reinstate Monica' May 29 '16 at 00:20
  • A start? for z in *.zip; do while IFS=$'\n' read -r a; do unzip -p "$z" "$a" | grep "4" >/dev/null && printf "%s : %s\n" "$z" "$a" ; done< <( zipinfo -1 "$z" ); done – Runium May 29 '16 at 03:59
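Runium's one-liner can be spread out into a readable function. This is a sketch, not Runium's exact code: the hypothetical name search_zips is introduced here, the hard-coded "4" is assumed to be a placeholder and is replaced by the search string given as the first argument, and the Info-ZIP zipinfo and unzip tools are assumed to be installed.

```bash
#!/bin/bash
# search_zips STRING ZIPFILE...
# Hypothetical expansion of the one-liner above. Assumes the Info-ZIP
# zipinfo and unzip tools are installed.
search_zips() {
    local needle=$1 z member
    shift
    for z in "$@"; do
        # zipinfo -1 lists the archive's member names, one per line
        while IFS= read -r member; do
            # directory entries end in '/' and have no content to search
            [[ $member == */ ]] && continue
            # unzip -p writes the member's content to stdout; nothing is
            # extracted to disk. (unzip treats the member name as a
            # wildcard pattern, so names containing [ ] * ? may misbehave.)
            if unzip -p "$z" "$member" | grep -q -e "$needle"; then
                printf '%s : %s\n' "$z" "$member"
            fi
        done < <(zipinfo -1 "$z")
    done
}
```

Called as search_zips Tom ztest1.zip ztest2.zip, it prints one "zipfile : member" line per matching file, which is the output format shown in the question; the "Found the word ..." header can be echoed before the call.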

2 Answers


This does what you want, and is also deliberately more capable than it strictly needs to be.

Because you said you were a student, I wanted not only to answer your question but also to give a fairly simple example of how to use getopts to process command-line options and arguments, and of how a little more work on options can extend the basic functionality with some useful features.

The -e, -v, -i, -H, and -h options are the same as used in grep and some other common tools, so users will benefit from their existing knowledge and don't have to learn new and incompatible options.

To speed up multiple searches of the same .zip files, the script also caches the output of unzip -v for each file (in /var/tmp/ by default). Command-line options -c and -C can be used to clear the cache files either before or after (or both) searching.

Finally, I have double-quoted ALL uses of variables except where double-quoting could cause problems, i.e. when a variable holds an optional argument to the grep command. If such a variable is empty and unquoted, it adds nothing to the arguments passed to grep; if it were double-quoted, it would add an extra empty-string argument, which grep would misinterpret (e.g. as the search pattern). This is one of the very few cases where you shouldn't double-quote your variables. In all other cases, use double-quotes.
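The difference between an unquoted and a quoted empty variable can be seen with a tiny sketch. count_args is a hypothetical helper (not part of the script) that only reports how many arguments it received.

```bash
#!/bin/bash
# count_args is a hypothetical helper that just reports how many
# arguments it was given.
count_args() { echo "$#"; }

IGNORECASE=''
count_args $IGNORECASE -E Tom     # unquoted: the empty value vanishes, prints 2
count_args "$IGNORECASE" -E Tom   # quoted: an empty-string argument remains, prints 3
```

In the second call grep would receive "" as an extra argument, which is exactly what the unquoted form avoids.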

Note: as G-Man pointed out, it is reasonably safe to use $IGNORECASE unquoted here only because I explicitly set it to a known, safe value (i.e. one without spaces, asterisks, or other problematic characters) before using it, so I know for a fact that it cannot hold any other value. That certainty allowed me to be lazy about quoting in this particular case.

It would, however, be safer to use ${IGNORECASE:+"$IGNORECASE"}, especially if it might contain an unknown, arbitrary value (e.g. assigned from the command line rather than hard-coded in the script).

BTW, ${varname:+"$varname"} expands to absolutely nothing (not even the empty string) if $varname is unset or empty, OR to the double-quoted value of $varname if it's not empty.
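A short sketch of how ${v:+"$v"} behaves for an empty value, a simple flag, and a value containing a space, compared with a plain unquoted $v. The variable v and the helper count_args are hypothetical names used only for illustration.

```bash
#!/bin/bash
# count_args is a hypothetical helper that reports its argument count.
count_args() { echo "$#"; }

v=''
count_args ${v:+"$v"}      # empty: expands to nothing at all, prints 0
v='-i'
count_args ${v:+"$v"}      # non-empty: one quoted word, prints 1
v='two words'
count_args ${v:+"$v"}      # still one word because the value is quoted, prints 1
count_args $v              # unquoted, the value is split in two, prints 2
```

The last two calls show why the :+ form is safer for values you don't control, such as something taken from the command line.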

Use the script like this:

$ ./searchzip.sh -h -e Tom file*.zip
     113  Defl:N       64  43% 2016-05-29 15:45 cf747915  a/Tom.txt
     113  Defl:N       64  43% 2016-05-29 15:45 cf747915  tomato/Tom.txt

or:

$ ./searchzip.sh -i -e Tom file*.zip
file1.zip:     113  Defl:N   64  43% 2016-05-29 15:45 cf747915  a/Tom.txt
file2.zip:     113  Defl:N   64  43% 2016-05-29 15:45 cf747915  b/tom.txt
file3.zip:     113  Defl:N   64  43% 2016-05-29 15:45 cf747915  c/tom3.txt
file4.zip:       0  Stored    0   0% 2016-05-29 15:50 00000000  tomato/
file4.zip:     113  Defl:N   64  43% 2016-05-29 15:45 cf747915  tomato/Tom.txt

or:

$ ./searchzip.sh -i -e Tom file*.zip | awk -F: '{print $1}' | sort -u
file1.zip
file2.zip
file3.zip
file4.zip

Anyway, here's the script:

#!/bin/bash

#set -x

# 1. define usage() function to print help
usage() { 

[ -n "$*" ] && echo "$@" $'\n' >&2

cat >&2 <<__EOF__
Usage: $0 [-HhicC] [-d cachedir ] [-e PATTERN] [ -v PATTERN ]  zipfile...

-e   Pattern to search for
-v   Pattern to exclude from search
-i   Ignore case when searching
-H   Include .zip filenames in output (default)
-h   Suppress .zip filenames in output

-d   Directory to use for temporary listing files (default /var/tmp)
-c   Delete cache files before searching
-C   Delete cache files after searching

Any unrecognised option (or a missing option argument) prints this help message

Either -e or -v may be specified multiple times
__EOF__

exit 1;
}

# 2. set some defaults
CLEANUP=0
CLEAR=0
IGNORECASE=''
FNAMES='-H'
pattern=''
exclude=''
cache_dir="/var/tmp"

# 3. process command-line options
while getopts ":s:e:v:d:CchHi" opt; do
    case "$opt" in
        s|e) pattern+="$OPTARG|" ;;  # -s is an undocumented alias for -e
          v) exclude+="$OPTARG|" ;;
          d) cache_dir="$OPTARG" ;;
          C) CLEANUP='1' ;;
          c) CLEAR='1' ;;
          h) FNAMES='-h' ;;
          H) FNAMES='-H' ;;
          i) IGNORECASE='-i' ;;
          *) usage ;;
    esac
done
shift $((OPTIND-1))

# 4. check and post-process options and their args
[ -z "$pattern" ] && usage 'ERROR: -e option is required'
[ -d "$cache_dir" ] || usage "ERROR: cache directory '$cache_dir' does not exist"

# remove trailing '|' from $pattern and $exclude
pattern="${pattern%|}"
exclude="${exclude%|}"

# 5. the main loop of the program that does all the work
for f in "$@" ; do
  if [ -e "$f" ] ; then
    cache_file="$cache_dir/${f//\//_}.list"
    search_file="$cache_file.search"

    [ "$CLEAR" -eq 1 ] && rm -f "$cache_file"

    if [ ! -e "$cache_file" ] ; then
      unzip -v "$f" > "$cache_file"
    fi

    grep -h $IGNORECASE -E "$pattern" "$cache_file" > "$search_file"
    # safer to use ${IGNORECASE:+"$IGNORECASE"}

    if [ -z "$exclude" ] ; then
        cat "$search_file"
    else
        grep $IGNORECASE -v -E "$exclude" "$search_file"
        # or use ${IGNORECASE:+"$IGNORECASE"}
    fi |
      while IFS= read -r line ; do
        if [ "$FNAMES" = '-H' ] ; then
          printf '%s:%s\n' "$f" "$line"
        else
          printf '%s\n' "$line"
        fi
      done
    rm -f "$search_file"

    [ "$CLEANUP" -eq 1 ] && rm -f "$cache_file"
  fi
done

The basic structure of the program is:

  1. define a usage() function to print a help message (with optional error message)

  2. define defaults for some variables

  3. process the command line options

  4. perform any sanity-checking and post-processing required on those options and their args

  5. Finally, the main program loop which does all the work.

This is a very common and very simple structure which you can use in many programs.
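The same five-step layout can be reduced to a reusable skeleton. This is only an illustrative sketch: the -v and -p PREFIX options and the main function are hypothetical, not taken from the script. The steps are wrapped in main (with a local OPTIND so getopts starts fresh on each call) so the file can be sourced and tested; a real script would end with main "$@".

```bash
#!/bin/bash
# Hypothetical skeleton of the five-step layout.
# A real script would finish with:  main "$@"

# 1. usage(): optional error message plus help, to stderr, then exit 1
usage() {
    [ -n "$*" ] && printf '%s\n\n' "$*" >&2
    echo "Usage: ${0##*/} [-v] [-p PREFIX] ARG..." >&2
    exit 1
}

main() {
    # 2. defaults (OPTIND is local so getopts restarts on every call)
    local verbose=0 prefix='' opt OPTIND=1

    # 3. process command-line options
    while getopts ":vp:" opt; do
        case "$opt" in
            v) verbose=1 ;;
            p) prefix="$OPTARG" ;;
            *) usage ;;
        esac
    done
    shift $((OPTIND-1))

    # 4. sanity-check options and remaining args
    [ "$#" -eq 0 ] && usage 'ERROR: at least one ARG is required'

    # 5. main loop
    local arg
    for arg in "$@"; do
        [ "$verbose" -eq 1 ] && echo "processing $arg" >&2
        printf '%s%s\n' "$prefix" "$arg"
    done
}
```

For example, main -p x: a b prints x:a and x:b on separate lines, while main with no arguments, or with an unknown option such as -z, prints the help text and exits with status 1.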

BTW, I haven't put any comments in the main loop. I felt they would be redundant because I used meaningful variable names, so comments would only be trivial paraphrases of the code, like "# do foo" before doing 'foo'. Where necessary, I would have added comments wherever I felt the code wasn't self-explanatory.

cas
  • Thanks for joining me in banging on the “quote your variables” drum.  You might want to further explain the reason why not quoting $FNAMES and $IGNORECASE is reasonably safe is that your script sets them before it uses them, and you know they cannot contain spaces or special characters like *.  If you check our reference work on the subject, though, you may see that even this is not 100% safe.  It would be safer (although much less clear) to say ${FNAMES:+"$FNAMES"} and ${IGNORECASE:+"$IGNORECASE"}. … (Cont’d) – G-Man Says 'Reinstate Monica' May 29 '16 at 05:58
  • (Cont’d) …  Or you could make them array variables (IGNORECASE=() vs. IGNORECASE=('-i')) and then use "${IGNORECASE[@]}". … … … … P.S. Since FNAMES cannot be null, you might as well just quote "$FNAMES". – G-Man Says 'Reinstate Monica' May 29 '16 at 06:01
  • Hey, that's my drum!! (but you can share it). Good point, though, about how to double-quote variables-as-cmd-line-args safely even if you don't know exactly what their contents are going to be. – cas May 29 '16 at 06:03
  • @G-Man: answer updated. You should add that point about ${varname:+"$varname"} to SC's answer; it would definitely add something useful to a reference answer. – cas May 29 '16 at 06:33
  • It occurred to me overnight to say that we all know that it's Stéphane's drum; the rest of us are just keeping it warm. Thanks for the suggestion; I posted a new answer here, linking back to your answer, here. – G-Man Says 'Reinstate Monica' May 30 '16 at 01:07
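G-Man's array suggestion can be sketched as follows (count_args is a hypothetical helper that only reports how many arguments it received). An empty array written as "${IGNORECASE[@]}" expands to zero words and a one-element array to exactly one word, so no ${var:+...} trick is needed.

```bash
#!/bin/bash
# count_args is a hypothetical helper that reports its argument count.
count_args() { echo "$#"; }

IGNORECASE=()
count_args "${IGNORECASE[@]}"     # empty array: prints 0
IGNORECASE=('-i')
count_args "${IGNORECASE[@]}"     # one element: prints 1

# The same expansion passed to grep: -i makes "tom" match "Tom"
printf 'Tom\n' | grep "${IGNORECASE[@]}" -c tom   # prints 1
```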

Here is a primitive solution:

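#!/bin/bash

# print a usage message to stderr and exit with an error status
usage() {
    echo "Usage: $0 -s STRING zipfile...   (e.g. $0 -s Tom ztest1.zip ztest2.zip)" >&2
    exit 1
}

[[ "$#" -le 0 ]] && usage

case "$1" in
    -s) str="$2"
        [[ -z "$str" ]] && usage
        shift 2
        for i in "$@"; do
            echo "searching for $str in $i ... "
            # cheap pre-check: stream every member to stdout and grep it
            if unzip -c "$i" | grep -q -- "$str"; then
                # extract to a fresh temporary directory and list matching files
                tmp=$(mktemp -d) || exit 1
                unzip -q "$i" -d "$tmp"
                grep -rl -- "$str" "$tmp"
                rm -r "$tmp"
            fi
        done;;
    *) usage ;;
esac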

Please feel free to ask me about it in a comment so I can improve it.

  • You should always quote your shell variable references (e.g., "$i", "$str", "$1", "$2", and even "$#" and "$@") unless you have a good reason not to, and you’re sure you know what you’re doing.  In particular, you should always always always quote "$@".  You can abbreviate for i in "$@"; do to for i do.  Oh, and, by the way: it’s conventional to exit 1 (or possibly higher) on any sort of error (including invalid arguments), and exit 0 if the script completes successfully.  (And it would be good form to display an error message if "$1" ≠ -s.) – G-Man Says 'Reinstate Monica' May 29 '16 at 00:50
  • @G-Man ain’t that the truth :) – njboot May 29 '16 at 02:47
  • @G-Man so there is a risk of injecting some code when I don't quote the variables (especially $@), right? I updated the snippet code ... –  May 29 '16 at 07:13
  • @Ojiryx: I don't see a threat of code injection. (I'm reluctant to say that there isn't one, since these things can be tricky, and I sometimes forget what to look for.) But the obvious weakness is that it can't handle filenames with spaces (and other special characters) in them. For example, you can create an archive called My files.zip with a command like zip "My files.zip" *.txt; you can extract it with unzip "My files.zip", check its attributes with ls -l "My files.zip", etc. … (Cont’d) – G-Man Says 'Reinstate Monica' May 29 '16 at 16:06
  • (Cont’d) …  But, if you run ./searchZip.sh -s Tom "My files.zip", and your script says for i in $@, then it will act as if you had typed ./searchZip.sh -s Tom My files.zip (without the quotes), and it will look for two files, one called My and one called files.zip.  There are similar problems if you have a file whose name contains *, ?, or something like that.  See Implications of failing to quote a variable in bash/POSIX shells for more information. – G-Man Says 'Reinstate Monica' May 29 '16 at 16:08
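The effect G-Man describes can be reproduced with three hypothetical helper functions: one loops over unquoted $@, one over "$@", and one uses the shorter for i do form from the earlier comment, which also iterates over the quoted positional parameters.

```bash
#!/bin/bash
# Hypothetical helpers that print each loop item in brackets.
show_unquoted() { for i in $@; do printf '[%s]\n' "$i"; done; }
show_quoted()   { for i in "$@"; do printf '[%s]\n' "$i"; done; }
show_short()    { for i do printf '[%s]\n' "$i"; done; }

show_unquoted "My files.zip"   # prints [My] and [files.zip]: two bogus names
show_quoted "My files.zip"     # prints [My files.zip]
show_short "My files.zip"      # prints [My files.zip]
```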