This does what you want, and is also deliberately more capable than it strictly needs to be.
Because you said you were a student, I wanted not only to answer your question but also to give a fairly simple example of how to use getopts to process command-line options and arguments, and to show how a little more work with options can extend the basic functionality with some useful features.
The -e, -v, -i, -H, and -h options are the same as used in grep and some other common tools, so users will benefit from their existing knowledge and don't have to learn new and incompatible options.
To speed up multiple searches of the same .zip files, the script also caches the output of unzip -v for each file (in /var/tmp/ by default). Command-line options -c and -C can be used to clear the cache files either before or after (or both) searching.
Finally, I have used double-quotes around ALL usage of variables except in the specific cases where double-quoting could cause problems, i.e. when they hold optional arguments to the grep command. Unquoted, an empty variable adds nothing to the args that will be passed to grep, but double-quoted, it would add the empty string to those args. This is one of the very few cases where you shouldn't double-quote your variables. In all other cases, use double-quotes.
Note: as pointed out by G-Man, the only reason why it is reasonably safe to use $IGNORECASE unquoted like this is because I explicitly set it to a known and safe value (i.e. without spaces or asterisks or other problematic characters) before I used it, so I know for a fact that it cannot hold any other value. This certain knowledge allowed me to be lazy about quoting in this particular case.
It would, however, be safer to use ${IGNORECASE:+"$IGNORECASE"}, especially if it might contain an unknown, arbitrary value (e.g. assigned from the command line rather than hard-coded in the script).
BTW, ${varname:+"$varname"} expands to either absolutely nothing (not even the empty string) if $varname is unset or empty, OR the double-quoted value of $varname if it's not empty.
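The difference is easiest to see by counting arguments. A minimal sketch, where count_args is a hypothetical helper (not part of the script) that prints how many arguments it received:

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

opt=''                                    # option not set
count_args grep $opt pattern              # prints 2: unquoted empty adds nothing
count_args grep "$opt" pattern            # prints 3: quoted empty adds an empty-string arg
count_args grep ${opt:+"$opt"} pattern    # prints 2: nothing added when empty

opt='-i'
count_args grep ${opt:+"$opt"} pattern    # prints 3: one arg added when set

opt='two words'                           # an "unsafe" value containing a space
count_args grep $opt pattern              # prints 4: unquoted value is split in two
count_args grep ${opt:+"$opt"} pattern    # prints 3: value stays a single arg
```

The last two lines show why the ${varname:+"$varname"} form is the safer habit when the value is not hard-coded.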
Use the script like this:
$ ./searchzip.sh -h -e Tom file*.zip
113 Defl:N 64 43% 2016-05-29 15:45 cf747915 a/Tom.txt
113 Defl:N 64 43% 2016-05-29 15:45 cf747915 tomato/Tom.txt
or:
$ ./searchzip.sh -i -e Tom file*.zip
file1.zip: 113 Defl:N 64 43% 2016-05-29 15:45 cf747915 a/Tom.txt
file2.zip: 113 Defl:N 64 43% 2016-05-29 15:45 cf747915 b/tom.txt
file3.zip: 113 Defl:N 64 43% 2016-05-29 15:45 cf747915 c/tom3.txt
file4.zip: 0 Stored 0 0% 2016-05-29 15:50 00000000 tomato/
file4.zip: 113 Defl:N 64 43% 2016-05-29 15:45 cf747915 tomato/Tom.txt
or:
$ ./searchzip.sh -i -e Tom file*.zip | awk -F: '{print $1}' | sort -u
file1.zip
file2.zip
file3.zip
file4.zip
Anyway, here's the script:
#!/bin/bash
#set -x
# 1. define usage() function to print help
usage() {
  # print the optional error message first, then the help text, both to stderr
  [ -n "$*" ] && echo "$@" $'\n' >&2
  cat >&2 <<__EOF__
Usage: $0 [-HhicC] [-d cachedir] -e PATTERN [-v PATTERN] zipfile...

  -e  Pattern to search for (required)
  -v  Pattern to exclude from search
  -i  Ignore case when searching
  -H  Include .zip filenames in output (default)
  -h  Suppress .zip filenames in output
  -d  Directory to use for temporary listing files (default /var/tmp)
  -c  Delete cache files before searching
  -C  Delete cache files after searching

Either -e or -v may be specified multiple times.
Any unrecognised option prints this help message.
__EOF__
  exit 1
}
# 2. set some defaults
CLEANUP=0
CLEAR=0
IGNORECASE=''
FNAMES='-H'
EXCL=''
pattern=''
exclude=''
cache_dir="/var/tmp"
# 3. process command-line options
while getopts ":s:e:v:d:CchHi" opt; do
  case "$opt" in
    s|e) pattern+="$OPTARG|" ;;   # -s is an undocumented alias for -e
    v)   exclude+="$OPTARG|" ;;
    d)   cache_dir="$OPTARG" ;;
    C)   CLEANUP='1' ;;
    c)   CLEAR='1' ;;
    h)   FNAMES='-h' ;;
    H)   FNAMES='-H' ;;
    i)   IGNORECASE='-i' ;;
    *)   usage ;;
  esac
done
shift $((OPTIND-1))
# 4. check and post-process options and their args
[ -z "$pattern" ] && usage 'ERROR: -e option is required'
# remove trailing '|' from $pattern and $exclude
pattern="${pattern%|}"
exclude="${exclude%|}"
# 5. the main loop of the program that does all the work
for f in "$@" ; do
  if [ -e "$f" ] ; then
    cache_file="$cache_dir/$f.list"
    search_file="$cache_file.search"
    [ "$CLEAR" -eq 1 ] && rm -f "$cache_file"
    if [ ! -e "$cache_file" ] ; then
      unzip -v "$f" > "$cache_file"
    fi
    grep "$FNAMES" $IGNORECASE -E "$pattern" "$cache_file" > "$search_file"
    # safer to use ${IGNORECASE:+"$IGNORECASE"}
    if [ -z "$exclude" ] ; then
      sed -e "s/^.*$f[^:]*:/$f:/" "$search_file"
    else
      sed -e "s/^.*$f[^:]*:/$f:/" "$search_file" |
        grep $IGNORECASE -v -E "$exclude"
      # or use ${IGNORECASE:+"$IGNORECASE"}
    fi
    rm -f "$search_file"
    [ "$CLEANUP" -eq 1 ] && rm -f "$cache_file"
  fi
done
The basic structure of the program is:
1. define a usage() function to print a help message (with optional error message)
2. define defaults for some variables
3. process the command-line options
4. perform any sanity-checking and post-processing required on those options and their args
5. finally, the main program loop, which does all the work.
This is a very common and very simple structure which you can use in many programs.
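Stripped of everything specific to zip files, the same five-step structure can be sketched as a small reusable template. Everything in it is hypothetical and for illustration only: the greet function, its -n and -u options, and the messages. It is wrapped in a function (with a local OPTIND) so it can be called more than once in the same shell.

```bash
#!/bin/bash
# Hypothetical example: greet [-u] [-n NAME]

# 1. define usage() to print an optional error message plus help
usage() {
  [ -n "$*" ] && echo "$@" >&2
  echo "Usage: greet [-u] [-n NAME]" >&2
}

greet() {
  # 2. set some defaults
  local name='world' upper=0 opt msg
  local OPTIND=1    # reset so getopts starts afresh on every call

  # 3. process command-line options
  while getopts ":n:u" opt ; do
    case "$opt" in
      n) name="$OPTARG" ;;
      u) upper=1 ;;
      *) usage "ERROR: problem with option -$OPTARG" ; return 1 ;;
    esac
  done
  shift $((OPTIND-1))

  # 4. check and post-process options and their args
  if [ -z "$name" ] ; then
    usage 'ERROR: -n requires a non-empty NAME'
    return 1
  fi

  # 5. the main work
  msg="hello, $name"
  if [ "$upper" -eq 1 ] ; then
    msg=$(printf '%s' "$msg" | tr '[:lower:]' '[:upper:]')
  fi
  echo "$msg"
}

greet "$@"
```

Saved as a script and run with -u -n Tom, this prints HELLO, TOM; an unknown option such as -x prints the error and usage text to stderr and returns a non-zero status.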
BTW, I have put hardly any comments in the main loop. I felt they would be redundant: I used meaningful variable names, so comments would only be trivial paraphrases of the code, like "# do foo" before doing 'foo'. If and when necessary, I would have added comments wherever I felt the code wasn't self-explanatory.
for z in *.zip; do while IFS=$'\n' read -r a; do unzip -p "$z" "$a" | grep "4" >/dev/null && printf "%s : %s\n" "$z" "$a" ; done< <( zipinfo -1 "$z" ); done
– Runium May 29 '16 at 03:59