1

This question arose from another question I asked here ("How to extract basename of parent directory in shell"), which seems to have opened the rabbit hole of Unix string manipulation for me. So here is a follow-up question:

What is the correct way to extract various parts ("levels") from dirname results combined with find?

Let's assume I have the following hierarchy:

DE_AT/adventure/motovun/300x250/A2_300x250.zip

I "find" the file like so:

find . -name "*.zip" 

and execute a shell on the find results:

-exec sh -c '' {} \;

How would I extract each part of the full path? How do I get:

  • DE_AT
  • adventure
  • motovun
  • 300x250
  • A2_300x250.zip

This is what I know so far:

basename "$1" # gets me: A2_300x250.zip
dirname "$1"  # gets me: ./DE_AT/adventure/motovun/300x250

I am asking because I need to rename these .zip files to someString_DE_AT_motovun+A2_300x250.zip.

I came up with a horrible frankensolution like so:

find . -name "*.zip" -exec sh -c '
    mv "$0" "myString_$(basename $(dirname $(dirname \
    $(dirname "$0")_...+$(basename "$0")"
' {} \;

I don't even wish to try this because this simply cannot be correct.

2 Answers

4

You can use the split+glob operator:

find . -name '*.zip' -exec sh -c '
   IFS=/ # split on /
   set -f # disable glob
   for file do
     set -- $file # invoke split+glob, store in positional parameters
     # now the path components are in $1, $2...
     mv -i -- "$file" "someString_${2}_${4}+${6}"
   done' sh {} +

$1 would contain ., $2 DE_AT, and so on. Getting the last component is trickier, as you need something like:

eval "last=\${$#}"
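As a minimal sketch of what the split produces, here is the same technique applied to the sample path from the question, outside of find. The variable names (old_ifs, count, same_last, new_name) are illustrative assumptions, and ${file##*/} is shown as an eval-free way to get the last component:

```shell
# Sample path taken from the question.
file=./DE_AT/adventure/motovun/300x250/A2_300x250.zip

old_ifs=$IFS   # remember the current separator (assumes IFS is set)
IFS=/          # split on /
set -f         # disable globbing
set -- $file   # $1=. $2=DE_AT $3=adventure $4=motovun $5=300x250 $6=A2_300x250.zip
set +f
IFS=$old_ifs

count=$#
eval "last=\${$#}"      # last positional parameter, whatever the depth
same_last=${file##*/}   # alternative: strip everything up to the last /
new_name="someString_${2}_${4}+${last}"

printf '%s\n' "$count" "$last" "$same_last" "$new_name"
```

The ${file##*/} form only needs the original string, so it works even when the number of components is unknown and avoids eval altogether.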

It may be easier to use a different shell like zsh which has proper split operators and arrays for that:

find . -name '*.zip' -exec zsh -c '
   for file do
     components=(${(s:/:)file})
     printf "Last component: %s\n" $components[-1]
     mv -i -- "$file" "someString_$components[2]_$components[-3]+$components[-1]"
   done' zsh {} +

With zsh, you can also use its zmv batch-renaming tool:

autoload zmv # best in ~/.zshrc
zmv -n '([^/]#)/**/(*)/*/(*.zip)' 'someString_${1}_${2}+$3'

The **/ part matches any level (including 0) of subdirectories, so it will match on (a)/b/c/(d)/e/(f.zip) or (a)/(b)/c/(d.zip) with the captured strings (a/d/f.zip, a/b/d.zip) going in $1/$2/$3 for the replacement so as to get a similar behaviour as for the $components array approach above.

In the [^/]# part, # works like the regexp * operator, so it matches any sequence of non-/ characters. In a glob, * would do the same, as * cannot cross a /. But after expanding the glob, zmv uses pattern matching on the resulting file names to extract the parts for the replacement, and there * would go across a /, so (*) in place of ([^/]#) would match too much.

  • Stephane, can you please elaborate a bit on the zsh solution? I am trying to get man zmv but I have no entries :(. What is the -n switch, and why $1, $2, $3 when in the first one you are using 2, 4 and 6? Sorry for being a total noob here :/ – Alexander Starbuck Aug 31 '16 at 11:51
  • 1
    @AlexStarbuck see edit. I already gave the link to the zmv doc in my answer to your other question. For zsh, as for any biggish manual like bash's, I'd use info instead of man. If you run info zsh, type i to get the index, and enter zmv (completion available), it should take you to the zmv documentation. – Stéphane Chazelas Aug 31 '16 at 12:09
  • If I press i while in info zsh I get the message: "No indices available". – Alexander Starbuck Aug 31 '16 at 12:20
  • 1
    @AlexStarbuck, you probably don't have the info pages for zsh installed on your system, so info zsh only gives you a dump of the man page. If on a Debian-like system, you may need to install the zsh-doc package. – Stéphane Chazelas Aug 31 '16 at 12:22
  • Stephane, your zsh solution works brilliantly :) but also moves the renamed .zip files to top of the hierarchy (to a containing folder banners/, which holds /DE_AT/, /DE_DE/ and /DE_CH/ folders; how can I not move them?) – Alexander Starbuck Aug 31 '16 at 12:30
  • @AlexStarbuck, all 3 solutions do that as that's what your question implied you wanted. If you want the file to stay in the same directory, you can use zmv '...' '${f:h}/someString...' where $f is the original file, and ${f:h} its head (dirname). (use ${file%/*} for the other solutions) – Stéphane Chazelas Aug 31 '16 at 12:39
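
For the sh variant, the ${file%/*} suggestion in the last comment could be sketched as follows. new_path is a hypothetical helper name, and the body runs in a subshell so that the IFS and set -f changes do not leak into the caller:

```shell
# new_path is a hypothetical helper, not part of the answers above.
# It prints the target name inside the file's own directory.
new_path() (
    file=$1
    dir=${file%/*}   # everything before the last / (like dirname)
    IFS=/            # split on /
    set -f           # disable globbing
    set -- $file     # $2=DE_AT, $4=motovun, $6=A2_300x250.zip for the sample path
    printf '%s\n' "$dir/someString_${2}_${4}+${6}"
)

new_path ./DE_AT/adventure/motovun/300x250/A2_300x250.zip
```

Inside the find loop, the same idea amounts to using "${file%/*}/someString_..." as the mv target, computed before set -- overwrites the positional parameters.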
0

Is using only find's -exec a strict requirement? I'd rather loop over the find results and combine that with a string-manipulation-friendly tool like awk:

for ii in $(find . -name "*.zip")
do
    mv $ii $(echo $ii|awk -F/ '{print "someString_" $2 "_" $4 "+" $6}')
done

(Replace mv by echo mv for testing purposes.)

NB: awk's -F/ option sets / as the field separator instead of the default spaces and tabs.
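
To see what awk does with a single path, here is a small sketch that runs it on the sample path from the question without touching any files. The variable names are illustrative, and $NF (awk's last field) is shown as a way to avoid hard-coding the depth:

```shell
# Sample path taken from the question.
path=./DE_AT/adventure/motovun/300x250/A2_300x250.zip

# Fields after splitting on /: $1=. $2=DE_AT $3=adventure $4=motovun ...
new_name=$(printf '%s\n' "$path" | awk -F/ '{print "someString_" $2 "_" $4 "+" $6}')

# $NF is the last field, whatever the number of path components.
last=$(printf '%s\n' "$path" | awk -F/ '{print $NF}')

printf '%s\n' "$new_name" "$last"
```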

Update

As Stéphane suggested in the comments, it would probably be wiser and more robust to tune the split+glob operator (more information about it here) beforehand:

IFS=$'\n'
set -f

The first line is mandatory anyway if your filenames contain spaces, and the second if they contain wildcards.

Don't forget to switch them back to their previous settings afterwards if you don't want to tear your hair out because of "strange" behaviour later… Assuming you haven't customized these settings:

unset IFS
set +f
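
An alternative to restoring the settings by hand is to change them inside a subshell, so they disappear when it exits. A minimal sketch, assuming a hypothetical newline-separated list of paths in place of the find output:

```shell
# Two hypothetical paths, one containing a space and a glob character.
list='./DE_AT/a b/x/300x250/A*.zip
./DE_CH/c/y/300x250/B.zip'

count=$(
    # Split on newline only (portable spelling of IFS=$'\n').
    IFS='
'
    set -f   # no globbing, so A*.zip stays literal
    n=0
    for p in $list; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
)

# IFS and globbing in the calling shell are untouched here.
printf '%s\n' "$count"
```

The loop sees exactly two items despite the space and the *, and nothing needs to be unset afterwards.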
  • It is not a requirement at all :), it's just something I learned so far. I find the existing shell scripting tutorial quite awful for newbies. – Alexander Starbuck Aug 30 '16 at 15:15
  • 1
    Here, you're using the split+glob operator on the output of find, but are not tuning it properly. You'd want to set IFS to newline and disable the glob part. There's no reason you'd want to invoke it on $ii – Stéphane Chazelas Aug 30 '16 at 15:34
  • @StéphaneChazelas I just learned a number of things reading your answer here, thanks ! I feel it's not worth tuning IFS and the globbing if filenames are "standard", but it's clearly something to be aware of. – Skippy le Grand Gourou Aug 30 '16 at 19:51
  • 1
    @AlexStarbuck Here is the result: mv ./DE_AT/adventure/motovun/300x250/A2_300x250.zip someString_DE_AT_motovun+A2_300x250.zip. I tell awk (option -F/) to split the string using / (by default it splits on blanks). $i refers to the ith field ($0 is the whole string). Since find prefixes its output with the directory where it searches, which we set to ., the first field ($1) is ., and therefore $2=DE_AT. – Skippy le Grand Gourou Aug 31 '16 at 13:12