
I'm fairly new to bash; I can just about perform simple administrative tasks with simple commands, one at a time. However, I've been tasked with renaming some files in a directory, using a text file as the source for the new names, and would really appreciate a few pointers, as I am well out of my depth.

Let me explain:

New File Name.xlsx 0.1  000011F4.dat 
New File Name.xlsx 0.2  000011F5.dat 
New File Name.xlsx 0.3  000011F6.dat 
New File Name.xlsx 0.4  000011F7.dat 
New File Name.xlsx 0.5  000011F8.dat 
New File Name.xlsx 0.6  000011F9.dat 

The source text file I have resembles the above somewhat. The intention is that the first 'column' is the new name for the file, the middle is the version and the third is the current filename.

I need to rename the .dat files in the directory to the names given in the first column. I also need to prepend the version number (0.1, 0.2, etc.) to each new file name.

I have a few questions:

Is it a massive problem that the file names contain whitespace? Would it be better to add double quotes around each file name?

Basically, I have no idea where to start, and any help would be massively appreciated. As you can see, it's slightly more complex than a usual renaming, given the need to add the version column to the beginning of the filename and the whitespace in the list.

Anthon
    It isn't an insurmountable problem that the filenames have spaces in them, but it does rule out using many simple approaches. Without spaces this would be pretty trivial with awk or cut, but with the spaces you have to go with uglier, longer commands as in the answers given. – evilsoup Aug 08 '13 at 21:04

4 Answers


This ought to work:

sh <(sed -r 's/^\s*(.*)\s+([0-9\.]+)\s+([0-9A-Z]{8}\.dat)\s*$/mv -iv \3 "\2 \1"/' files)

... where files is the name of your source file.

What this does is pass the result of the sed command to a new instance of sh (the shell), using process substitution. The output of the sed command is:

mv -iv 000011F4.dat "0.1 New File Name.xlsx"
mv -iv 000011F5.dat "0.2 New File Name.xlsx"
mv -iv 000011F6.dat "0.3 New File Name.xlsx"
mv -iv 000011F7.dat "0.4 New File Name.xlsx"
mv -iv 000011F8.dat "0.5 New File Name.xlsx"
mv -iv 000011F9.dat "0.6 New File Name.xlsx"

Taking the sed command apart, it searches for a pattern:

  • ^ - the beginning of the line
  • \s* - any whitespace at the start
  • (.*) - any characters (the parentheses store the result to \1)
  • \s+ - at least one whitespace character
  • ([0-9\.]+) - at least one of 0-9 and . (stored to \2)
  • \s+ - at least one whitespace character
  • ([0-9A-Z]{8}\.dat) - 8 characters in 0-9 or A-Z, followed by .dat (stored to \3)
  • \s* - any whitespace at the end
  • $ - the end of the line

... and replaces it with mv -iv \3 "\2 \1", where \1 to \3 are the previously stored values. You can use something other than a space between the version number and the rest of the filename, if you like.

Here's the result:

$ ls -l
total 60
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F4.dat
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F5.dat
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F6.dat
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F7.dat
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F8.dat
-rw-rw-r-- 1 z z   0 Aug  8 14:15 000011F9.dat
-rw-rw-r-- 1 z z 222 Aug  8 13:47 files
$ sh <(sed -r 's/^\s*(.*)\s+([0-9\.]+)\s+([0-9A-Z]{8}\.dat)\s*$/mv -iv \3 "\2 \1"/' files)
`000011F4.dat' -> `0.1 New File Name.xlsx'
`000011F5.dat' -> `0.2 New File Name.xlsx'
`000011F6.dat' -> `0.3 New File Name.xlsx'
`000011F7.dat' -> `0.4 New File Name.xlsx'
`000011F8.dat' -> `0.5 New File Name.xlsx'
`000011F9.dat' -> `0.6 New File Name.xlsx'
$ ls -l
total 60
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.1 New File Name.xlsx
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.2 New File Name.xlsx
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.3 New File Name.xlsx
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.4 New File Name.xlsx
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.5 New File Name.xlsx
-rw-rw-r-- 1 z z   0 Aug  8 14:15 0.6 New File Name.xlsx
-rw-rw-r-- 1 z z 222 Aug  8 13:47 files
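A sketch for trying this safely in a throwaway directory (the mktemp directory, the three sample files and the name rename.sh are assumptions for the demo). Instead of sh <(...), it writes the generated commands to a file, shows them, and then runs that file with sh. This avoids process substitution, which is where a path like /dev/fd/63 comes from. Writing to a file rather than piping into sh also leaves mv's standard input alone, so -i can still prompt. GNU sed is still required for -r and \s.

```shell
#!/bin/bash
# Throwaway directory so the demo never touches real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample list in the question's format: new name, version, current name.
cat > files <<'EOF'
New File Name.xlsx 0.1  000011F4.dat
New File Name.xlsx 0.2  000011F5.dat
New File Name.xlsx 0.3  000011F6.dat
EOF
touch 000011F4.dat 000011F5.dat 000011F6.dat

# Same sed expression as above (GNU sed: -r and \s are GNU extensions),
# but the generated commands go to a file instead of process substitution.
sed -r 's/^\s*(.*)\s+([0-9\.]+)\s+([0-9A-Z]{8}\.dat)\s*$/mv -iv \3 "\2 \1"/' files > rename.sh

cat rename.sh   # review the generated mv commands first
sh rename.sh    # sh reads the commands from the file, so mv -i can still prompt on stdin
ls
```

The scratch directory is left in place so the result can be inspected; remove it afterwards with rm -r "$workdir".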
  • Thanks a lot. You've saved my bacon. I also appreciate the accompanying explanation; it's nice to see what's going on in a command before you run it. To be totally honest, I'm still not 100% certain what's happening. Is that using the sort of syntax you'd see in a regular expression? More specifically, I don't quite understand how that has defined the variables \1, \2 and \3. Thanks as well to the other responses, I appreciate the help! – user2472419 Aug 08 '13 at 22:19
  • Actually, this worked when I tested it at home last night but hasn't worked today in production. Apparently there's a problem with /dev/fd/63. I browsed to it and it doesn't exist, any ideas anybody? Google hasn't been much help on this one. Cheers. – user2472419 Aug 09 '13 at 15:04
  • What do you get if you don't do the process substitution? That is to say, just run sed -r 's/^\s*(.*)\s+([0-9\.]+)\s+([0-9A-Z]{8}\.dat)\s*$/mv -iv \3 "\2 \1"/' files without the sh <(...) part. –  Aug 09 '13 at 15:07
  • Sorry for the delay in reply and thanks again for your help with this. If I run the command as above, I get a syntax error: bash: syntax error near unexpected token `)' – user2472419 Aug 12 '13 at 08:12
  • In fact, it did execute, I forgot to remove the last ). However, nothing appears to have been renamed, I get a large list of mv commands after it has executed but all the files remain unchanged – user2472419 Aug 12 '13 at 08:27
  • As a further addition, the script did run eventually. I think it was a permission error. After resolving this the script did rename some of the files, however many it just renamed to the version number, rather than putting it at the beginning. Some others were simply left as 000001f5.dat but with a ? on the end. – user2472419 Aug 12 '13 at 13:01
  • That suggests a problem with your input data; sed isn't going to just randomly ignore some filenames and not others. –  Aug 12 '13 at 13:23
sed 's/^\(.*\.xlsx\) \+\([[:digit:]]\+\.[[:digit:]]\+\) \+\(.[^ ]*\)/"\3" "\2\1"/' \
  <file_list | xargs -n 2 mv

This divides the line into three parts. Everything up to and including .xlsx is the second part of the new name and becomes accessible as \1. Then it grabs the version and assigns it to \2. Then comes the old file name, \3, ignoring a trailing space. Note that "\2\1" joins the version and the name with no separator, so the result is e.g. 0.1New File Name.xlsx.

Both names are quoted and provided to mv as arguments. The -n 2 ensures that mv receives two arguments at a time, the old and the new file name.

The spaces do not pose any problem; what complicates matters is that your input list is not well structured. If the columns were swapped and the file names quoted, you could just use xargs and mv without any prior manipulation.
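To illustrate that last point, a sketch assuming a hypothetical, already well-structured list file (file_list) with the old name first and the new name in double quotes; the scratch directory and sample files are also assumptions for the demo.

```shell
#!/bin/bash
# Throwaway directory with two sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 000011F4.dat 000011F5.dat

# Hypothetical well-structured list: old name first, new name in double quotes.
cat > file_list <<'EOF'
"000011F4.dat" "0.1 New File Name.xlsx"
"000011F5.dat" "0.2 New File Name.xlsx"
EOF

# xargs honours the double quotes, so each line yields exactly two arguments
# and -n 2 runs one mv per old/new pair.
xargs -n 2 mv -v < file_list
ls
```

Because xargs does the quote handling, no sed step is needed once the list is in this shape.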

Marco

The spaces in the file name, and the use of multiple spaces between some columns, make this harder, but by no means insurmountable.

Read the list file line by line. Usually one would use while IFS= read -r; do …, but here it might be more robust to let read strip leading and trailing whitespace, which plain read -r (without IFS=) does. For each line:

  • Break each line into three parts. One way to do that is with regex matching: [[:space:]]+ matches one or more whitespace characters (space or tab); [^[:space:]]+ matches one or more non-whitespace characters. Parenthesized groups can be retrieved via the BASH_REMATCH array variable.
    Another way, less convenient here, would be with ${VAR##PATTERN} and ${VAR%PATTERN} to strip off a prefix or suffix from a variable respectively.
  • Finally perform the move. Don't forget to log any errors.

Putting it all together:

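#!/bin/bash
# Rename files according to name_list.txt (new name, version, old name per line).
ret=0
while read -r line; do   # plain read (no IFS=) trims leading/trailing blanks
  if [[ $line =~ (.*[^[:space:]])[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+) ]]; then
    new_name="${BASH_REMATCH[1]}"
    version="${BASH_REMATCH[2]}"
    old_name="${BASH_REMATCH[3]}"
    mv -- "$old_name" "$version$new_name" || ret=1
  else
    echo "Malformed line: $line" >&2   # log to stderr and remember the failure
    ret=1
  fi
done <name_list.txt
exit $ret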
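For comparison, the ${VAR##PATTERN} and ${VAR%PATTERN} route mentioned in the list could split one line as follows. This is a sketch that only splits a single hard-coded sample line (an assumption for the demo) and renames nothing; the number of steps shows why regex matching is more convenient here.

```shell
#!/bin/bash
# One sample line from the list (note the double space and the trailing blank).
line='New File Name.xlsx 0.1  000011F4.dat '

# Trim leading and trailing whitespace.
line="${line#"${line%%[![:space:]]*}"}"
line="${line%"${line##*[![:space:]]}"}"

old_name="${line##*[[:space:]]}"          # drop everything up to the last blank
rest="${line%[[:space:]]*}"               # drop the last field
rest="${rest%"${rest##*[![:space:]]}"}"   # trim blanks left between fields
version="${rest##*[[:space:]]}"           # last remaining field
new_name="${rest%[[:space:]]*}"           # everything before it
new_name="${new_name%"${new_name##*[![:space:]]}"}"

printf 'old=[%s] version=[%s] new=[%s]\n' "$old_name" "$version" "$new_name"
# prints: old=[000011F4.dat] version=[0.1] new=[New File Name.xlsx]
```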

An awk solution is to run this command:

awk '{print "/bin/mv", $NF, "\"" $(NF-1), gensub(/^([^.]+\.xlsx).*/, "\\1", 1) "\"" | "bash" } ; END { close("bash") }' sourcefile

The preceding command passes to the bash shell the output of the command:

awk '{print "/bin/mv", $NF, "\"" $(NF-1), gensub(/^([^.]+\.xlsx).*/, "\\1", 1) "\""}' sourcefile

which should be run first to make sure it is really what you want to execute! For each line in the source file, this awk command prints the /bin/mv command, followed by the last whitespace-delimited field in the line (the old name), followed by a double quotation mark and the second-to-last field (the version), followed by the line cut off right after the string .xlsx (the new name), followed by a closing double quotation mark.
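gensub is a GNU awk extension, so the commands above need gawk. Here is a sketch of the same dry run for other awks (mawk, BSD awk) that swaps in the POSIX sub() function; the sample sourcefile and scratch directory are assumptions for the demo. Unlike the [^.]+ pattern, it also copes with dots earlier in the new name, since it cuts at the first .xlsx.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sourcefile <<'EOF'
New File Name.xlsx 0.1  000011F4.dat
New File Name.xlsx 0.2  000011F5.dat
EOF

# sub() is POSIX awk (gensub is GNU-only). It edits the copy "name", not $0.
cmds=$(awk '{
  name = $0
  sub(/\.xlsx.*/, ".xlsx", name)   # cut the line right after the first ".xlsx"
  printf "/bin/mv %s \"%s %s\"\n", $NF, $(NF-1), name
}' sourcefile)

# Review first; pipe the awk output to bash only once it looks right.
printf '%s\n' "$cmds"
```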

Here is a variant you might prefer:

awk '{print "/bin/mv", $NF, "\"" "0." FNR, gensub(/^([^.]+\.xlsx).*/, "\\1", 1) "\"" | "bash" } ; END { close("bash") }' sourcefile

The variable FNR is the line number (so you can omit from your source file the entries 0.1, 0.2, 0.3, ...).

The whitespace in file names is not what I would call a “massive problem,” but I would recommend against it.  You could use something like this final version, which changes the spaces to underscores in your new file names:

awk '{print "/bin/mv", $NF, "0." FNR "_" gensub(" ","_", "g", gensub(/^([^.]+\.xlsx).*/, "\\1", 1)) | "bash" } ; END { close("bash") }' sourcefile
Greg Marks