I'm trying to find a way to batch-rename files whose names originally contained Japanese characters, which are non-printable in my shell. There is clearly something I'm missing in my understanding of how regex works in this use case.

When I run ls I get this:

AIR?t?H?[????002.jpg
AIR?t?H?[????009.jpg
AIR?t?H?[????075.jpg

And ls -ldb * gives me this:

AIR\342t\342H\374[\342\353\342\307002.jpg
AIR\342t\342H\374[\342\353\342\307009.jpg
AIR\342t\342H\374[\342\353\342\307075.jpg

Basically, I want to match and replace everything between AIR and [0-9]*.

I'm currently looking at something like this:

find AIR*.jpg -type f -exec sed -ri 's/(?<=AIR)(.*?)([0-9]*)/\2test/' {} +

But I get this error:

sed: -e expression #1, char 31: Invalid preceding regular expression

I have also tried using

echo AIR�t�H�\[����002.jpg | sed -r 's/AIR([^[:print:]\t\r])*/\1toto/g'

But it replaces AIR instead of the "special character" group:

toto�t�H�[����002.jpg

And

echo AIR�t�H�\[����002.jpg | sed -r 's/AIR([^[:print:]\t\r])*/\2toto/g'

returns

sed: -e expression #1, char 33: invalid reference \2 on `s' command's RHS

tr also seems like it could be an option, but the text between my two groups, AIR and [0-9]*, doesn't contain only special characters, so here is what I got:

echo AIR�t�H�\[����002.jpg | tr -c '[:print:]\t\r\n'test '[ *]'

returns:

AIR t H [ 002.jpg

1 Answer

sed substitution looks for instances (all instances since you’re using g) matching the first argument, and replaces the full match with the second argument. So if you include “AIR” in the first argument, it will be replaced — you need to include it in the second argument if you want to keep it. When sed complains of an invalid reference, it means you haven’t defined a corresponding group in the first argument (using \( and \), or ( and ) since you’ve specified -r).
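To make the two points above concrete, here is a minimal sketch. The name in `name` is a made-up printable stand-in for the real file names, and `-E` is used as the portable spelling of `-r` (they are equivalent in GNU sed).

```shell
#!/bin/sh
# Hypothetical stand-in name: printable characters replace the raw bytes.
name='AIR_xx_[yy]002.jpg'

# "AIR" is part of the match, so it is replaced along with the rest.
dropped=$(printf '%s\n' "$name" | sed -E 's/AIR[^[:digit:]]*/toto/')

# Repeating "AIR" in the replacement keeps it in the result.
kept=$(printf '%s\n' "$name" | sed -E 's/AIR[^[:digit:]]*/AIRtoto/')

# \1 refers to the first (...) group, which captures the digits here.
# Using \2 would fail, because the pattern defines only one group.
grouped=$(printf '%s\n' "$name" |
    sed -E 's/AIR[^[:digit:]]*([[:digit:]]+)/AIRtoto\1/')

printf '%s\n%s\n%s\n' "$dropped" "$kept" "$grouped"
```

The first result loses "AIR"; the other two keep it because the replacement spells it out again.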

Since you’re looking for “AIR” followed by any characters followed by digits, I would suggest the following:

sed -r 's/AIR([^[:digit:]]*)([[:digit:]]+)\.jpg/AIRtest\2.jpg/g'

This replaces “AIR” with “AIR”, any non-digits with “test”, and keeps all the digits thereafter. If you don’t need to process the characters between “AIR” and the digits, you can ignore them:

sed -r 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/g'
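As a rough check against one of the real names, the sketch below rebuilds the bytes from the `ls -ldb` output with `printf` octal escapes and pipes them through the same substitution. Setting `LC_ALL=C` is my own precaution, not something the question requires: in a UTF-8 locale, bytes that do not form valid characters may fail to match a bracket expression, whereas in the C locale every byte counts as one character.

```shell
#!/bin/sh
# Rebuild one name from the `ls -ldb` output; the octal escapes are the raw
# bytes that the shell cannot print.
old=$(printf 'AIR\342t\342H\374[\342\353\342\307002.jpg')

# LC_ALL=C makes sed treat each byte as a single character, so the
# non-printable bytes are matched by [^[:digit:]] like any other non-digit.
new=$(printf '%s\n' "$old" |
    LC_ALL=C sed -E 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/')

printf '%s\n' "$new"
```

This only transforms text on standard input; no file is renamed.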

If you have the Perl rename, you can transpose this to rename your files:

rename 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/g' AIR*.jpg

or

rename 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest$1.jpg/g' AIR*.jpg

(rename prefers $ for group references).
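If the Perl rename is not available, a plain `mv` loop can do the same job, with sed used only to compute each new name from the old one. This is a sketch under assumptions: it runs in a scratch directory, the file names are printable stand-ins for the real ones, and `LC_ALL=C` is a precaution for names containing bytes that are invalid in the current locale.

```shell
#!/bin/sh
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Hypothetical stand-in files; "_" replaces the raw bytes of the real names.
touch 'AIR_t_H_[__002.jpg' 'AIR_t_H_[__009.jpg' 'AIR_t_H_[__075.jpg'

for old in AIR*.jpg; do
    # sed edits the name passed on stdin, not the contents of the file.
    new=$(printf '%s\n' "$old" |
        LC_ALL=C sed -E 's/^AIR[^[:digit:]]*([[:digit:]]+)\.jpg$/AIRtest\1.jpg/')
    # Skip names the pattern left unchanged, and never overwrite a file.
    if [ "$new" != "$old" ] && [ ! -e "$new" ]; then
        mv -- "$old" "$new"
    fi
done

renamed=$(ls)
printf '%s\n' "$renamed"

# Remove the scratch directory again.
cd / && rm -rf -- "$dir"
```

Unlike `sed -i` run through `find -exec`, which rewrites what is inside each file, this loop only ever passes the file name to sed.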

Stephen Kitt
  • The OP is using sed -r so the group matches are (..) rather than \(..\) – Chris Davies Mar 22 '17 at 08:04
  • Super, thanks a lot! It works using rename instead of sed, which was opening all the files one by one instead of renaming the files themselves. I'll have to read more of the sed documentation to avoid that ^^ – Matthieu Ducorps Mar 22 '17 at 09:00