shell rename file names with non-printable characters

Question

I'm trying to find a way to batch rename files whose names originally contained Japanese characters, which are non-printable in my shell. There is clearly something I'm missing in my understanding of how regex works in this use case.

When I run ls I get this:

AIR?t?H?[????002.jpg
AIR?t?H?[????009.jpg
AIR?t?H?[????075.jpg

And ls -ldb * gives me this:

AIR\342t\342H\374[\342\353\342\307002.jpg
AIR\342t\342H\374[\342\353\342\307009.jpg
AIR\342t\342H\374[\342\353\342\307075.jpg

Basically I want to match and replace everything between AIR and [0-9]*

I'm currently looking at something like this:

find AIR*.jpg -type f -exec sed -ri 's/(?<=AIR)(.*?)([0-9]*)/\2test/' {} +

But I get this error:

sed: -e expression #1, char 31: Invalid preceding regular expression

I have also tried using

echo AIR�t�H�\[����002.jpg | sed -r 's/AIR([^[:print:]\t\r])*/\1toto/g'

But it replaces AIR instead of the "special character" group:

toto�t�H�[��002.jpg

And

echo AIR�t�H�\[����002.jpg | sed -r 's/AIR([^[:print:]\t\r])*/\2toto/g'

returns

sed: -e expression #1, char 33: invalid reference \2 on `s' command's RHS

tr also seemed like it could be an option, but the part between AIR and [0-9]* doesn't contain only special characters, so here is what I got:

echo AIR�t�H�\[����002.jpg | tr -c '[:print:]\t\r\n'test '[ *]'

returns:

AIR t H [ 002.jpg

The (?<=AIR) syntax is not supported in sed -r. See Why does my regular expression work in X but not in Y? — Gilles 'SO- stop being evil', Mar 22 '17 at 21:28

Stephen Kitt · Accepted Answer · 2017-03-22T08:14:20.627

sed substitution looks for instances (all instances since you’re using g) matching the first argument, and replaces the full match with the second argument. So if you include “AIR” in the first argument, it will be replaced; you need to include it in the second argument if you want to keep it. When sed complains of an invalid reference, it means you haven’t defined a corresponding group in the first argument (using \( and \), or ( and ) since you’ve specified -r).
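To make this concrete, here is a minimal sketch using a plain ASCII stand-in for the file name (the sample name and the variable names are made up for illustration). It shows the whole match being replaced, “AIR” surviving only when the replacement repeats it, and \1 referring to the single group in the pattern.

```shell
# Plain ASCII stand-in for one of the real file names (hypothetical sample).
name='AIRxyz002.jpg'

# "AIR" is part of the match, so it disappears along with the rest.
dropped=$(printf '%s\n' "$name" | sed -r 's/AIR([^0-9]*)/toto/')

# Repeating "AIR" in the replacement keeps it in the result.
kept=$(printf '%s\n' "$name" | sed -r 's/AIR([^0-9]*)/AIRtoto/')

# \1 is the text captured by the only group; \2 would be an
# invalid reference because the pattern has no second group.
swapped=$(printf '%s\n' "$name" | sed -r 's/AIR([^0-9]*)/\1toto/')

printf '%s\n' "$dropped" "$kept" "$swapped"
# toto002.jpg
# AIRtoto002.jpg
# xyztoto002.jpg
```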

Since you’re looking for “AIR” followed by any characters followed by digits, I would suggest the following:

sed -r 's/AIR([^[:digit:]]*)([[:digit:]]+)\.jpg/AIRtest\2.jpg/g'

This replaces “AIR” with “AIR”, any non-digits with “test”, and keeps all the digits thereafter. If you don’t need to process the characters between “AIR” and the digits, you can ignore them:

sed -r 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/g'
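One caveat, hedged because it depends on your locale and sed version: in a UTF-8 locale GNU sed may refuse to match bytes that are not valid UTF-8, which is what these names appear to contain. Forcing the C locale makes sed work byte by byte. The sketch below rebuilds the first name from the ls -b output in the question using printf's octal escapes; the variable names are only illustrative, and the dot before jpg is escaped so that it matches a literal dot.

```shell
# Rebuild the raw bytes of the first file name from the octal
# escapes shown by "ls -ldb" in the question.
raw=$(printf 'AIR\342t\342H\374[\342\353\342\307002.jpg')

# LC_ALL=C makes sed treat every byte as one character, so
# [^[:digit:]]* can consume the bytes that are not valid UTF-8.
fixed=$(printf '%s\n' "$raw" |
        LC_ALL=C sed -r 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/')

printf '%s\n' "$fixed"
# AIRtest002.jpg
```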

If you have the Perl rename, you can transpose this to rename your files:

rename 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest\1.jpg/g' AIR*.jpg

or

rename 's/AIR[^[:digit:]]*([[:digit:]]+)\.jpg/AIRtest$1.jpg/g' AIR*.jpg

(rename prefers $ for group references).
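If the Perl rename isn’t available, the same substitution can drive mv from a plain shell loop. This is a sketch, not a rename feature: rename_air is a hypothetical helper name, the loop assumes GNU sed (-r) and an mv that supports -n (no overwrite), and it assumes names without newlines. Note that sed only computes the new name here; running sed -i on the files themselves would edit their contents rather than their names.

```shell
# rename_air DIR: hypothetical helper (not part of any tool) that renames
# DIR/AIR<anything><digits>.jpg to DIR/AIRtest<digits>.jpg.
rename_air() {
    dir=$1
    for f in "$dir"/AIR*.jpg; do
        # If the glob matched nothing it stays literal; skip it.
        if [ ! -e "$f" ]; then
            continue
        fi
        base=${f##*/}   # strip the directory part
        # sed only transforms the name as text; LC_ALL=C makes it
        # work on raw bytes instead of locale-dependent characters.
        new=$(printf '%s\n' "$base" |
              LC_ALL=C sed -r 's/^AIR[^[:digit:]]*([[:digit:]]+)\.jpg$/AIRtest\1.jpg/')
        # Unchanged name means the pattern did not match; leave the file alone.
        if [ "$new" = "$base" ]; then
            continue
        fi
        # -n refuses to overwrite an existing target.
        mv -n -- "$f" "$dir/$new"
    done
}
```

Calling rename_air . would process the current directory; files whose names don’t match the pattern are left untouched.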

The OP is using sed -r so the group matches are (..) rather than \(..\) — Chris Davies, Mar 22 '17 at 08:04
Super, thanks a lot! It works using rename instead of sed, which was opening all the files one by one instead of renaming them. I'll have more sed documentation to read to avoid that ^^ — Matthieu Ducorps, Mar 22 '17 at 09:00
