I have a big problem with non-standard characters when copying files for backup on AIX 6.1. The backup archive must contain exactly the same file names, so that I can restore the files and have everything work again.

I wrote a ksh script that reads names line by line from a CSV file and copies each file with cp to the backup destination. After running the script I realised that some files were missing from the new location. On investigation, the missing files turned out to have non-standard characters in their names in the original directory (a long dash "–", accented letters, etc.).

  1. I have a CSV file with the list of files to be copied; here is an example with a long "–":
    cat file_list.csv | grep pattern
    /path/to/file/some–file_with_pattern

  2. When I try to copy this specific file with cp (either from the script or by running cp manually):
    cp /path/to/file/some–file_with_pattern /new/backup/path
    cp: /path/to/file/some–file_with_pattern: No such file or directory

  3. When I display the file with ls:
    ls /path/to/file | grep pattern
    some?file_with_pattern

So AIX shows the file with "?" instead of "–", and cp throws an error. When I use a wildcard (* or ?) to copy the file, it is copied but has "?" instead of "–" in the destination directory. Putting the file name in ' ' or " " gives the same result: "?" instead of "–" in the destination directory. This does not meet my need: if I restore the file under a different name ("?" instead of "–"), the application that uses it will not be able to find it.

I tried both ksh and sh (bash is unavailable on the server). I tried changing the locale (forcing UTF-8 by setting LC_ALL="en_US.UTF-8"). I tried changing the PuTTY encoding settings. I am still unable to refer to the files by their original names, so I cannot copy them while keeping the original names. Does anybody have an idea how to copy those files using shell commands?

Luke

1 Answer

There are three distinct issues to address here:

A ? in ls

ls does not always show the real characters in a filename.

For example, the following filenames contain characters that ls may (but does not always) replace with a ?:

$ eval "$(printf "a='\n' b=\032 c=–")"
$ touch test{"$a","$b","$c"}file
$ ls --quoting-style=literal test*
test?file  test?file  test–file

But the following does not print a ?, because it is the shell, not ls, that lists the files:

$ echo test*
test
file testfile test–file

To see the actual (encoded) bytes, pipe the names through od:

$ echo test* | od -An -tc
   t   e   s   t  \n   f   i   l   e       t   e   s   t 032   f   i   l   e
   t   e   s   t 342 200 223   f   i   l   e  \n

So you need something like od to really see the characters used in a file name.

$ touch 'some–file_with_pattern'
$ echo *pattern* | od -An -tc
   s   o   m   e 342 200 223   f   i   l   e   _   w   i   t   h   _   p   a   t   t   e   r   n  \n
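
The same od check can confirm whether a copy kept the name intact even when ls displays a ?. The following is a minimal sketch rather than a tested AIX recipe: the throwaway directories under ${TMPDIR:-/tmp}, the helper name_bytes and the variable names are made up for the illustration, and it assumes the filesystem accepts the bytes 342 200 223 in a file name.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string in octal, without spaces.
name_bytes() {
    printf '%s' "$1" | od -An -to1 | tr -d ' \n'
}

# Throwaway directories, for the illustration only.
src="${TMPDIR:-/tmp}/name_src.$$"
dst="${TMPDIR:-/tmp}/name_dst.$$"
mkdir -p "$src" "$dst"

# Create a file whose name contains the UTF-8 bytes of "–" (342 200 223).
name=$(printf 'some\342\200\223file_with_pattern')
: > "$src/$name"

# Let the glob supply the name, so nothing has to be typed or pasted.
for f in "$src"/*pattern*; do
    cp -p "$f" "$dst/"
done

# Collect the name bytes on both sides, again without typing the name.
for f in "$src"/*pattern*; do src_bytes=$(name_bytes "${f##*/}"); done
for f in "$dst"/*pattern*; do copied_bytes=$(name_bytes "${f##*/}"); done

if [ "$src_bytes" = "$copied_bytes" ]; then
    echo "names are byte-identical"
else
    echo "names differ: $src_bytes vs $copied_bytes"
fi

# Remove the throwaway directories.
rm -rf "$src" "$dst"
```

If the two byte strings match, the ? seen in the destination directory is most likely only how ls displays the name, not what is stored on disk.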

encoding

Note that in the examples above the long dash is always shown correctly as the visible character – (as opposed to the ? in your examples), because the encoding is correctly set to UTF-8.

So check your PuTTY settings under Translation and ensure that you have UTF-8 set as the character set.

Then try this line; it should print the same characters that you see on this web page.

$ echo 'áé€íìïîößđ₣λŕžç×  Москва  Ελληνικά'
áé€íìïîößđ₣λŕžç×  Москва  Ελληνικά
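
To check which bytes actually reach the shell when the dash is typed or pasted, a small helper can classify them. This is only a sketch: dash_encoding is a hypothetical name, and the byte values it assumes are 342 200 223 for a UTF-8 en dash and 226 (0x96) for the same character in Windows-1252.

```shell
#!/bin/sh
# Character map of the current locale ("unknown" if locale is unavailable).
charmap=$(locale charmap 2>/dev/null || echo unknown)
echo "locale charmap: $charmap"

# Hypothetical helper: report how a single "long dash" argument is encoded.
dash_encoding() {
    case "$(printf '%s' "$1" | od -An -to1 | tr -d ' \n')" in
        342200223) echo "UTF-8" ;;
        226)       echo "single byte 0x96 (e.g. Windows-1252)" ;;
        077)       echo "literal question mark" ;;
        *)         echo "something else" ;;
    esac
}

# Simulated input; at a real prompt, paste the character instead:
#   dash_encoding '–'
dash_encoding "$(printf '\342\200\223')"
dash_encoding "$(printf '\226')"
```

If pasting the character at a real prompt reports a literal question mark, the dash was probably replaced before it reached the shell, which would point at the terminal settings rather than at cp.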

filesystem encoding

Resolving the two issues above should solve the problem in 99% of cases. The exceptions are cases where the filesystem encoding does not match the locale encoding, or differs between the two filesystems involved.

Make sure that neither of the two issues above is the cause before looking into this one. See the question "Same file, different filename due to encoding problem?".
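
As an illustration of why an encoding mismatch produces a different file name, the sketch below dumps the bytes of the same visible name in two encodings. It assumes iconv is installed and accepts the names UTF-8 and ISO-8859-1 (encoding names accepted by iconv differ between platforms); the sample name "café" and the helper bytes are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: octal dump of stdin with spaces and newlines removed.
bytes() {
    od -An -to1 | tr -d ' \n'
}

# "café" encoded as UTF-8: the é is the two bytes 303 251.
name_utf8=$(printf 'caf\303\251')
utf8_bytes=$(printf '%s' "$name_utf8" | bytes)
echo "UTF-8:      $utf8_bytes"

# The same visible name in ISO-8859-1: the é becomes the single byte 351.
if command -v iconv >/dev/null 2>&1; then
    latin1_bytes=$(printf '%s' "$name_utf8" | iconv -f UTF-8 -t ISO-8859-1 | bytes)
    echo "ISO-8859-1: $latin1_bytes"
else
    echo "iconv not available; skipping the conversion"
fi
```

Because a file name is stored as a sequence of bytes, the two results name different files even though both would display as the same text in a correctly configured terminal.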