
I would like to locally attach an NTFS volume to my unix (Ubuntu) machine, and copy (replicate) some unix directories to it, using rsync, in a way that the result is readable under Windows.

I do not care about ownership and permissions. It would be nice if modification dates were preserved. I only need directories and files (symbolic links would be nice, too, but it is not a problem if they cannot be copied).

Two obvious problems are case (in)sensitivity and characters that are illegal in Windows filenames. For example, in Linux I can have two files "a" and "A"; I can copy them to the NTFS volume, but in Windows I will be able to access (at most?) one of them. But I am happy to ignore that problem. What I am interested in are the characters that are illegal in Windows filenames, which are <, >, :, ", /, \, |, ?, and * (actually also ASCII 0-31, but I do not care about those; there might also be problems with files ending in a "."?).

I would like rsync to automatically "rename", e.g., a file called "a:" to, say, a(COLON), to end up with a legal name (and, ideally, translate a(COLON) back to a:).

Is it possible to have rsync automatically rename files to avoid characters forbidden in Windows?

  • As far as I understand, rsync can use iconv to do such tasks; is there a standard iconv module for Windows filenames? (I briefly looked into programming my own gconv module, but lacking C knowledge this seems too complicated.)
  • I have been told that rdiff-backup can do some conversions like that, but the homepage just mentions something being done "automatically", and I am not sure whether a locally mounted NTFS volume would trigger a renaming in a reliable way.
  • I am aware that there is fuse-posixovl, but this seems overkill for my purpose, and it also doesn't seem to be well documented (which characters will be translated in which way? Will all filenames be truncated to 8.3 or whatever? Can I avoid the additional files carrying owner/permission information, which I will not need? etc.)
  • I am aware that I could avoid all these problems by using, e.g., a tar file; but this is not what I want. (In particular, I would like in Windows to further replicate from the NTFS volume to another backup partition, copying only the changed files)
  • I am aware of the "windows_names" option when mounting NTFS; but this will prevent creating offending files, not rename them.

Update: As it seems my question was not quite clear, let me give a more explicit example: For example, WINDOWS-1251 is of no use for me. iconv -f utf-8 -t WINDOWS-1251//TRANSLIT transforms

123 abc ABC äö &:<!|

into

123 abc ABC ao &:<!|

I would need a codepage, say windows-filenames (which does not exist), that transforms the string into something like

123 abc ABC äö &(COLON)(LT)!(PIPE)

Update 2: I have now given up and renamed the offending files "by hand" (i.e., by script). From now on, every time before running rsync, I run a script that checks whether offending filenames exist (but does not automatically rename anything); I just use

# find stuff containing forbidden chars
find "$MYDIR" -regex '.*/[^/]*[<>:*"\\|?][^/]*'
# find stuff containing dot as last character (supposedly bad for windows)
find "$MYDIR" -regex '.*\.'
# find stuff that is identical case insensitive
find "$MYDIR" -print0 | sort -z | uniq -diz | tr '\0' '\n'

(the last line is taken from "case-insensitive search of duplicate file-names")

Jakob
  • Thank you for the convmvfs suggestion.

    But it seems that it uses iconv as well; i.e., it does "universally" (for the whole mount) what rsync can do for the specific rsync operation. Unfortunately I am not aware of an iconv charset that consists of Unicode minus the forbidden Windows characters (otherwise I could just use this iconv charset in rsync directly, I assume).

    – Jakob Feb 08 '17 at 10:01
  • --iconv=utf8,windows-charset, but I have no idea what Windows uses. iconv --list shows what is available on your system. –  Feb 08 '17 at 13:05
  • @Bahamut Thanks, this is actually part of my question: Is there an iconv module that corresponds to valid windows filename characters?

    I am not aware of any traditional codepage that includes Unicode characters but not, e.g., ":".

    I looked at the available iconv charsets (in particular the ones starting with "windows"), but they all seemed to include ":". On my system (Ubuntu) there is no "windows-charset" codepage, by the way.

    – Jakob Feb 08 '17 at 14:21
  • Maybe a better example: rsync -avg --iconv=utf8,WINDOWS-1251, but I don't know which one you need exactly. This one is for Cyrillic. Maybe helpful: https://msdn.microsoft.com/de-de/library/windows/desktop/dd317756%28v=vs.85%29.aspx –  Feb 08 '17 at 15:13
  • Thanks again, but this is of no use for me. I updated my question to make it (hopefully) clearer. – Jakob Feb 08 '17 at 15:31

1 Answer


A pragmatic solution would be to reproduce the source directories locally with the desired converted filenames, using hard links to the original files, then rsync this copy as-is to the NTFS filesystem.

For example, this Perl script demo duplicates the hierarchy /tmp/a/ into /tmp/b/ and URL-encodes (with % and 2 hex digits) the undesirable characters, so file:b becomes file%3ab (a hard link), directory %b<ha> becomes directory %25b%3cha%3e, and so on:

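#!/usr/bin/perl
# Demo: mirror /tmp/a into /tmp/b, percent-encoding awkward characters.
use strict;
use warnings;
use File::Find;
my $startdir = '/tmp/a';
my $copydir = '/tmp/b';
sub handlefile{
    # $File::Find::name is "./path" (or "." for the top); drop the leading dot
    my $name = substr($File::Find::name,1);
    my $oldname = $startdir.$name;
    # encode each listed character as % plus two hex digits
    $name =~ s/([;, \t+%&<>:\"\\|?*])/sprintf('%%%02x',ord($1))/ge;
    $name = $copydir.$name;
    printf "from %s to %s\n",$oldname,$name;
    # real directories are recreated; everything else is hard-linked
    if(!-l and -d){ mkdir($name) or die $!; }
    else{ link($oldname,$name) or die $!; }
}
chdir($startdir) or die "cannot chdir to $startdir: $!";
find(\&handlefile, '.');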

You can then rsync /tmp/b to your NTFS volume. This is just a demo, and needs work for Unicode and for other limitations of NTFS such as the maximum filename length. You could also check for lowercase/uppercase clashes, and use your preferred encoding (: to COLON and so on). You could do a second pass to fix the timestamps on the directories. Unless you have millions of files, the work needed to create this copy of the directory structure, with hard links to the files, should not be that onerous.
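If you also want to translate the names back later, the encoding has to be reversible, which is why the escape character % itself must be encoded too. The following is only a minimal bash sketch of that idea, not a tested tool: the function names encode_name and decode_name are hypothetical, they work on a single path component (so / is not handled), and encoding a trailing dot as %2e is an assumption based on the question's remark about names ending in ".".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of rsync or NTFS-3G).

# Percent-encode the characters Windows forbids in one file-name component.
# '%' is encoded as well, so that decoding is unambiguous.
encode_name() {
    local in=$1 out='' c i
    for (( i = 0; i < ${#in}; i++ )); do
        c=${in:i:1}
        case $c in
            '%'|'<'|'>'|':'|'"'|'\'|'|'|'?'|'*')
                # "'X" makes printf use the numeric code of character X
                printf -v c '%%%02x' "'$c" ;;
        esac
        out+=$c
    done
    # Assumption: a trailing dot is troublesome on Windows, so encode it too.
    if [[ $out == *. ]]; then
        out=${out%.}%2e
    fi
    printf '%s\n' "$out"
}

# Reverse encode_name: turn every %XX (two hex digits) back into its character.
decode_name() {
    local in=$1 out='' c i=0
    while (( i < ${#in} )); do
        c=${in:i:1}
        if [[ $c == '%' && ${in:i+1:2} =~ ^[0-9a-fA-F]{2}$ ]]; then
            printf -v c '%b' "\\x${in:i+1:2}"
            i=$(( i + 2 ))
        fi
        out+=$c
        i=$(( i + 1 ))
    done
    printf '%s\n' "$out"
}

encode_name 'a:'    # prints a%3a
```

Because every literal % in a source name becomes %25, a name that already looks encoded (say a%3a) survives a round trip unchanged instead of being decoded to a: by mistake.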

meuh
  • Thank you very much for the suggestion. However, the solution doesn't seem quite optimal: it seems that for the whole thing to run reliably, I have to remove the whole copydir tree every time before I re-run the script (or alternatively, check whether each file/dir still exists in the source, and if not remove it; i.e., doing stuff that I hoped rsync would do for me). As I am thinking of a large partition (~1 TB, ~1 million files) and I would like to rsync regularly, this seems a bit inefficient. Still, I will come back to this if I can't find anything more efficient. – Jakob Feb 08 '17 at 19:32