1

A file sorted with LANG=fr_FR.UTF-8 contains:

Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)
Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)
...
Zurich - Wolf, Johannes (1564-1627)
Zurich - Wolf, Johann Rudolf, I (15..-1624)

Accented characters are handled fine. But why does II sort before I, and Johannes before Johann?

(This is on Red Hat Enterprise Linux Server release 6.6 (Santiago).)

mip22
  • 11
  • Can you add sample lines with diacritics that lead to the "null sort" issue you are referring to in a comment when using LC_ALL=C sort? – jlliagre Mar 09 '18 at 13:55
  • That would seem to indicate I sorts before 1 in that locale. What's the output of printf 'I\n1\n' | sort? – Stéphane Chazelas Mar 09 '18 at 14:25
  • With any locale, printf 'I\n1\n' | sort outputs 1, then I. As for the "null" sort: the lines "Évora - Burgos, Andrés de (15..-1579)", "Évreux - Ancelle, Jean-Jacques-Louis (17..-18.. ; imprimeur-libraire)" and "Évreux - Ancelle, Jean-Jacques (1787-18..)" end up at the end of the file. – mip22 Mar 12 '18 at 09:11

2 Answers

0

With a non-POSIX locale, GNU sort under Linux does not produce the order you would expect. This doesn't happen under Solaris, even with GNU sort. See Stéphane Chazelas's in-depth explanation here.

Your best bet is to switch to the POSIX locale, which at least gives you consistent output:

E.g.:

$ cat f
w
e
é
f
 z
  x

Linux:

$ LC_ALL=C sort f
  x
 z
e
f
w
é
$ LC_ALL=fr_FR.utf8 sort f
e
é
f
w
  x
 z

Solaris:

$ LC_ALL=C sort f
  x
 z
e
f
w
é
$ LC_ALL=fr_FR.UTF-8 sort f
  x
 z
e
é
f
w
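
Applied to the sample lines from the question, the POSIX-locale approach can be sketched as below. This is only an illustration: `sample_lines` is a hypothetical helper that stands in for the real file, and the order shown in the comments is plain byte order, where a space (0x20) sorts before any letter.

```shell
#!/bin/sh
# Hypothetical helper: prints the four sample lines from the question.
sample_lines() {
    printf '%s\n' \
        'Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)' \
        'Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)' \
        'Zurich - Wolf, Johannes (1564-1627)' \
        'Zurich - Wolf, Johann Rudolf, I (15..-1624)'
}

# Sort by byte value: the space after "I" and after "Johann" is compared
# like any other character instead of being ignored.
sample_lines | LC_ALL=C sort
# Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)
# Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)
# Zurich - Wolf, Johann Rudolf, I (15..-1624)
# Zurich - Wolf, Johannes (1564-1627)
```

The trade-off is the one visible in the output above: in byte order, UTF-8 accented letters such as é sort after every ASCII character.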
jlliagre
  • 61,204
  • But this cannot explain the observed behavior... The ASCII code of ( is 40, which is lower than that of A (65). So the II should appear after the I(, which is not the case. Also, ne is higher than nR. So none of the samples match what you say. Or am I missing something? – perror Mar 09 '18 at 11:56
  • Oddly, this substitution gets rid of the problem with I but not the one with the 'n'. – mip22 Mar 09 '18 at 12:57
  • $ cat toto
    Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)
    Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)
    Zurich - Wolf, Johannes (1564-1627)
    Zurich - Wolf, Johann Rudolf, I(15..-1624)
    $ cat toto | tr ' ' '\001' | sort | tr '\001' ' '
    Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)
    Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)
    Zurich - Wolf, Johannes (1564-1627)
    Zurich - Wolf, Johann Rudolf, I(15..-1624)
    – mip22 Mar 09 '18 at 13:00
  • Sorry, I don't know how to insert line breaks in a comment. – mip22 Mar 09 '18 at 13:07
  • 1
    mip22, there is no way to insert newlines in comments. This kind of formatted information should be appended to the question. – jlliagre Mar 09 '18 at 16:24
0

I solved the problem by replacing every space with the pattern 000 before sorting (so it is probably a problem with how spaces are handled). Thank you all, especially Stéphane for the link to "Generate the collating order of a string".
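
A minimal sketch of that workaround, assuming sed is used for the substitution (the exact commands were not given). The function name `sort_spaces_significant` is hypothetical, and the sketch assumes no input line already contains "000", since the reverse substitution would otherwise turn it into a space.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin in the current locale, with each space
# temporarily replaced by "000" so that it takes part in the comparison.
# Assumption: no input line already contains the string "000".
sort_spaces_significant() {
    sed 's/ /000/g' | sort | sed 's/000/ /g'
}

# Usage with the sample lines from the question.
printf '%s\n' \
    'Bassano del Grappa - Remondini, Giuseppe, II (1745-1811)' \
    'Bassano del Grappa - Remondini, Giuseppe, I (1672-1742)' \
    'Zurich - Wolf, Johannes (1564-1627)' \
    'Zurich - Wolf, Johann Rudolf, I (15..-1624)' |
    sort_spaces_significant
```

Because the locale is left untouched, accented letters keep their usual place next to their unaccented counterparts, which is not the case with LC_ALL=C.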

mip22
  • 11