I am on Xigmanas (a FreeBSD-based NAS). I'll explain the situation as simply as possible:

:;  set | egrep 'LC_A|LANG'
GDM_LANG=fr_FR.UTF-8
LANG=fr_FR.UTF-8
LC_ALL=fr_FR.UTF-8
SLIM_LANG=fr_FR.UTF-8

:; ls -i 1989* ; ls -i | grep 1989 ; ls -ib 1989* ; ls -ib | grep 1989
9920 1989 Amn??sia.mp4
9920 1989 Amnésia.mp4
9920 1989 Amn\303\251sia.mp4
9920 1989 Amn\303\251sia.mp4

With ls alone, the accented character é is displayed as ??. When the same listing is piped through grep, it is displayed correctly.

I can't see an explanation, since a pipe should not modify the byte stream, and certainly not correct it!

ls followed by grep displays the name correctly, while ls alone does not.

What's going on?

Chris Davies
  • What's the output of locale charmap? It's possible your system doesn't have a fr_FR.UTF-8 locale installed, or, as Chris said, that you didn't export that LC_ALL variable to the environment. Can you see that locale in the output of locale -a? – Stéphane Chazelas Mar 09 '24 at 20:08
  • Other instances where ls -i 1989* and ls -i | grep 1989 would generate different outputs are when there is any file with 1989 in its name but not at the start of the name, or when a file has an inode number that includes the 1989 substring, or (on some systems) if any file whose name starts with 1989 also has one or several embedded newline characters in its name. – Kusalananda Mar 10 '24 at 12:47
  • This is tagged FreeBSD, so I strongly suggest using the locale command to verify, and see this on how to use login classes rather than setting LC_* directly. Have you tested with the default C.UTF-8? – Claus Andersen Mar 14 '24 at 10:59

1 Answer

This works:

ls --show-control-chars A*
Amnésia

As does this,

ls A* | cat
Amnésia

The documentation for the version of ls I have on Debian (ls (GNU coreutils) 8.32) says:

--show-control-chars show nongraphic characters as-is (the default, unless [...] output is a terminal)

Arguably it's a bug, because the two bytes that represent é in a UTF-8 locale (\303\251) should be treated as a printing character, not a non-graphic character.
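The difference between ls on its own and ls piped to another command comes down to whether standard output is a terminal. Here is a minimal sketch of that check using the shell's own `[ -t 1 ]` test. The function name stdout_kind is hypothetical and used only for illustration. It shows the kind of test a program can make, not how any particular ls is implemented.

```shell
# Hypothetical helper: report what standard output (fd 1) is connected to.
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

stdout_kind          # typed at an interactive prompt, stdout is the terminal
stdout_kind | cat    # here stdout is a pipe, so the other branch is taken
```

A program that makes this kind of check can choose to replace bytes it considers non-printable only when writing to a terminal, which would match the behaviour seen with and without the pipe.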


And now I cannot reproduce it. Did you export your locale variables? You should get a set of results from this command:

env | egrep 'LC_A|LANG'

If not, try this to export the locale variables to the environment:

eval "$(LC_ALL=fr_FR.UTF-8 locale | sed 's/^/export /')"

Then retry:

ls A*
Amnésia
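To illustrate why the check uses env rather than set, here is a small sketch. DEMO_LOCALE is a hypothetical variable name chosen so that the example does not disturb your real locale settings. The point is that set lists every shell variable, while env lists only those exported to child processes such as ls.

```shell
# DEMO_LOCALE is a hypothetical name used only for this illustration.
unset DEMO_LOCALE
DEMO_LOCALE=fr_FR.UTF-8      # plain shell variable, not exported

# set lists shell variables, exported or not, so this finds it.
set | grep '^DEMO_LOCALE='

# env lists only the environment passed to child processes.
env | grep '^DEMO_LOCALE=' || echo "not in the environment"

export DEMO_LOCALE           # from now on child processes inherit it
env | grep '^DEMO_LOCALE='
```

If the locale variables show up under set but not under env, ls never sees them and falls back to whatever locale it does inherit.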
Chris Davies
  • What do you suppose it means for the ls man page to say, “unless program is 'ls'”? – G-Man Says 'Reinstate Monica' Mar 09 '24 at 22:34
  • @G-ManSays'ReinstateMonica' I don't know. I wonder if it implies that ll and any other variations on the theme are implemented by ls itself – Chris Davies Mar 09 '24 at 23:01
  • @G-ManSays'ReinstateMonica' and Chris I thought that option might show up in documentation for other GNU coreutils tools too and I found --show-control-chars mentioned as also being an option for pr (possibly not relevant to this). Under Directory Listing, though, the command dir is discussed as equivalent to ls -C -b, and vdir as equivalent to ls -l -b, both heavily referencing the mentioned ls documentation which I expect is relevant. – Ed Morton Mar 11 '24 at 05:18
  • @EdMorton ah! Thank you – Chris Davies Mar 11 '24 at 07:38
  • I've removed the confusing subclause from the quote, EdMorton, G-ManSays'ReinstateMonica' – Chris Davies Mar 11 '24 at 07:40
  • Can't reproduce either; I always get the accent right. – Archemar Mar 11 '24 at 08:25
  • Hello, I am on Xigmanas (NAS freebsd)

    :; set | egrep 'PS1|LC_ALL|LANG'
    LANG=en_US.UTF-8
    PS1=':; '
    :; éùà
    -bash: éùà: command not found

    :; touch "1989 Amnésia.mp4"
    :; ls -a 1989*
    1989 Amnésia.mp4

    :; LANG=fr_FR.UTF-8
    :; set | egrep 'LANG'
    LANG=fr_FR.UTF-8

    :; ls -a 1989*
    1989 Amn??sia.mp4
    :;

    :; LANG=en_US.UTF-8
    :; ls -a 1989*
    1989 Amnésia.mp4

    – Dhénin Jean-Jacques Mar 11 '24 at 08:54
  • Are you on a mounted resource (e.g. Samba or NFS)? If so, can you edit your post to add the mount options (from /etc/fstab, the command line, or the relevant line from mount)? – Archemar Mar 11 '24 at 15:19
  • Be aware that --show-control-chars is a GNU ls option. You should probably be using -B - see ls(1). Or install GNU ls. For FreeBSD it is furthermore recommended to use login classes rather than setting LC_* directly. – Claus Andersen Mar 14 '24 at 10:57
  • @ClausAndersen my original answer (this one here) responded to the question when it didn't have the [tag:freebsd] tag. I added that in response to a comment made after my most recent answer edit. You'd be welcome to write your own answer that makes use of the newest information – Chris Davies Mar 14 '24 at 11:07
  • Nah - you got the most important bit "Arguably it's a bug,". Most likely locale is not set correctly. Xigmanas is unknown to me. – Claus Andersen Mar 14 '24 at 11:19