
Updated: This is not a file system problem.

I used to be able to enter:

$ echo kødpålæg

But now bash/zsh change this to:

bash$ echo kddddddddplg
zsh$ echo k<c3><b8>dp<c3><a5>l<c3><a6>g

I can run cat and enter 'kødpålæg' with no problem:

$ cat
kødpålæg
kødpålæg

This happens both in this environment:

$ locale   
LANG=C
LANGUAGE=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

and in this:

$ locale 
LANG=da_DK.utf8
LANGUAGE=da_DK.utf8
LC_CTYPE="da_DK.utf8"
LC_NUMERIC="da_DK.utf8"
LC_TIME="da_DK.utf8"
LC_COLLATE="da_DK.utf8"
LC_MONETARY="da_DK.utf8"
LC_MESSAGES="da_DK.utf8"
LC_PAPER="da_DK.utf8"
LC_NAME="da_DK.utf8"
LC_ADDRESS="da_DK.utf8"
LC_TELEPHONE="da_DK.utf8"
LC_MEASUREMENT="da_DK.utf8"
LC_IDENTIFICATION="da_DK.utf8"
LC_ALL=da_DK.utf8

csh does not change 'kødpålæg'.

How can I get the old behaviour back, so I can enter 'kødpålæg'?

Running any of these gives the old behaviour:

LC_ALL=en_GB.utf-8 luit
LC_ALL=da_DK.utf-8 luit
LC_ALL=en_GB.iso88591 luit
LC_ALL=da_DK.iso88591 luit

but only for that single session.

This:

$ od -An -vtx1
ø

Gives:

 c3 b8 0a

So it seems the input from Konsole to bash is UTF-8.

$ konsole --version
QCoreApplication::arguments: Please instantiate the QApplication object first
Qt: 5.5.1
KDE Frameworks: 5.18.0
Konsole: 15.12.3

$ bash --version
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ zsh --version
zsh 5.1.1 (x86_64-ubuntu-linux-gnu)

$ dpkg -l csh
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version       Architecture  Description
+++-=================-=============-=============-========================================
ii  csh               20110502-2.1u amd64         Shell with C-like syntax
Ole Tange

2 Answers

5

I'd say most likely your terminal is misconfigured and sends and displays characters in some single-byte character set instead of the locale's charset, probably ISO8859-1 or ISO8859-15 given the sample characters you show.

There is typically no ø, å or æ character in the C locale, and the ISO8859-1(5) encodings of those characters (0xf8, 0xe5, 0xe6) don't form valid characters in UTF-8. Line editors like readline or zle need to decode the input bytes into characters, because they need to know how many bytes make up a display column in order to do cursor positioning correctly.
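To make the "not valid UTF-8" point concrete, here is a minimal sketch that classifies a byte by the role it could play at the start of a UTF-8 sequence. The function name utf8_lead is made up for this illustration; it only looks at the first byte and does not validate the continuation bytes that would have to follow.

```shell
# utf8_lead is a hypothetical helper, not a standard tool.
# Given a byte value, print how many bytes a UTF-8 sequence starting
# with that byte would occupy, or "invalid" if no sequence can start with it.
utf8_lead() {
    b=$(( $1 ))                                   # accepts hex such as 0xf8
    if   [ "$b" -le 127 ]; then echo 1            # 0x00-0x7f: plain ASCII
    elif [ "$b" -ge 194 ] && [ "$b" -le 223 ]; then echo 2   # 0xc2-0xdf
    elif [ "$b" -ge 224 ] && [ "$b" -le 239 ]; then echo 3   # 0xe0-0xef
    elif [ "$b" -ge 240 ] && [ "$b" -le 244 ]; then echo 4   # 0xf0-0xf4
    else echo invalid                             # continuation or unused byte
    fi
}

utf8_lead 0xc3    # first byte of UTF-8 ø: prints 2
utf8_lead 0xf8    # ISO8859-1 ø: prints invalid
utf8_lead 0xe5    # ISO8859-1 å: prints 3
```

So 0xf8 can never start a character, while 0xe5 and 0xe6 announce a 3-byte sequence that the ASCII letters following them cannot complete.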

Moreover, in the C locale, which on most systems uses ASCII, there are no characters with the 8th bit set, so bash would understand that 8th bit as meaning Meta. 0xF8 would be understood as Meta+x (0x78 (x) | 0x80), because that's what some terminals send upon Alt+x or Meta+x.
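The bit arithmetic can be checked directly in the shell. This is only a sketch of the 8th-bit stripping described above, not how readline itself is implemented, and meta_base is a name invented for the example.

```shell
# meta_base is a hypothetical helper, not part of bash or readline.
# Clear the 8th bit of a byte value and print the ASCII character that remains.
meta_base() {
    low=$(( $1 & 0x7f ))                    # drop the 8th (Meta) bit
    # Build an octal escape such as \170 and let printf emit that byte.
    printf "\\$(printf '%03o' "$low")"
}

meta_base 0xf8; echo    # ø in ISO8859-1: prints x
meta_base 0xe5; echo    # å in ISO8859-1: prints e
meta_base 0xe6; echo    # æ in ISO8859-1: prints f
```

With convert-meta on, those three bytes would therefore be read as M-x, M-e and M-f.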

While M-x is not bound to anything by default in bash, ß (0xdf in ISO8859-1, that is _ with the 8th bit set) would be understood as M-_, which inserts the last word of the previous command. You can turn that off with:

bind 'set convert-meta off'

Shells like csh are too ancient to even be aware that characters may be made of several bytes or take up anything but a single column width, so they don't bother.

To verify that theory, run:

od -An -vtx1

Then enter those characters followed by ^D^D and see which encoding you get. If you see 0xf8 for ø, that means I'm right. If you see 0xc3 0xb8 instead, which is the UTF-8 encoding of ø, that means I'm wrong.
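If you'd rather not read the hex by eye, the check can be sketched as a small function. guess_encoding is a name invented here; it assumes exactly one ø was typed and only knows the two byte sequences mentioned above.

```shell
# guess_encoding is a hypothetical helper, not a standard tool.
# It takes the hex bytes printed by od -An -vtx1 for a single ø
# and names the encoding they most likely came from.
guess_encoding() {
    # Unquoted $* collapses runs of whitespace; sed drops a trailing newline byte.
    bytes=$(echo $* | sed 's/ 0a$//')
    case $bytes in
        f8)      echo "single-byte (ISO8859-1 or ISO8859-15)" ;;
        "c3 b8") echo "UTF-8" ;;
        *)       echo "unknown: $bytes" ;;
    esac
}

guess_encoding " c3 b8 0a"    # prints UTF-8
guess_encoding " f8 0a"       # prints single-byte (ISO8859-1 or ISO8859-15)
```

Interactively it could be fed straight from od, for example guess_encoding $(od -An -vtx1), typing ø, Enter and then ^D.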

Or change the locale to da_DK.iso88591 (check in locale -a for the exact name of the locale on your system) and see if that works better.

Now as to why your terminal may send the wrong encoding for those characters, maybe it was started in a locale where the charset was iso8859-1. Maybe it's configured to ignore the locale and use a specific charset (look for charset or encoding in its configuration). Or maybe you've sshed in from another system where the locale was using ISO8859-1(5) as its charset.

I can reproduce that behaviour if, from a UTF-8 terminal, I run:

LC_ALL=en_GB.iso885915 luit

And then, from within luit, change the locale to C or a UTF-8 one and enter non-ASCII characters.

0

Your cat test indicates the terminal connection is 8-bit clean. So this looks like a possible locale issue.

Please run locale -a to verify that your chosen locale "da_DK.utf8" exists; if it is not listed, and you are on a system that belongs to the Debian/Ubuntu family, you might have to uncomment it in /etc/locale.gen and then run locale-gen as root.
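The locale -a check can be scripted roughly like this. has_locale is a hypothetical helper name, and the sketch assumes that names differing only in letter case or in a hyphen (da_DK.utf8, da_DK.UTF-8) refer to the same locale, as the spellings used in the question suggest.

```shell
# has_locale is a hypothetical helper, not a standard tool.
# Succeeds if the named locale appears in the list read from stdin
# (normally the output of locale -a).
has_locale() {
    # Lower-case the name and drop hyphens before comparing.
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$want"
}

if locale -a 2>/dev/null | has_locale da_DK.UTF-8; then
    echo "da_DK.UTF-8 is available"
else
    echo "da_DK.UTF-8 is missing; it may need to be generated with locale-gen"
fi
```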

Also, some shell versions cannot switch locales dynamically, but keep using the locale setting that was originally inherited from their parent process. If this is the case, then running LC_CTYPE=da_DK.UTF-8 bash would restore the desired behavior, for the time of that particular session only. If that's true, then changing the system default locale to any supported UTF-8 locale and then rebooting might help: it would change the locale of the processes responsible for handling your login and starting your shell.

telcoM