I am currently having problems with util-linux's look in combination with German umlauts (ä, ö, ü). For testing purposes I set LC_ALL=de_DE.UTF-8.

Consider german.dic:

Aachen
Rindfleisch

in UTF-8 encoding:

 $ file german.dic
german.dic: UTF-8 Unicode text

If I try to find the second word with /usr/bin/look, it works perfectly fine:

 $ look Rindf german.dic
Rindfleisch

Even if I add a word with a German umlaut (ä) inside the word, look still works as expected:

 $ cat german.dic
Altäster
Rindfleisch
 $ look Rindf german.dic
Rindfleisch

However, if there is a word with an umlaut at the beginning, look no longer finds anything:

 $ cat german.dic
Ältester
Rindfleisch
 $ look Rindf german.dic

It does not matter whether it's an uppercase or lowercase umlaut.

I have also tried setting LC_ALL=de_DE.UTF-8 explicitly (the locale is definitely installed on my system), but that did not help.

Stephen Kitt
  • The util-linux version of look (as shipped in Fedora) exhibits this problem, whereas the bsdmainutils version (as shipped in Debian) works as expected. It looks like a bug in the former... – Stephen Kitt May 06 '20 at 16:15
  • bsdmainutils doesn't include look for me (Arch Linux). Is it named differently? – writzlpfrimpft May 06 '20 at 16:58
  • Distros ship one or the other, but not both; if Arch ships the util-linux look, it’s likely that its bsdmainutils package (or equivalent) doesn’t build it. (I haven’t checked.) – Stephen Kitt May 06 '20 at 17:30

1 Answer

Stephen Kitt suggested in a comment that this bug does not appear in the bsdmainutils version of look.

I am running

Linux archlinux 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54 +0000 x86_64 GNU/Linux

with util-linux from June 2011.

In the manpage for the bsd version of look the following is mentioned:

Input files must be sorted with LC_COLLATE set to ‘C’.

There is no line about this in the util-linux manpage of look.

When sorting with the C locale, lines are compared byte by byte, so words starting with an umlaut end up after all words starting with an ASCII letter:

 $ LC_COLLATE=C sort german.dic
Rindfleisch
Ältester
ÖBB

(notice how I added a line starting with Ö for testing purposes).
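
The reason for this ordering can be seen from the raw bytes. The following is only a small sketch, assuming the script is saved as UTF-8 and that the standard od and tr tools are available; the helper name hex_bytes is made up for illustration. In UTF-8, both Ä and Ö start with the byte 0xc3, which is larger than the byte of any ASCII letter, so a byte-wise comparison puts them after Rindfleisch:

```shell
# Hypothetical helper (not part of look or sort): print the bytes of a
# string as one hex string, without spaces or a trailing newline.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

hex_bytes 'R'; echo    # 52
hex_bytes 'Ä'; echo    # c384
hex_bytes 'Ö'; echo    # c396
```

Since 0x52 is smaller than 0xc3, and 0x84 is smaller than 0x96, the byte order is Rindfleisch, Ältester, ÖBB, which matches the sort output above.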

If I try to look in this file, it works as intended:

 $ LC_COLLATE=C sort german.dic -o german.dic
 $ cat german.dic
Rindfleisch
Ältester
ÖBB
 $ look Rindf german.dic
Rindfleisch
 $ look Ält german.dic
Ältester
 $ look Ö german.dic
ÖBB
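
To find out whether an existing dictionary is already in the byte order that worked here, sort -c can be used; it exits with a non-zero status if the file is not sorted. This is just a sketch on a scratch copy (the variable names and the temporary directory are my own choices). It uses LC_ALL=C instead of LC_COLLATE=C because LC_ALL, when it is set in the environment, overrides LC_COLLATE:

```shell
# Work on a scratch copy so the real dictionary is left untouched.
tmp=$(mktemp -d)
printf '%s\n' 'Ältester' 'Rindfleisch' 'ÖBB' > "$tmp/german.dic"

# Check the order before sorting: sort -c fails on the first
# out-of-order line when comparing byte by byte.
if LC_ALL=C sort -c "$tmp/german.dic" 2>/dev/null; then
    status_before=sorted
else
    status_before=unsorted
fi

# Re-sort in place in the C locale, then check again.
LC_ALL=C sort -o "$tmp/german.dic" "$tmp/german.dic"
if LC_ALL=C sort -c "$tmp/german.dic" 2>/dev/null; then
    status_after=sorted
else
    status_after=unsorted
fi

echo "before: $status_before, after: $status_after"
```

If the check reports the file as unsorted, re-sorting it as shown should put it into the order in which look found the umlaut entries above.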

Thanks for the help!