All of:
tr '[:lower:]' '[:upper:]'
(don't forget the quotes, otherwise that won't work if there's a file called :, l, ... or r in the current directory) or:
awk '{print toupper($0)}'
or:
dd conv=ucase
are meant to convert characters to uppercase according to the rules defined in the current locale. However, even where locales use UTF-8 as the character set and clearly define the conversion from lowercase to uppercase, at least GNU dd, GNU tr and mawk (the default awk on Ubuntu, for instance) don't follow them. Also, there's no standard way to specify locales other than C or POSIX, so if you want to convert UTF-8 files to uppercase portably regardless of the current locale, you're out of luck with the standard toolchest.
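To see why the quotes around '[:lower:]' matter, here is a minimal sketch that works in a throwaway directory (the demo_dir name is just for illustration). Unquoted, [:lower:] is an ordinary glob bracket expression matching any one of the characters :, l, o, w, e, r, so the shell replaces it with a matching file name if one exists:

```shell
# Throwaway directory so the demonstration leaves nothing behind.
demo_dir=$(mktemp -d)

# A file whose one-character name matches the glob [:lower:].
touch "$demo_dir/l"

# Unquoted: the shell expands the pattern to the matching file name.
unquoted=$(cd "$demo_dir" && echo [:lower:])

# Quoted: the argument reaches the command unchanged.
quoted=$(cd "$demo_dir" && echo '[:lower:]')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

rm -rf "$demo_dir"
```

In that directory, an unquoted tr [:lower:] [:upper:] would receive l as its first argument instead of the character class.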
As is often the case, for portability, your best bet may be perl:
$ echo lľsšcčtťzž | PERLIO=:utf8 perl -pe '$_=uc'
LĽSŠCČTŤZŽ
Now, beware that not everybody agrees on what the uppercase version of a given character is.
For instance, in Turkish locales, the uppercase i is not I, but İ (<U0130>). Here with the heirloom toolchest tr instead of GNU tr:
$ echo ií | LC_ALL=C.UTF-8 tr '[:lower:]' '[:upper:]'
IÍ
$ echo ií | LC_ALL=tr_TR.UTF-8 tr '[:lower:]' '[:upper:]'
İÍ
On my system, the perl to-upper conversion is defined in /usr/share/perl/5.14/unicore/To/Upper.pl, and it behaves differently from the GNU libc toupper() on a few characters (in the C.UTF-8 locale, for instance), perl being the more accurate of the two. For example, perl correctly converts ɀ to Ɀ, while the GNU libc (2.17) doesn't.