Locale settings are user preferences that relate to your culture.
Locale names
On all current unix variants that I know of (but not on a few antiques), locale names follow the same pattern:
- An ISO 639-1 lowercase two-letter language code, or an ISO 639-2 three-letter language code if the language has no two-letter code. For example, en for English, de for German, ja for Japanese, uk for Ukrainian, ber for Berber, …
- For many but not all languages, an underscore _ followed by an ISO 3166 uppercase two-letter country code. Thus: en_US for US English, en_GB for British English (the ISO 3166 code for the United Kingdom is GB, not UK), fr_CA for Canadian (Québec) French, de_DE for German of Germany, de_AT for German of Austria, ja_JP for Japanese (of Japan), etc.
- Optionally, a dot . followed by the name of a character encoding such as UTF-8, ISO-8859-1, KOI8-U, GB2312, Big5, etc. With GNU libc at least (I don't know how widespread this is), case and punctuation are ignored in encoding names. For example, zh_CN.UTF-8 is Mandarin (simplified) Chinese encoded in UTF-8, while zh_CN is Mandarin Chinese encoded in GB2312, and zh_TW is Taiwanese (traditional) Chinese encoded in Big5.
- Optionally, an at sign @ followed by the name of a variant. The meaning of variants is locale-dependent. For example, many European countries have an @euro locale variant where the currency sign is € and where the encoding is one that includes this character (ISO 8859-15 or ISO 8859-16), as opposed to the unadorned variant with the older currency sign. For example, en_IE (English, Ireland) uses the latin1 (ISO 8859-1) encoding and £ as the currency symbol while en_IE@euro uses the latin9 (ISO 8859-15) encoding and € as the currency symbol.
In addition, there are two locale names that exist on all unix-like systems: C and POSIX. These names are synonymous and mean computerese, i.e. default settings that are appropriate for data that is parsed by a computer program.
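To make the naming pattern above concrete, here is a minimal Python sketch that splits a locale name into its four parts. The names LocaleName and parse_locale_name are made up for this illustration, and the regular expression only covers the common pattern described here, not every name a given system may accept.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Hypothetical container for the parts of a locale name."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_TERRITORY][.codeset][@modifier], as described in the text.
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"      # ISO 639 language code
    r"(?:_(?P<territory>[A-Z]{2}))?"  # optional ISO 3166 country code
    r"(?:\.(?P<codeset>[^@]+))?"      # optional character encoding
    r"(?:@(?P<modifier>.+))?$"        # optional variant
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its parts; C and POSIX are special-cased."""
    if name in ("C", "POSIX"):
        return LocaleName(name, None, None, None)
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised locale name: {name!r}")
    return LocaleName(**match.groupdict())


print(parse_locale_name("zh_CN.UTF-8"))
print(parse_locale_name("en_IE@euro"))
```

Parts that are absent from the name come back as None, so a caller can tell "no encoding given" apart from any particular encoding.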
Locale settings
The following locale categories are defined by POSIX:
LC_CTYPE: the character set used by terminal applications: classification data (which characters are letters, punctuation, spaces, invalid, etc.) and case conversion. Text utilities typically heed LC_CTYPE to determine character boundaries.
LC_COLLATE: collation (i.e. sorting) order. This setting is of very limited use for several reasons:
- Most languages have intricate rules that depend on what is being sorted (e.g. dictionary words and proper names might not use the same order) and cannot be expressed by LC_COLLATE.
- Few tasks where proper sort order matters are performed by software that uses locale settings. For example, word processors store the language and encoding of a file in the file itself (otherwise the file wouldn't be processed correctly on a system with different locale settings) and don't care about the locale settings specified by the environment.
LC_COLLATE can have nasty side effects, in particular because it causes the sort order A < a < B < …, which makes “between A and Z” include the lowercase letters a through y. As a result, very common regular expressions like [A-Z] break some applications.
LC_MESSAGES: the language of informational and error messages.
LC_NUMERIC: number formatting: decimal and thousands separator.
Many applications hard-code . as a decimal separator. This makes LC_NUMERIC not very useful and potentially dangerous:
- Even if you set it, you'll still see the default format pretty often.
- You're likely to get into a situation where one application produces locale-dependent output and another application expects . to be the decimal point, or , to be a field separator.
LC_MONETARY: like LC_NUMERIC, but for amounts of local currency.
Very few applications use this.
LC_TIME: date and time formatting: weekday and month names, 12 or 24-hour clock, order of date parts, punctuation, etc.
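The LC_COLLATE side effect on ranges like [A-Z] can be illustrated without touching the system locale. The sketch below hard-codes the collation order A < a < B < … mentioned above (an assumption for the illustration, real locales differ in detail) and lists which characters fall "between A and Z" under it. The helper name chars_in_range is made up.

```python
import string

# Assumed collation order from the text: A < a < B < b < ... < Z < z.
COLLATION_ORDER = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)

# Plain code point order, as used by the C locale: A..Z, then a..z.
CODEPOINT_ORDER = string.ascii_uppercase + string.ascii_lowercase


def chars_in_range(first: str, last: str, order: str) -> str:
    """Return every character that sorts from `first` to `last` in `order`."""
    return order[order.index(first):order.index(last) + 1]


collated = chars_in_range("A", "Z", COLLATION_ORDER)
plain = chars_in_range("A", "Z", CODEPOINT_ORDER)

# Lowercase letters that a collation-aware [A-Z] would also match.
print("".join(c for c in collated if c.islower()))
# Under code point order the range contains the uppercase letters only.
print(plain)
```

Under the assumed order the range picks up a through y but not z, which is the behaviour that surprises tools expecting [A-Z] to mean "uppercase ASCII letters".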
GNU libc, which you'll find on non-embedded Linux, defines additional locale categories:
LC_PAPER: the default paper size (defined by height and width).
LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION: I don't know of any application that uses these.
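Going back to the LC_NUMERIC hazard described above, the following sketch shows what can go wrong when one program writes numbers with a comma as the decimal separator and another expects a dot, or uses the comma as a field separator. The function format_number is a stand-in invented for this example; it imitates locale-dependent output by swapping the separator rather than calling the real locale machinery.

```python
def format_number(value: float, decimal_point: str) -> str:
    """Imitate locale-dependent output by swapping the decimal separator."""
    return f"{value:.2f}".replace(".", decimal_point)


# A "producer" running with a comma as the decimal separator.
producer_output = format_number(3.14, ",")   # "3,14"

# A "consumer" that hard-codes . as the decimal point cannot parse it.
try:
    float(producer_output)
    parsed_ok = True
except ValueError:
    parsed_ok = False

# A comma-separated row with two such numbers is split into four fields.
row = ",".join(format_number(v, ",") for v in (3.14, 2.71))
fields = row.split(",")

print(producer_output, parsed_ok, fields)
```

Both failures disappear when the producer runs with LC_NUMERIC set to C, which is why the text treats this category as risky to change.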
Environment variables
Applications that use locale settings determine them from environment variables.
- The value of the LANG environment variable is used unless overridden by another setting. If LANG is not set, the default locale is C.
- The LC_xxx names can be used as environment variables; each one overrides LANG for its own category.
- If LC_ALL is set, then all other values are ignored; this is primarily useful to set LC_ALL=C when running applications that need to produce the same output regardless of where they are run.
- In addition, GNU libc uses LANGUAGE to define fallbacks for LC_MESSAGES (e.g. LANGUAGE=fr_BE:fr_FR:en to prefer Belgian French, or if unavailable France French, or if unavailable English).
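The precedence rules listed above can be written down as a small lookup. This is a sketch under a few assumptions: the function name effective_locale is made up, empty values are treated like unset ones, and the GNU-specific LANGUAGE fallback for messages is deliberately left out.

```python
import os
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Return the locale name that applies to `category` (e.g. "LC_TIME").

    Precedence: LC_ALL, then the category's own variable, then LANG,
    then the default locale C. Empty values are treated as unset.
    """
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "C"


example = {"LANG": "en_GB.UTF-8", "LC_NUMERIC": "C"}
print(effective_locale("LC_NUMERIC", example))  # overridden by LC_NUMERIC
print(effective_locale("LC_TIME", example))     # falls back to LANG

# With LC_ALL set, every category resolves to its value.
print(effective_locale("LC_TIME", {**example, "LC_ALL": "C"}))

# The same function works on the real environment.
print(effective_locale("LC_MESSAGES", os.environ))
```

Passing the environment in as a mapping keeps the function easy to try out with hand-written settings before pointing it at os.environ.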
Installing locales
Locale data can be large, so some distributions don't ship them in a usable form and instead require an additional installation step.
- On Debian, to install locales, run dpkg-reconfigure locales and select from the list in the dialog box, or edit /etc/locale.gen and then run locale-gen.
- On Ubuntu, to install locales, run locale-gen with the names of the locales as arguments.
You can define your own locale.
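To check whether a locale has actually been installed, one option is to ask the C library to switch to it and see whether that succeeds. The sketch below does this with Python's standard locale module; the function name is_locale_available is made up, and the check changes the process-wide locale for a moment, so it is not suitable for multi-threaded programs.

```python
import locale


def is_locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as a locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the check.
        locale.setlocale(locale.LC_CTYPE, saved)
    return True


# C exists everywhere; the second name is only an example and may or
# may not be installed on a given machine.
for candidate in ("C", "en_GB.UTF-8"):
    print(candidate, is_locale_available(candidate))
```

If a name you expect is reported as unavailable, that is usually the sign that the installation step described above has not been done for it.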
Recommendation
The useful settings are:
- Set LC_CTYPE to the language and encoding that you encode your text files in. Ensure that your terminals use that encoding.
For most languages, only the encoding matters. There are a few exceptions; for example, an uppercase i is I in most languages but İ in Turkish (tr_TR).
- Set LC_MESSAGES to the language that you want to see messages in.
- Set LC_PAPER to en_US if you want US Letter to be the default paper size and just about anything else (e.g. en_GB) if you want A4.
- Optionally, set LC_TIME to your favorite time format.
As explained above, avoid setting LC_COLLATE and LC_NUMERIC. If you use LANG, explicitly override these two categories by setting them to C.
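As a rough illustration of the recommendation for people who do use LANG, the sketch below builds the corresponding variable assignments and prints them as shell export lines. The function names recommended_environment and as_export_lines are made up, the locale names are only examples, and where you put the resulting lines depends on your system and shell.

```python
import shlex
from typing import Dict, List


def recommended_environment(lang: str, paper: str = "") -> Dict[str, str]:
    """Set LANG, but force LC_COLLATE and LC_NUMERIC back to C."""
    env = {"LANG": lang, "LC_COLLATE": "C", "LC_NUMERIC": "C"}
    if paper:
        env["LC_PAPER"] = paper  # e.g. en_US for Letter, en_GB for A4
    return env


def as_export_lines(env: Dict[str, str]) -> List[str]:
    """Render the settings as shell `export` statements."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]


for line in as_export_lines(recommended_environment("en_GB.UTF-8", paper="en_GB")):
    print(line)
```

The two C overrides are the part that matters; everything else follows whatever language and encoding you chose.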
LC_PAPER
. And can I update this across the system without rebooting? – Faheem Mitha Aug 10 '14 at 19:31
/etc/default/locale. These files take effect when you log in; you can do export LC_PAPER=… in a shell to affect commands launched from that shell. – Gilles 'SO- stop being evil' Aug 10 '14 at 19:59