Locale settings are user preferences that relate to your culture.
Locale names
On all current unix variants that I know of (but not on a few antiques), locale names follow the same pattern:
- An ISO 639-1 lowercase two-letter language code, or an ISO 639-2 three-letter language code if the language has no two-letter code. For example, en for English, de for German, ja for Japanese, uk for Ukrainian, ber for Berber, …
- For many but not all languages, an underscore _ followed by an ISO 3166 uppercase two-letter country code. Thus: en_US for US English, en_GB for British English, fr_CA for Canadian (Québec) French, de_DE for German of Germany, de_AT for German of Austria, ja_JP for Japanese (of Japan), etc.
- Optionally, a dot . followed by the name of a character encoding such as UTF-8, ISO-8859-1, KOI8-U, GB2312, Big5, etc. With GNU libc at least (I don't know how widespread this is), case and punctuation are ignored in encoding names. For example, zh_CN.UTF-8 is Mandarin (simplified) Chinese encoded in UTF-8, while zh_CN is Mandarin Chinese encoded in GB2312, and zh_TW is Taiwanese (traditional) Chinese encoded in Big5.
- Optionally, an at sign @ followed by the name of a variant. The meaning of variants is locale-dependent. For example, many European countries have an @euro locale variant where the currency sign is € and where the encoding is one that includes this character (ISO 8859-15 or ISO 8859-16), as opposed to the unadorned variant with the older currency sign. For example, en_IE (English, Ireland) uses the latin1 (ISO 8859-1) encoding and £ as the currency symbol while en_IE@euro uses the latin9 (ISO 8859-15) encoding and € as the currency symbol.
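The naming pattern above can be taken apart with plain POSIX shell parameter expansion. This is only a sketch: parse_locale is a hypothetical helper (not a standard utility), and it assumes the name it is given is well-formed.

```shell
# Hypothetical helper: split a locale name of the form
#   language[_TERRITORY][.encoding][@variant]
# into its parts. Not a standard tool; assumes a well-formed name.
parse_locale() {
    name=$1
    variant='' encoding='' territory=''
    # Strip the parts from right to left: @variant, then .encoding, then _TERRITORY.
    case $name in *@*) variant=${name#*@}; name=${name%%@*} ;; esac
    case $name in *.*) encoding=${name#*.}; name=${name%%.*} ;; esac
    case $name in *_*) territory=${name#*_}; name=${name%%_*} ;; esac
    printf 'language=%s territory=%s encoding=%s variant=%s\n' \
        "$name" "$territory" "$encoding" "$variant"
}

parse_locale en_IE.ISO-8859-15@euro
# prints: language=en territory=IE encoding=ISO-8859-15 variant=euro
```

The special names C and POSIX have none of the optional parts, so they come back as a bare "language" with the other fields empty.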
In addition, there are two locale names that exist on all unix-like systems: C and POSIX. These names are synonymous and mean computerese, i.e. default settings that are appropriate for data that is parsed by a computer program.
Locale settings
The following locale categories are defined by POSIX:
LC_CTYPE: the character set used by terminal applications: classification data (which characters are letters, punctuation, spaces, invalid, etc.) and case conversion. Text utilities typically heed LC_CTYPE to determine character boundaries.
LC_COLLATE: collation (i.e. sorting) order. This setting is of very limited use for several reasons:
- Most languages have intricate rules that depend on what is being sorted (e.g. dictionary words and proper names might not use the same order) and cannot be expressed by LC_COLLATE.
- Few of the tasks where proper sort order matters are performed by software that uses locale settings. For example, word processors store the language and encoding of a file in the file itself (otherwise the file wouldn't be processed correctly on a system with different locale settings) and don't care about the locale settings specified by the environment.
- LC_COLLATE can have nasty side effects, in particular because it causes the sort order A < a < B < …, which makes “between A and Z” include the lowercase letters a through y. In particular, very common regular expressions like [A-Z] break some applications.
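To see what the C collation order gives you, here is a small sketch that forces it for a single command with LC_ALL=C. The sample input is made up for illustration; what the same commands print under another locale depends on which locales are installed, so no output is claimed for that case.

```shell
# In the C locale, sort compares byte values, so every uppercase ASCII
# letter sorts before every lowercase one.
printf '%s\n' b A a B | LC_ALL=C sort
# prints A, B, a, b (one per line)

# The same trick keeps a bracket range strictly to uppercase ASCII letters.
printf '%s\n' abc ABC | LC_ALL=C grep '[A-Z]'
# prints: ABC
```

Prefixing only the one command leaves the rest of the session's locale settings alone.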
LC_MESSAGES: the language of informational and error messages.
LC_NUMERIC: number formatting: decimal and thousands separator.
Many applications hard-code . as a decimal separator. This makes LC_NUMERIC not very useful and potentially dangerous:
- Even if you set it, you'll still see the default format pretty often.
- You're likely to get into a situation where one application produces locale-dependent output and another application expects . to be the decimal point, or , to be a field separator.
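When a number is meant to be read by another program, one way to sidestep the problem is to force the C locale for just the formatting step. A minimal sketch, where format_amount is a hypothetical helper name:

```shell
# Hypothetical helper: format a number with two decimals, always using
# "." as the decimal separator, whatever LC_NUMERIC says in the environment.
format_amount() {
    LC_ALL=C printf '%.2f\n' "$1"
}

format_amount 3.14159
# prints: 3.14
```

LC_ALL is used here rather than LC_NUMERIC because a LC_ALL value inherited from the environment would otherwise take precedence.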
LC_MONETARY: like LC_NUMERIC, but for amounts of local currency.
Very few applications use this.
LC_TIME: date and time formatting: weekday and month names, 12 or 24-hour clock, order of date parts, punctuation, etc.
GNU libc, which you'll find on non-embedded Linux, defines additional locale categories:
LC_PAPER: the default paper size (defined by height and width).
LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, LC_IDENTIFICATION: I don't know of any application that uses these.
Environment variables
Applications that use locale settings determine them from environment variables.
- The value of the LANG environment variable is used unless overridden by another setting. If LANG is not set, the default locale is C.
- The LC_xxx names can be used as environment variables; each one overrides LANG for its own category.
- If LC_ALL is set, then all other values are ignored; this is primarily useful to set LC_ALL=C when running applications that need to produce the same output regardless of where they are run.
- In addition, GNU libc uses LANGUAGE to define fallbacks for LC_MESSAGES (e.g. LANGUAGE=fr_BE:fr_FR:en to prefer Belgian French, or if unavailable France French, or if unavailable English).
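The precedence rules above can be sketched as a small shell function. effective_locale is a hypothetical name, the function treats an empty variable like an unset one, and it deliberately ignores the GNU-specific LANGUAGE fallback.

```shell
# Hypothetical helper: print the locale name that applies to one category
# (e.g. LC_TIME), following the order LC_ALL, then LC_xxx, then LANG, then C.
effective_locale() {
    case $1 in
        LC_ALL | *[!A-Z_]*) return 2 ;;   # reject LC_ALL and unsafe names
        LC_?*) ;;
        *) return 2 ;;
    esac
    eval "category_value=\${$1-}"         # value of the variable named by $1
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

effective_locale LC_TIME
```

On a real system, running the locale command with no arguments reports the effective value of every category, which is a good way to cross-check.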
Installing locales
Locale data can be large, so some distributions don't ship it in a usable form and instead require an additional installation step.
- On Debian, to install locales, run dpkg-reconfigure locales and select from the list in the dialog box, or edit /etc/locale.gen and then run locale-gen.
- On Ubuntu, to install locales, run locale-gen with the names of the locales as arguments.
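Before or after installing, you can check what is already available with locale -a, which lists the locale names known to the system. The wrapper below is a sketch; locale_installed is a hypothetical name, and it compares names exactly as locale -a prints them (GNU libc, for instance, may list an encoding as utf8 rather than UTF-8).

```shell
# Hypothetical helper: succeed if the given name appears verbatim in the
# output of "locale -a".
locale_installed() {
    locale -a 2>/dev/null | grep -qxF -- "$1"
}

if locale_installed POSIX; then
    echo "POSIX locale is available"
fi
```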
You can define your own locale.
Recommendation
The useful settings are:
- Set LC_CTYPE to the language and encoding that you encode your text files in. Ensure that your terminals use that encoding.
For most languages, only the encoding matters. There are a few exceptions; for example, an uppercase i is I in most languages but İ in Turkish (tr_TR).
- Set LC_MESSAGES to the language that you want to see messages in.
- Set LC_PAPER to en_US if you want US Letter to be the default paper size and just about anything else (e.g. en_GB) if you want A4.
- Optionally, set LC_TIME to your favorite time format.
As explained above, avoid setting LC_COLLATE and LC_NUMERIC. If you use LANG, explicitly override these two categories by setting them to C.
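Put together, the recommendation might look like this in a shell startup file. The locale names are assumptions chosen for illustration; substitute ones that are actually installed on your system.

```shell
# Sketch of the recommended settings; the locale names are examples only.
unset LC_ALL                     # LC_ALL would override everything below
export LANG=en_US.UTF-8          # default for LC_CTYPE, LC_MESSAGES, ...
export LC_COLLATE=C              # avoid locale-dependent sort order
export LC_NUMERIC=C              # keep "." as the decimal separator
export LC_PAPER=en_GB.UTF-8      # A4 rather than US Letter
export LC_TIME=en_GB.UTF-8       # optional: preferred date/time format
```

Commands started from that shell afterwards inherit these values.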
LC_PAPER. And can I update this across the system without rebooting? – Faheem Mitha Aug 10 '14 at 19:31
/etc/default/locale. These files take effect when you log in; you can do export LC_PAPER=… in a shell to affect commands launched from that shell. – Gilles 'SO- stop being evil' Aug 10 '14 at 19:59