I am trying to understand how locale works in Linux. This is how I think it works:

Each process has an environment variables table. You can launch a process and set some environment variables for it, including locale variables (for example: LC_ALL=en_US.UTF-8).

But if this newly launched process wants to see what its locale variables are, it doesn't look in the environment variables table, but rather there is a separate locale variables table that the process looks at, something like this:

[Image: diagram showing a process with an environment variables table and a separate locale variables table]

So if the process wants to use the locale variables that are set in its environment variables table, it should copy them into its locale variables table first. For example, to copy the LC_ALL variable from the environment variables table into the locale variables table, the process does the following:

setlocale (LC_ALL, "");

Am I correct in my understanding? And if I am correct, do all processes have a locale variables table, or is this table only present in programs written in C?

James
    A process may or may not call setlocale(). If it calls setlocale(), it may call it with the second argument set to "" to get the values from the environment, or set to "C" to explicitly ignore them. It does not have to call setlocale() at all; if it doesn't, it behaves as if it called setlocale (LC_ALL, "C"). It may look at the environment variables directly. It may also ignore them. The function call setlocale() is specific to the C programming language; other languages may or may not offer comparable mechanisms. – AlexP Jun 11 '18 at 17:50

1 Answer

Your understanding is partly right, but only partly. The piece you're missing is how locale settings are used. Locale settings are used by various library functions that perform locale-dependent actions such as translating messages (which uses LC_MESSAGES), formatting numbers (LC_NUMERIC) and dates (LC_TIME), encoding and decoding text (LC_CTYPE), sorting text (LC_COLLATE), etc.

Take for example a function that formats a date. If asked to use a locale-dependent format, it will look up the rules for date formatting in the locale that was configured for use in the current process. The date formatting doesn't care about a locale name; what it needs to know is how to format the date. So while it does look in what you could call a “locale table”, this table doesn't contain names (e.g. LC_TIME is fr_FR) but settings (e.g. “the short date format uses the order day-month-year, the long date format uses the month names janvier, février, …”).
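To make this concrete, here is a minimal C sketch of a locale-dependent formatting call. The helper name weekday_name is hypothetical and only there for illustration. Notice that strftime is never told a locale name: it reads whatever LC_TIME settings are currently loaded in the process.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: write the full weekday name for wday (0 = Sunday)
   into buf, using whatever LC_TIME settings are currently loaded.
   Returns the number of characters written, or 0 if buf is too small. */
size_t weekday_name(char *buf, size_t size, int wday)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_wday = wday;              /* %A only consults tm_wday */
    return strftime(buf, size, "%A", &tm);
}
```

In a program that never calls setlocale, weekday_name(buf, sizeof buf, 0) yields "Sunday", because the settings in effect are those of the "C" locale. If French LC_TIME settings had been loaded first, the same call would be expected to yield the French name instead.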

The C function setlocale fills some entries in the process's locale settings table. It takes two arguments: a category to fill, and a string which is a name given to a particular value for these locale settings. The string is basically a file name to load the settings from. For example, setlocale(LC_TIME, "fr_FR") basically means “load date formatting settings into the process's locale table from the file /usr/share/i18n/locales/fr_FR” (it's more complicated than that, other files are involved, but that's the basic idea).
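As a rough sketch of that loading step in C: setlocale returns NULL when the C library cannot load the settings for the given name, so the return value is worth checking. The helper name load_time_settings is made up for this example, and whether a name like "fr_FR" is available depends on which locales are installed on the system.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: load the LC_TIME settings called `name` into the
   process's locale settings. Returns 1 on success, 0 if the C library
   could not load settings under that name. */
int load_time_settings(const char *name)
{
    const char *result = setlocale(LC_TIME, name);
    if (result == NULL) {
        fprintf(stderr, "locale \"%s\" is not available\n", name);
        return 0;
    }
    return 1;
}
```

For example, load_time_settings("C") always succeeds, while load_time_settings("fr_FR") succeeds only where that locale is installed. Only the LC_TIME category is touched; the other categories keep their previous settings.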

The C function setlocale has a mode of operation where it will look up environment variables. If you give it an empty string instead of a name, it will determine a locale name based on the locale environment variable hierarchy. This mode is what most programs use. Once again, the environment variables and the locale names influence how setlocale works, not how functions that perform locale-dependent actions work.
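The hierarchy can be sketched in C as follows. This is an illustration of the lookup order for a single category (LC_ALL first, then the category's own variable, then LANG), not the actual libc code, and the function name locale_name_from_env is hypothetical. The final fallback to "C" is an assumption: POSIX leaves that default implementation-defined.

```c
#include <stdlib.h>

/* Illustration only: mimic the order in which setlocale(category, "")
   consults the environment for one category.
   category_var is the category's variable name, e.g. "LC_TIME". */
const char *locale_name_from_env(const char *category_var)
{
    const char *names[] = { "LC_ALL", category_var, "LANG" };
    for (size_t i = 0; i < 3; i++) {
        const char *value = getenv(names[i]);
        /* The first variable that is set and non-empty wins. */
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";   /* assumed default when nothing is set */
}
```

With LANG=en_US.UTF-8 and LC_TIME=fr_FR.UTF-8 in the environment, locale_name_from_env("LC_TIME") returns "fr_FR.UTF-8"; adding LC_ALL=de_DE.UTF-8 makes it return "de_DE.UTF-8" for every category.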

The locale settings table is a feature of the standard library (libc) which almost all programs are linked against (regardless of which language(s) they're written in). Most languages provide a way to set it by calling the standard library's setlocale function. For example, Perl and Python both have a setlocale function which resembles C's. High-level languages also typically have a way to set locale settings based on the environment, for example use locale in Perl; in bash it's automatic but the locale settings are not based on the environment but on the shell variables of the same name (so setting e.g. LC_COLLATE has an effect in bash even if you don't export it).

  • "The locale settings table is a feature of the standard library (libc) which almost all programs are linked against (regardless of which language(s) they're written in)" What if I wrote a program in Assembly, and created the executable without linking against the CRT, would this program have a locale variables table when launched? – James Jun 12 '18 at 08:54
  • @James If you did that then you wouldn't have any locale settings table, unless you implemented one in your own code. But you wouldn't have any function that would use locale settings anyway (once again, unless you implemented one in your own code). – Gilles 'SO- stop being evil' Jun 12 '18 at 08:57
  • The execve() function documentation says the following: "The equivalent of setlocale(LC_ALL, "C") is executed at program start-up". Now what if execve() was passed an executable made in Assembly (without a locale variables table) as argument? Will execve() still modify the memory of this executable to change the LC_ALL locale variable value (which doesn't even exist, since the executable doesn't have a locale variables table, as I have said)? – James Jun 12 '18 at 12:05
  • @James Once again there is no such thing as a “LC_ALL locale variable” or a “locale variables table”. There is a locale settings table in the libc data. There may or may not be an environment variable called LC_ALL; this doesn't affect the behavior of execve. What the sentence you quoted means is that the default locale settings of a program are the “C” settings. execve doesn't modify the memory of the executable: what happens is that the locale settings table's initial values contain the “C” settings. If you don't link with libc, then your program doesn't even have locale settings. – Gilles 'SO- stop being evil' Jun 12 '18 at 12:26
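
For a program that does link with libc, the initial “C” settings can be observed by querying the current locale name before any locale has been set. This is a minimal sketch; the helper name current_locale_name is hypothetical. Passing NULL as the second argument makes setlocale report the current locale without changing it.

```c
#include <locale.h>

/* Hypothetical helper: return the name of the locale currently in effect
   for all categories, without modifying any settings. */
const char *current_locale_name(void)
{
    return setlocale(LC_ALL, NULL);   /* NULL means "query only" */
}
```

Called at the start of main, before any other setlocale call, this returns "C" regardless of what LC_ALL or LANG contain in the environment.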