4

On macOS, with bash installed from Homebrew, I noticed that setting LC_MESSAGES seems to have some effect on the current shell's locale settings, but messages don't actually change until LC_MESSAGES is exported:

Unsetting LANG and LC_MESSAGES, I get an English error message as expected:

bash-4.4$ unset LANG LC_MESSAGES
bash-4.4$ if :;  fi
bash: syntax error near unexpected token `fi'

Setting LC_MESSAGES to an incorrect value gives an error about setlocale:

bash-4.4$ LC_MESSAGES=foo
bash: warning: setlocale: LC_MESSAGES: cannot change locale (foo): No such file or directory

So something changes when I set LC_MESSAGES. But setting it to a reasonable value has no effect:

bash-4.4$ LC_MESSAGES=ja_JP.UTF-8
bash-4.4$ if :;  fi
bash: syntax error near unexpected token `fi'

Until I export it:

bash-4.4$ export LC_MESSAGES
bash-4.4$ if :;  fi
bash: 予期しないトークン `fi' 周辺に構文エラーがあります

(All of this goes for LANG as well, it seems.)

The Bash manual's section on Bash Variables does not say LC_MESSAGES or LANG has to be exported (and most other variables listed there don't have to be exported).

Why is this?

  • @Scott thanks, but what am I supposed to see there? As I said, the documentation does not say the variable has to be exported, and setting an unexported LC_MESSAGES does have some effect. How does that post explain either of these observations? –  Nov 16 '18 at 07:00
  • When you set a plain, ordinary, local shell variable, the shell just puts it into memory. The LANG and LC_* variables are a special case; they get validated when assigned in a way that no (few?) other variables do. But the point is that variables don’t become visible (especially to external programs) and effective until they are exported, and transformed from shell variables into environment variables. … (Cont’d) – Scott - Слава Україні Nov 16 '18 at 07:29
  • (Cont’d) … The fact that you need to do this to get bash itself to honor them is a little odd, but the explanation is probably that they are using library code (suitable for running in any program), and that library code calls other library functions like getenv, and there’s no mechanism for short-circuiting them to look for a *shell variable* called LC_MESSAGES when no environment variable by that name exists. – Scott - Слава Україні Nov 16 '18 at 07:29
  • While I can't reproduce with 4.4 on GNU/Linux (changes taken into account when exported or not), I can on FreeBSD. – Stéphane Chazelas Nov 16 '18 at 07:30
  • @Scott, that library should use setlocale(), not getenv() to query the current locale. bash as shown by the OP does call setlocale() when the LC_* variables are modified to set the current locale, so it seems there's something wrong with that library or possibly the way bash uses it. – Stéphane Chazelas Nov 16 '18 at 07:40
  • @Scott But this isn't simple validation, though. setlocale is being called, and AFAIK that function doesn't have an option to just validate but not set. –  Nov 16 '18 at 07:40
  • @StéphaneChazelas Bash's internal gettext library switches to calling getenv() if setlocale() is not available. It is available on macOS though... but maybe the Homebrew build scripts aren't picking it up, or are ignoring it for whatever reason. – Kusalananda Nov 16 '18 at 08:11
  • @Kusalananda, I can reproduce on FreeBSD with a simple program that calls setlocale(LC_ALL, "fr_FR.UTF-8") followed by dgettext(). The messages are not translated into the language passed to setlocale() (though truss shows the corresponding .mo file is being opened if there's no LC_* variable in the environment), but instead into the language referenced by the LC_* variables. One would need to look at the source to see why, but that very much looks like a bug. – Stéphane Chazelas Nov 16 '18 at 08:21
  • @StéphaneChazelas do you also see it in the other *BSDs? OpenBSD, NetBSD? –  Nov 16 '18 at 08:23
  • I've only tested in FreeBSD (most of macOS is based on FreeBSD, though I can't tell if that applies to that particular case). – Stéphane Chazelas Nov 16 '18 at 08:25
  • @JohnDoea Similar behaviour on OpenBSD. – Kusalananda Nov 16 '18 at 08:31
  • @Scott can you remove your duplicate flag now? As the answer shows, this goes well beyond simple exporting of variables. –  Nov 19 '18 at 14:22
  • Funny, I'm still getting bash: syntax error near unexpected token `fi' even after exporting, and no Japanese text. locale, however, reflects the change. Does anyone know anything? – Pacerier May 29 '19 at 10:31

2 Answers

2

You're right that assigning the LC_* shell variables does cause bash to call POSIX setlocale() for the corresponding category with the value of the variable, whether they're exported or not. For LANG, it calls setlocale(LC_ALL, thevalue) followed by setlocale(LC_*) again for each of the LC_* variables. For LANGUAGE, it doesn't do anything.

Now, bash is the shell of the GNU project. For localization of text, it uses GNU gettext, also known as libintl. It even comes with its own version bundled with the source, which you can compile into bash by calling the configure script with --with-included-gettext.

gettext looks up message translations in a per-language database. The language is determined by the value of the LC_MESSAGES category, though it can be overridden by the $LANGUAGE environment variable.

According to the gettext documentation, the previous call to setlocale() should be the one that determines the value for the category, but there are some complications:

For multithreaded applications, there is currently no standard API that gettext can use to retrieve that value. bash is not a multithreaded application, but even what setlocale(category, NULL) returns is implementation defined and in practice not always usable.

So in practice, gettext only uses setlocale() to retrieve the language name when built as part of the GNU libc or on a system where the libc is the GNU libc (like the one built with bash with --with-included-gettext on a GNU system) because it knows it can rely on it.

On other systems, it uses getenv() to determine the locale, irrespective of how setlocale() was invoked earlier, which is why you're seeing that behaviour.

Exporting those variables is an easy workaround. One could argue that if they're not exported, they're not part of the environment anyway. POSIX is not very clear on that. Another way to look at it is that the translation is not done by bash, but by a third-party mechanism, so just like when executing other commands, we need to use environment variables to pass the locale information between the two pieces of software (here bash and gettext).
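As a rough illustration of the difference between assigning and exporting, the following sketch uses a child sh process as a stand-in for any code that only sees the environment. The value C is chosen only because that locale should exist on every system, and the names before and after are made up for this example.

```shell
# Start from a clean slate so an inherited export does not skew the result.
unset LC_MESSAGES

# Plain assignment: the shell knows the value, the environment does not.
LC_MESSAGES=C
before=$(sh -c 'printf "%s" "${LC_MESSAGES-unset}"')

# After export, the same value is part of the environment.
export LC_MESSAGES
after=$(sh -c 'printf "%s" "${LC_MESSAGES-unset}"')

printf 'before export: %s\nafter export: %s\n' "$before" "$after"
```

The child process is only a proxy here: it shows what is and is not in the environment, which is the same information a getenv()-based lookup depends on.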

Now, on GNU systems, it actually gets worse.

As seen above, gettext is included in the GNU libc. $LANGUAGE takes precedence over $LC_MESSAGES, but $LANGUAGE is not part of the POSIX locale API; it is an extension on top of it.

So while on a GNU system gettext will use setlocale(LC_MESSAGES, NULL) to get the name for the LC_MESSAGES category, for LANGUAGE it always uses getenv(), since LANGUAGE is not a locale category.

The problem is that bash manages the environment by itself as part of its variable handling, disconnected from the libc's environ[] array. It does have its own getenv(), which queries its own version of the environment. But when gettext is built as part of the libc and bash is dynamically linked, dgettext() calls the libc's getenv() rather than bash's, as that's an internal call within the libc. It therefore only gets the $LANGUAGE value from the time bash was started.

So on GNU systems, unless bash was linked statically or built with --with-included-gettext, any change to $LANGUAGE will be ignored for the messages generated by bash, whether the variable is exported or not. On other systems, that's fine (as long as $LANGUAGE is exported) as gettext is not part of the libc, so it does call bash's getenv().

On Debian:

$ LANGUAGE=fr bash -c 'LANGUAGE=es; eval fi'
bash: eval: ligne 0: erreur de syntaxe près du symbole inattendu « fi »
bash: eval: ligne 0: `fi'

(message in French, the value of $LANGUAGE at the time bash was invoked, not Spanish).
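The LANGUAGE=fr bash -c ... form used above relies on a command-prefix assignment, which places the variable in the environment of that one command from its start without changing the calling shell. A minimal sketch, assuming a POSIX sh and using the hypothetical names seen_by_child and in_parent:

```shell
# Make sure the calling shell has no LANGUAGE of its own.
unset LANGUAGE

# Prefix assignment: LANGUAGE is in the child's environment at startup.
seen_by_child=$(LANGUAGE=fr sh -c 'printf "%s" "${LANGUAGE-unset}"')

# The calling shell itself is left untouched.
in_parent=${LANGUAGE-unset}

printf 'child saw: %s\nparent has: %s\n' "$seen_by_child" "$in_parent"
```

This is why, in the Debian example, the value present at invocation time is the one a libc-internal getenv() can see.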

Actually it's not much better with other shells.

zsh is not translated to other languages but does use strerror() which does use gettext internally on GNU systems:

$ LANGUAGE=fr zsh -c 'LANGUAGE=es; true</x; LANGUAGE=en; true</a; true < /etc/shadow'
zsh:1: no existe el archivo o el directorio: /x
zsh:1: no existe el archivo o el directorio: /a
zsh:1: permission denied: /etc/shadow

The LANGUAGE=es was honoured but see how the second message for ENOENT has not been displayed in English (presumably cached by gettext somehow; that cache should have been invalidated when $LANGUAGE changed but that was not the case).

0

Take a look at this answer for an explanation of the difference between a shell variable and an environment variable. In essence:

Setting a shell variable:

LANG=en_US.UTF-8

Setting an environment variable:

export LANG=en_US.UTF-8

You want to set environment variables for locale, as shell variables are private to the shell and won’t be passed to child processes.

  • That's different here. export is only needed to affect the behaviour of commands being executed. Here, we're talking of messages generated by the shell internally. – Stéphane Chazelas Nov 16 '18 at 07:29
  • There are no child processes here, though. That's why I specifically used this if :; fi command, which generates a syntax error from the shell itself. –  Nov 16 '18 at 07:36
  • It is the assignment that creates the problem. Assignment of a variable is a command that gets forked off. The setlocale child process doesn’t see the shell variable assignment and told bash that it didn’t work. Bash reported the assignment didn’t work, and told you what your attempted assignment was. – Richard Barber Nov 16 '18 at 08:51
  • That ... makes no sense. Simple variable assignment doesn't fork off a new process in any shell that I know of. And as for setlocale telling Bash that it didn't work, there are two assignments here, and setlocale only complained about one of them. Going by your logic, it should have complained about both. –  Nov 16 '18 at 09:14
  • That's not "simple" variable assignment, though, is it? You're running that in the background. –  Nov 16 '18 at 09:23
  • My point was supposed to be that processes run when you assign a LC_ variable to either the shell or environment. – Richard Barber Nov 16 '18 at 09:48