The basic idea is that VAR=VALUE some-command sets VAR to VALUE for the execution of some-command when some-command is an external command, and it doesn't get any fancier than that. If you combine this intuition with some knowledge of how a shell works, you should come up with the right answer in most cases. The POSIX reference is “Simple Commands” in the chapter “Shell Command Language”.
If some-command is an external command, VAR=VALUE some-command is equivalent to env VAR=VALUE some-command. VAR is exported in the environment of some-command, and its value (or lack of a value) in the shell doesn't change.
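As a quick check of the external-command case, here is a minimal sketch. It assumes sh is available as an external program; the names child and parent are only there for illustration.

```shell
unset VAR

# sh is an external command here, so it receives VAR in its environment.
child=$(VAR=VALUE sh -c 'printf "%s" "${VAR-unset}"')

# Back in the calling shell, VAR is still unset.
parent=${VAR-unset}

printf 'child=%s parent=%s\n' "$child" "$parent"
# prints: child=VALUE parent=unset
```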
If some-command is a function, then VAR=VALUE some-command is equivalent to VAR=VALUE; some-command, i.e. the assignment remains in place after the function has returned, and the variable is not exported into the environment. The reason for that has to do with the design of the Bourne shell (and subsequently with backward compatibility): it had no facility to save and restore variable values around the execution of a function. Not exporting the variable makes sense since a function executes in the shell itself. However, ksh (including both AT&T ksh93 and pdksh/mksh), bash and zsh implement the more useful behavior where VAR is set only during the execution of the function (and is also exported). In ksh, this is done if the function is defined with the ksh syntax function NAME …, not if it's defined with the standard syntax NAME (). In bash, this is done only in bash mode, not in POSIX mode (e.g. when run with POSIXLY_CORRECT=1). In zsh, this is done if the posix_builtins option is not set; this option is not set by default but is turned on by emulate sh or emulate ksh.
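To see which behavior your own shell implements, something like the following sketch can help. The function name show and the variables inside and after are made up for the illustration. The value seen inside the function is the same everywhere; the value afterwards is the part that varies between shells and modes, so I'm not giving an expected output for it.

```shell
unset VAR

# Standard function syntax; records what the function body sees.
show() { inside=${VAR-unset}; }

VAR=VALUE show
after=${VAR-unset}

# inside is always VALUE. after is VALUE where the assignment persists
# (Bourne-style behavior) and unset where the shell restores the old state.
printf 'inside=%s after=%s\n' "$inside" "$after"
```

Running the same file under different shells (for example bash, then bash --posix, then ksh) is a simple way to compare them.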
If some-command is a builtin, the behavior depends on the type of builtin. Special builtins behave like functions. Special builtins are the ones that have to be implemented inside the shell because they affect the state of the shell (e.g. break affects control flow, export affects which variables are passed to commands, set affects positional parameters and options…). Other builtins are built in only for performance and convenience (mostly; e.g. cd has to run inside the shell to change its current directory, and the bash feature printf -v can only be implemented by a builtin), and they behave like an external command.
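Here is a small sketch of the two kinds of builtin. It uses read as the regular builtin and : as the special builtin; those are my choice of examples, as are the variable names. The read part behaves the same in every shell I know of, while the result after : depends on the shell and its mode, so it is only printed.

```shell
var=a.b.c
saved_ifs=$IFS

# read is a regular builtin: IFS=. applies only while read runs.
IFS=. read -r first second third <<EOF
$var
EOF
printf 'first=%s second=%s third=%s\n' "$first" "$second" "$third"
# prints: first=a second=b third=c

if [ "$IFS" = "$saved_ifs" ]; then ifs_state=unchanged; else ifs_state=changed; fi
printf 'IFS is %s after read\n' "$ifs_state"
# prints: IFS is unchanged after read

# : is a special builtin; whether VAR survives depends on the shell and mode.
unset VAR
VAR=VALUE :
printf 'VAR after the special builtin: %s\n' "${VAR-unset}"
```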
The assignment takes place after alias expansion, so if some-command is an alias, expand it first to find what happens.
Note that in all cases, the assignment is performed after the command line has been parsed and expanded, including any variable substitution on the command line itself. So var=a; var=b echo $var prints a, because $var is evaluated before the assignment takes place. Likewise, IFS=. printf "%s\n" $var uses the old IFS value to split $var.
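The ordering can be checked with a sketch like this one. The names early and late are illustrative, and sh is assumed to be available as an external command. In the first case the calling shell expands $var before the assignment happens; in the second, the single quotes delay the expansion until the child shell runs with the new value in its environment.

```shell
var=a

# "$var" is expanded by the current shell before var=b takes effect.
early=$(var=b echo "$var")

# Here the child shell does the expansion, after receiving var=b.
late=$(var=b sh -c 'echo "$var"')

printf 'early=%s late=%s var=%s\n' "$early" "$late" "$var"
# prints: early=a late=b var=a
```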
I've covered all the types of commands, but there's one more case: when there is no command to execute, i.e. if the command consists only of assignments (and possibly redirections). In that case, the assignment remains in place. VAR=VALUE OTHERVAR=OTHERVALUE is equivalent to VAR=VALUE; OTHERVAR=OTHERVALUE. So after IFS=. arr=($var), IFS remains set to a dot (.). Since you could use $IFS in the assignment to arr with the expectation that it already has its new value, it makes sense that the new value of IFS is used for the expansion of $var.
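A portable sketch of the assignment-only case follows. It avoids the array syntax so that it also runs in a plain sh. The names saved_ifs, ifs_seen and ifs_after are made up for the example. The persistence of the assignments is the part I'm sure about; I'm leaving the value of ifs_seen without an expected output, since I have only verified the left-to-right behavior in bash.

```shell
saved_ifs=$IFS
unset VAR OTHERVAR

# No command name: both assignments stay in effect afterwards.
VAR=VALUE OTHERVAR=OTHERVALUE
printf 'VAR=%s OTHERVAR=%s\n' "$VAR" "$OTHERVAR"
# prints: VAR=VALUE OTHERVAR=OTHERVALUE

# Same with IFS. In bash the second assignment already sees the new IFS.
IFS=. ifs_seen=$IFS
ifs_after=$IFS

# Restore IFS by hand, since nothing does it for us here.
IFS=$saved_ifs

printf 'ifs_seen=%s ifs_after=%s\n' "$ifs_seen" "$ifs_after"
```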
In summary, you can change IFS temporarily for field splitting only in the following ways:
- by starting a new shell or a subshell (e.g. third=$(IFS=.; set -f; set -- $var; echo "$3") is a complicated way of doing third=${var#*.*.}, except that they behave differently when the value of var does not contain exactly two . characters);
- in ksh, with IFS=. some-function where some-function is defined with the ksh syntax function some-function …;
- in bash and zsh, with IFS=. some-function as long as they are operating in native mode as opposed to compatibility mode.
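The subshell approach from the first bullet is the one that works in any POSIX-style shell, so here is a minimal sketch of it, along with a case where it differs from the parameter expansion. The variable names other than var and third are illustrative. This assumes a shell that splits unquoted expansions (sh, bash, dash, ksh; zsh in its native mode does not).

```shell
var=a.b.c
saved_ifs=$IFS

# IFS and set -f are changed only inside the command substitution.
third=$(IFS=.; set -f; set -- $var; echo "$3")
third_by_strip=${var#*.*.}
printf 'third=%s third_by_strip=%s\n' "$third" "$third_by_strip"
# prints: third=c third_by_strip=c

# With only one dot the two approaches disagree.
var=a.b
short_split=$(IFS=.; set -f; set -- $var; echo "$3")
short_strip=${var#*.*.}
printf 'short_split=[%s] short_strip=[%s]\n' "$short_split" "$short_strip"
# prints: short_split=[] short_strip=[a.b]

# The parent shell's IFS was never touched.
if [ "$IFS" = "$saved_ifs" ]; then ifs_state=unchanged; else ifs_state=changed; fi
```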
"IFS remains set to ." Eek. After reading the first part, that makes sense, but before I posted this Q, I would not have expected that. – muru Feb 20 '16 at 21:46