
I am always really hesitant to mess around with $IFS because it's clobbering a global.

But often it makes loading strings into a bash array nice and concise, and for bash scripting, conciseness is hard to come by.

So I figure it might be better than nothing if I try to "save" the starting contents of $IFS to another variable and then restore it immediately after I am done using $IFS for something.

Is this practical? Or is it essentially pointless and I should just directly set IFS back to whatever it needs to be for its subsequent uses?

Steven Lu
  • Why wouldn't it be practical? – Bratchley Feb 22 '16 at 03:02
  • Because unsetting IFS would do the job fine. – llua Feb 22 '16 at 03:04
  • For those saying that unsetting IFS will work fine, keep in mind that it is situational: https://stackoverflow.com/questions/39545837/unset-ifs-unexpected-behaviour. In my experience, it's best to set IFS manually to the default for your shell interpreter, namely $' \t\n' if you're using bash. unset IFS simply doesn't always restore it to what you'd expect to be the default. – Darrel Holt Jun 28 '19 at 17:57
  • @llua That is dangerously incorrect! I've read this SE question and used unset since so many people said it would be okay, and it ended up causing an autocomplete command to malfunction, which broke the terminal session because multiple environment variables were messed up. Please refrain from stating what you think as fact when you don't know exactly what you're talking about. You have a high reputation on this platform and people will put a lot of trust in what you say. In this case, it could lead to bugs that are extremely hard to find and debug. Just save it and restore it, as good style dictates. – Stefan Fabian Jan 15 '21 at 13:41
  • @StefanFabian the statement was in the context of behavior from the shell. I am willing to bet that the completer (written for the shell) that caused that problem attempted to save IFS and "restored" it later, setting it to an empty string, which is why assuming IFS is set is not safe. – llua Jan 15 '21 at 22:46
  • @llua well the only reason IFS is not set is that you tell people to unset it. The two approaches just don't mix. I agree that it would be best to check whether it is set first before backing it up but some scripts don't do that. Backing up and resetting shared variables when you modify them is an old and trusted practice. Also, this approach works always, whereas unset as you've just admitted has cases where it will not work the same as backup and reset would. – Stefan Fabian Jan 18 '21 at 13:11
  • @StefanFabian it does /not/ sometimes not work, nor did I admit that. It /can/ cause a separate issue, but those scripts making inaccurate assumptions are not isolated to changes to IFS's state. While not in the novel format that this site seems to prefer, the answer by barefoot (or sls) should've been accepted since it also points out the problem with blindly saving the variable. – llua Jan 19 '21 at 20:33
  • @llua I said it sometimes does not work the same as backup and reset would and you did admit that. Since if you unset it any code that uses backup and reset will break the IFS. Yes, you can say that they broke it but they wouldn't have broken it if you didn't unset it so in that case you are both to blame. Should they check whether it was unset before? Yes! But they often don't do that, so the safe route is to just save and reset the IFS using the approach in this answer where the case that the IFS is unset is also handled. – Stefan Fabian Jan 20 '21 at 12:28
  • @StefanFabian unsetting IFS is a valid way of returning the word splitting of bash back to normal. "Backing up" and "restoring" IFS poorly can unfortunately lead to problems. Accounting for IFS being unset when doing so can still lead to problems if something else clobbers the "backup" variable name in the block of code that changes it. – llua Jan 20 '21 at 21:21

4 Answers


In general, it is a good practice to return conditions to default.

However, in this case, not so much.

Why?:

Also, storing the IFS value has a problem.
If the original IFS was unset, the code IFS="$OldIFS" will set IFS to "", not unset it.

To actually keep the value of IFS (even if unset), use this:

${IFS+"false"} && unset oldifs || oldifs="$IFS"    # correctly store IFS.

IFS="error" ### change and use IFS as needed.

${oldifs+"false"} && unset IFS || IFS="$oldifs" # restore IFS.
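To see the difference described above, the bash sketch below starts from an unset IFS and runs both the naive save/restore and the one from this answer. `ifs_state` is a hypothetical helper added only for the demonstration, and `[[ -v IFS ]]` assumes bash 4.2 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether IFS is unset, or set and to what.
ifs_state() {
  if [[ -v IFS ]]; then printf 'set to %q\n' "$IFS"; else echo unset; fi
}

# Naive save/restore, starting from an unset IFS.
unset IFS
OldIFS="$IFS"      # an unset IFS expands to "", so OldIFS=""
IFS=,
IFS="$OldIFS"      # IFS is now set to the empty string, not unset
naive=$(ifs_state)

# The save/restore from this answer, starting from the same state.
unset IFS
${IFS+"false"} && unset oldifs || oldifs="$IFS"    # correctly store IFS.
IFS=,
${oldifs+"false"} && unset IFS || IFS="$oldifs"    # restore IFS.
careful=$(ifs_state)

echo "naive:   $naive"      # naive:   set to ''
echo "careful: $careful"    # careful: unset
```

The naive version leaves IFS set to the empty string, which disables word splitting entirely, whereas the guarded version returns IFS to the unset state it started in.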

llua
  • Beware that in bash, unset IFS fails to unset IFS if it had been declared local in a parent context (function context) and not in the current context. – Stéphane Chazelas Jul 31 '18 at 14:03

You can save and assign to IFS as needed. There is nothing wrong with doing so. It's not uncommon to save its value so that it can be restored after a brief, temporary modification, like your array assignment example.

As @llua mentions in his comment to your question, simply unsetting IFS will restore the default behavior, equivalent to assigning a space-tab-newline.
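The following bash sketch checks that equivalence for the shell's own word splitting, and shows how an empty IFS differs from an unset one. The unquoted `($s)` is deliberate here because it is what triggers word splitting.

```shell
#!/usr/bin/env bash
s=$'  one\ttwo \n three  '

unset IFS
unset_fields=($s)       # unquoted on purpose: word splitting with IFS unset

IFS=$' \t\n'
default_fields=($s)     # word splitting with the explicit default

IFS=''
empty_fields=($s)       # empty IFS: no splitting, one field

unset IFS
echo "${#unset_fields[@]} ${#default_fields[@]} ${#empty_fields[@]}"   # 3 3 1
```

This only covers the shell's own splitting. As the comments on the question point out, code that saves IFS with a plain `old=$IFS` can still turn an unset IFS into an empty one.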

It's worth considering how it can be more problematic to not explicitly set/unset IFS than it is to do so.

From the POSIX 2013 edition, 2.5.3 Shell Variables:

Implementations may ignore the value of IFS in the environment, or the absence of IFS from the environment, at the time the shell is invoked, in which case the shell shall set IFS to <space> <tab> <newline> when it is invoked.

A POSIX-compliant, invoked shell may or may not inherit IFS from its environment. From this follows:

  • A portable script cannot dependably inherit IFS via the environment.
  • A script that intends to use only the default splitting behavior (or joining, in the case of "$*"), but which may run under a shell which initializes IFS from the environment, must explicitly set/unset IFS to defend itself against environmental intrusion.
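Here is a minimal defensive preamble along these lines, written for a POSIX sh as a sketch rather than a canonical idiom. Command substitution strips trailing newlines, so it appends a sentinel `x` and removes it afterwards. Putting `unset IFS` at the top of the script is the other option mentioned above.

```shell
#!/bin/sh
# Do not trust an IFS that may have been inherited from the environment
# (some shells, e.g. dash and busybox sh, honour an exported IFS).
# $(...) would strip the trailing newline, so add a sentinel "x" and
# strip it afterwards to get exactly <space><tab><newline>.
IFS=$(printf ' \t\nx')
IFS=${IFS%x}

# From here on, unquoted expansions split on the default characters.
set -- $(printf 'a b\tc\nd')
echo "$#"    # 4
```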

N.B. It is important to understand that for this discussion the word "invoked" has a particular meaning. A shell is invoked only when it is explicitly called using its name (including a #!/path/to/shell shebang). A subshell -- such as might be created by $(...) or cmd1 || cmd2 & -- is not an invoked shell, and its IFS (along with most of its execution environment) is identical to its parent's. An invoked shell sets the value of $$ to its PID, while subshells inherit it.


This is not merely a pedantic disquisition; there is actual divergence in this area. Here is a brief script which tests the scenario using several different shells. It exports a modified IFS (set to :) to an invoked shell which then prints its default IFS.

$ cat export-IFS.sh
export IFS=:
for sh in bash ksh93 mksh dash busybox:sh; do
    printf '\n%s\n' "$sh"
    $sh -c 'printf %s "$IFS"' | hexdump -C
done

IFS is not generally marked for export, but, if it were, note how bash, ksh93, and mksh ignore their environment's IFS=:, while dash and busybox honor it.

$ sh export-IFS.sh

bash
00000000  20 09 0a                                          | ..|
00000003

ksh93
00000000  20 09 0a                                          | ..|
00000003

mksh
00000000  20 09 0a                                          | ..|
00000003

dash
00000000  3a                                                |:|
00000001

busybox:sh
00000000  3a                                                |:|
00000001

Some version info:

bash: GNU bash, version 4.3.11(1)-release
ksh93: sh (AT&T Research) 93u+ 2012-08-01
mksh: KSH_VERSION='@(#)MIRBSD KSH R46 2013/05/02'
dash: 0.5.7
busybox: BusyBox v1.21.1

Even though bash, ksh93, and mksh do not initialize IFS from the environment, they re-export their modified IFS.

If for whatever reason you need to portably pass IFS via the environment, you cannot do so using IFS itself; you will need to assign the value to a different variable and mark that variable for export. Children will then need to explicitly assign that value to their IFS.
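Below is a sketch of that workaround. `CHILD_IFS` is a hypothetical name: any ordinary exported variable works, provided the child copies it into its own IFS.

```shell
#!/usr/bin/env bash
# Hypothetical variable carrying the separator to child shells.
export CHILD_IFS=:

# The child assigns IFS itself, so the result does not depend on whether
# its sh would have honoured an exported IFS.
count=$(sh -c 'IFS=$CHILD_IFS; set -- $1; echo "$#"' sh 'a:b:c')
echo "$count"    # 3
```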

Barefoot IO
  • I see, so if I may paraphrase, it is arguably more portable to explicitly specify the IFS value in most situations where it is to be used, and so it often isn't terribly productive to even attempt to "preserve" its original value. – Steven Lu Feb 22 '16 at 22:41
  • The paramount issue is that if your script uses IFS, it should explicitly set/unset IFS to ensure that its value is what you want it to be. Typically, your script's behavior depends on IFS if there are any unquoted parameter expansions, unquoted command substitutions, unquoted arithmetic expansions, reads, or double-quoted references to $*. That list is just off the top of my head, so it may not be comprehensive (especially when considering the POSIX-extensions of modern shells). – Barefoot IO Feb 28 '16 at 18:43

You are right to be hesitant about clobbering a global. Fear not, it is possible to write clean working code without ever modifying the actual global IFS, or doing a cumbersome and error-prone save/restore dance.

You can:

  • set IFS for a single invocation:

    IFS=value command_or_function
    

    or

  • set IFS inside a subshell:

    (IFS=value; statement)
    $(IFS=value; statement)
    

Examples

  • To obtain a comma-delimited string from an array:

    str="$(IFS=, ; echo "${array[*]-}")"
    

    Note: The - is only there to protect an empty array against set -u, by providing a default value when the array is unset (the empty string in this case).

    The IFS modification is only applicable inside the subshell spawned by the $() command substitution. This is because subshells have copies of the invoking shell's variables and can therefore read their values, but any modifications performed by the subshell only affect the subshell's copy and not the parent's variable.

    You might also be thinking: why not skip the subshell and just do this:

    IFS=, str="${array[*]-}"  # Don't do this!
    

    There is no command invocation here, and this line is instead interpreted as two independent subsequent variable assignments, as if it were:

    IFS=,                     # Oops, global IFS was modified
    str="${array[*]-}"
    

    Finally, let's explain why this variant will not work:

    # Notice missing ';' before echo
    str="$(IFS=, echo "${array[*]-}")" # Don't do this! 
    

    The echo command will indeed be called with its IFS variable set to ,, but echo does not care or use IFS. The magic of expanding "${array[*]}" to a string is done by the (sub-)shell itself before echo is even invoked.

  • To read in a whole file (that does not contain NULL bytes) into a single variable named VAR:

    IFS= read -r -d '' VAR < "${filepath}"
    

    Note: IFS= is the same as IFS="" and IFS='', all of which set IFS to the empty string, which is very different from unset IFS: if IFS is not set, behavior of all bash functionalities that internally use IFS is exactly the same as if IFS had the default value of $' \t\n'.

    Setting IFS to the empty string ensures leading and trailing whitespace is preserved.

    The -d '' or -d "" tells read to only stop its current invocation on a NULL byte, instead of the usual newline.

  • To split $PATH along its : delimiters:

    IFS=":" read -r -d '' -a paths <<< "$PATH"
    

    This example is purely illustrative. In the general case where you are splitting along a delimiter, it may be possible for the individual fields to contain (an escaped version of) that delimiter. Think of trying to read-in a row of a .csv file whose columns may themselves contain commas (escaped or quoted in some way). The above snippet will not work as intended for such cases.

    That said, you are unlikely to encounter such :-containing-paths within $PATH. While UNIX/Linux pathnames are allowed to contain a :, it seems bash wouldn't be able to handle such paths anyway if you try to add them to your $PATH and store executable files in them, as there is no code to parse escaped/quoted colons: source code of bash 4.4.

    Finally, note that the snippet appends a trailing newline to the last element of the resulting array (as called out by @StéphaneChazelas in now-deleted comments), and that if the input is the empty string, the output will be a single-element array, where the element will consist of a newline ($'\n').
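The pitfalls in these examples are easy to check directly. The bash sketch below compares the three array-joining variants. It runs the `IFS=, str=...` variant inside a command substitution so that the clobbering stays contained. It also shows the trailing newline that the `$PATH` snippet leaves on the last element, using a short sample value in place of a real `$PATH`.

```shell
#!/usr/bin/env bash
array=(a b c)

# Works: IFS=, is only in effect inside the command substitution.
good="$(IFS=, ; echo "${array[*]-}")"

# Does not work: "${array[*]-}" is expanded with the outer IFS before echo
# runs, and echo itself ignores IFS.
bad="$(IFS=, echo "${array[*]-}")"

# Two plain assignments: IFS itself is changed. Contained in $(...) here,
# reporting the value IFS was left with.
clobbered="$(IFS=, str="${array[*]-}"; printf '%s' "$IFS")"

echo "$good | $bad | $clobbered"    # a,b,c | a b c | ,

# With -d '' the newline that <<< appends ends up in the last element.
# read returns 1 here because it hits EOF without seeing a NUL byte.
IFS=":" read -r -d '' -a paths <<< "/usr/bin:/bin" || true
printf '%q\n' "${paths[1]}"         # $'/bin\n'
```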

Motivation

The basic old_IFS="${IFS}"; command; IFS="${old_IFS}" approach that touches the global IFS will work as expected for the simplest of scripts. However, as soon as you add any complexity, it can easily break apart and cause subtle issues:

  • If command is a bash function that also modifies the global IFS (either directly or, hidden from view, inside yet another function that it calls), and while doing so mistakenly uses the same global old_IFS variable to do the save/restore, you get a bug.
  • As pointed out in this comment by @Gilles, if the original state of IFS was unset, the naive save-and-restore won't work, and will even result in outright failures if the commonly (mis-)used set -u (a.k.a set -o nounset) shell option is in force.
  • It is possible for some shell code to execute asynchronously to the main execution flow, such as with signal handlers (see help trap). If that code also modifies the global IFS or assumes it has a particular value, you can get subtle bugs.
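The first failure mode can be reproduced with two hypothetical functions that both save and restore through the same global `old_IFS` name. This is a minimal bash sketch, not code from a real script:

```shell
#!/usr/bin/env bash
# Hypothetical functions; both save/restore through the same global old_IFS.
inner() {
  old_IFS=$IFS    # overwrites outer's saved value with ","
  IFS=:
  # ... work that needs IFS=: ...
  IFS=$old_IFS    # back to ",", which is right from inner's point of view
}

outer() {
  old_IFS=$IFS    # saves the default <space><tab><newline>
  IFS=,
  inner
  IFS=$old_IFS    # old_IFS is now ",", so the default is never restored
}

outer
[[ $IFS == , ]] && echo "bug: global IFS left as ','"
```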

You could devise a more robust save/restore sequence (such as the one proposed in this other answer) to avoid some or all of these problems. However, you would have to repeat that piece of noisy boilerplate code wherever you temporarily need a custom IFS. This reduces code readability and maintainability.

Additional considerations for library-like scripts

IFS is especially a concern for authors of shell function libraries who need to ensure their code works robustly regardless of the global state (IFS, shell options, ...) imposed by their invokers, and also without disturbing that state at all (the invokers might rely on it to always remain static).

When writing library code, you cannot rely on IFS having any particular value (not even the default one) or even being set at all. Instead, you need to explicitly set IFS for any snippet whose behavior depends on IFS.

If IFS is explicitly set to the necessary value (even if that happens to be the default one) in every line of code where the value matters, using whichever of the two mechanisms described in this answer is appropriate to localize the effect, then the code is both independent of global state and avoids clobbering it altogether. This approach has the added benefit of making it very explicit to a person reading the script that IFS matters for precisely this one command/expansion, at minimum textual cost (compared to even the most basic save/restore).
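As a bash sketch of that style, each of the two hypothetical library helpers below uses one of the two mechanisms. The first is a function whose body is a subshell. The second uses a prefix assignment on a single `read`. Neither depends on the caller's IFS, and neither changes it.

```shell
#!/usr/bin/env bash
# Hypothetical library helpers; the names are for illustration only.

# Body is a subshell "( ... )", so IFS=, never leaks to the caller.
lib_join_csv() (
  IFS=,
  printf '%s\n' "$*"
)

# Prefix assignment: IFS=: applies to this one read only.
lib_split_colon() {
  local fields
  IFS=: read -r -a fields <<< "$1"
  printf '%s\n' "${fields[@]}"
}

# The result is the same whatever state the caller left IFS in.
IFS=$'\n'; lib_join_csv a "b c" d    # a,b c,d
unset IFS; lib_join_csv a "b c" d    # a,b c,d
lib_split_colon "/usr/local/bin:/usr/bin"   # one path per line
```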

What code is affected by IFS anyway?

Fortunately, there are not that many scenarios where IFS matters (assuming you always quote your expansions):

  • "$*" and "${array[*]}" expansions
  • invocations of the read built-in targeting multiple variables (read VAR1 VAR2 VAR3) or an array variable (read -a ARRAY_VAR_NAME)
  • invocations of read targeting a single variable when it comes to leading/trailing whitespace or non-whitespace characters appearing in IFS.
  • word-splitting (such as for unquoted expansions, which you might want to avoid like the plague)
  • some other less common scenarios (See: IFS @ Greg's Wiki)
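Here is a short bash sketch that touches the first four cases. Each line either sets IFS for one command only or sets it inside a command substitution, so the global value stays untouched:

```shell
#!/usr/bin/env bash
set -- a b c

# 1. "$*" joins with the first character of IFS.
joined=$(IFS=-; echo "$*")                 # a-b-c

# 2. read into several variables: the last one gets the remainder.
IFS=, read -r first rest <<< "1,2,3"       # first=1, rest=2,3

# 3. read into one variable: IFS whitespace is trimmed at both ends.
read -r trimmed <<< "  padded  "           # trimmed=padded
IFS= read -r kept <<< "  padded  "         # kept="  padded  "

# 4. Word splitting of an unquoted expansion.
v="p q"
words=($v)                                  # two elements: p and q

echo "$joined|$first|$rest|$trimmed|$kept|${#words[@]}"
```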
sls
  • I can't say I understand the "To split $PATH along its : delimiters assuming none of the components contain a : themselves" sentence. How could the components contain : when : is the delimiter? – Stéphane Chazelas Jul 31 '18 at 14:29
  • @StéphaneChazelas Well, : is a valid character to use in a filename on most UNIX/Linux filesystems, so it is entirely possible to have a directory with a name containing :. Perhaps some shells have a provision to escape : in PATH by using something like \:, and then you would see colons appearing that are not actual delimiters. (It seems bash does not allow such escaping. The low-level function used when iterating over $PATH just searches for : in a C string: http://git.savannah.gnu.org/cgit/bash.git/tree/general.c#n891 ) – sls Jul 31 '18 at 16:34
  • I revised the answer to hopefully make the splitting $PATH example along : more clear. – sls Jul 31 '18 at 16:48
  • Welcome to SO! Thanks for such an in depth answer :) – Steven Lu Jul 31 '18 at 17:14

Is this practical? Or is it essentially pointless and I should just directly set IFS back to whatever it needs to be for its subsequent uses?

Why risk a typo setting IFS to $' \t\n' when all you have to do is

OIFS=$IFS
do_your_thing
IFS=$OIFS

Alternatively, you can use a subshell if you don't need any variables set or modified inside it to persist afterwards:

( IFS=:; do_your_thing; )
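The caveat about variables can be seen in a few lines of bash. `data` and `count` are illustrative names:

```shell
#!/usr/bin/env bash
data="a:b:c"
count=0

# Assignments made inside ( ... ) are lost when the subshell exits,
# along with the temporary IFS.
( IFS=:; fields=($data); count=${#fields[@]} )
echo "$count"    # 0

# If a result is needed, print it from the subshell and capture it.
count=$(IFS=:; fields=($data); echo "${#fields[@]}")
echo "$count"    # 3
```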
arielCo