28

How do I correctly run a few commands with an altered value of the IFS variable (to change the way field splitting works and how "$*" is handled), and then restore the original value of IFS?

I know I can do

(
    IFS='my value here'
    my-commands here
)

to localize the change of IFS to the sub-shell, but I don't really want to start a sub-shell, especially not if I need to change or set the values of variables that need to be visible outside of the sub-shell.
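A minimal sketch of that limitation (the variable name result is only for illustration): the assignment made inside the sub-shell is lost once the sub-shell exits.

```shell
result='before'
(
    IFS=:
    set -- a b c
    result="$*"          # assigned only inside the sub-shell
)
# IFS in the parent shell is untouched, but so is result.
printf '%s\n' "$result"  # prints: before
```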

I know I can use

saved_IFS=$IFS; IFS='my value here'
my-commands here
IFS=$saved_IFS

but that seems to not restore IFS correctly in the case that the original IFS was actually unset.
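A small sketch of the failure, starting from an unset IFS (the variable state is a hypothetical helper used only for reporting):

```shell
unset IFS                 # start from the "unset" state
saved_IFS=$IFS            # expands to an empty string
IFS=:
# ... commands that need the altered IFS would run here ...
IFS=$saved_IFS            # IFS is now set, but empty

if [ -n "${IFS+set}" ]; then state='set (empty)'; else state='unset'; fi
printf 'IFS is %s\n' "$state"   # prints: IFS is set (empty)
unset IFS                 # go back to default splitting
```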

Looking for answers that are shell agnostic (but POSIX).

Clarification: That last line above means that I'm not interested in a bash-exclusive solution. In fact, the system I'm using most, OpenBSD, does not even come with bash installed at all by default, and bash is not a shell I use for anything much other than to answer questions on this site. It's much more interesting to see solutions that I may use in bash or other POSIX-like shells without making an effort to write non-portable code.

Kusalananda
  • 333,661
  • If you were looking for bash-only answers, I would suggest saving and later evaling the output of declare -p IFS. – Charles Duffy Mar 19 '21 at 19:13
  • 2
    @CharlesDuffy, in bash like in all shells with scoping, you'd rather use local (though it works best with shells with static scoping or with zsh's private instead (not that you'd use $IFS in zsh)) . The output of bash's declare -p is not always safe for evaling. – Stéphane Chazelas Mar 19 '21 at 19:20
  • ...it's not? I'm surprised about that, would have expected values to be printf %q'd or equivalent. Have a reference? – Charles Duffy Mar 19 '21 at 19:23
  • 2
    @CharlesDuffy see Escape a variable for use as content of another script. IOW, I wouldn't use eval on anything that has been quoted with anything other than the single-quote based approaches there (and even then, it's best to avoid evaling arbitrary data if that can be avoided) – Stéphane Chazelas Mar 19 '21 at 19:35
  • 1
    @StéphaneChazelas, mm, I have a hard time telling how that answer (to "Escape a variable for use as content of another script") would tell why declare -p within a single Bash script would be a problem? It seems to focus on differences between shells, and mentions a number of different ways for producing quoted versions of a variable, so it's rather hard to pick up what issue you're referring to. – ilkkachu Mar 20 '21 at 09:53
  • @CharlesDuffy, anyway, declare -p IFS in itself doesn't work if IFS is unset. Then, declare -p just errors out with "-bash: declare: IFS: not found". Instead of e.g. printing unset IFS. – ilkkachu Mar 20 '21 at 09:57
  • Right; but that makes the output a distinguishable state. unset IFS before the eval and you're fine. – Charles Duffy Mar 20 '21 at 13:38
  • 1
    @CharlesDuffy, yes, just still means that the unset case needs special treatment with declare too. A bit like with the unset IFS [ -n "${save+set}" ] && IFS=$save; case below (it's exactly the same workaround of course, since in the other direction you can just declare -p IFS 2> /dev/null) – ilkkachu Mar 20 '21 at 13:47
  • 1
    If it's only a single command you can probably also use IFS="Xy" command – eckes Mar 20 '21 at 17:13
  • 1
    @ilkkachu, here the output of declare -p is not safe to use in a different locale from that where it was generated, like when the part in between the saving and restoring changes the value of LC_*/LANG... variables. It's also unsafe for some values of $IFS in older versions of bash in some locales. Also note that beside the unset issue, it can't be used in functions as declare would make IFS local upon restore. It also won't restore the type to scalar if IFS has been set to array or hash in between. IOW, it has no advantage over safer approaches. – Stéphane Chazelas Mar 21 '21 at 06:42

6 Answers

30

Yes, in the case when IFS is unset, restoring the value from $saved_IFS would actually set the value of IFS (to an empty value).

This would affect the way field splitting of unquoted expansions is done, it would affect field splitting for the read built-in utility, and it would affect the way the positional parameters are combined into a string when using "$*".

With an unset IFS these things would happen as if IFS had the value of a space, a tab character, and a newline character, but with an empty value, there would be no field splitting and the positional parameters would be concatenated into a string with no delimiter when using "$*". So, there's a difference.
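To make the difference concrete, here is a short sketch comparing the two states. The variable names (words, unset_count and so on) are made up for the example.

```shell
words='a b c'

unset IFS
set -- $words             # split on space, tab and newline
unset_count=$#            # 3
set -- x y z
unset_join="$*"           # "x y z"

IFS=
set -- $words             # no field splitting at all
empty_count=$#            # 1
set -- x y z
empty_join="$*"           # "xyz"

unset IFS                 # back to the default behaviour
printf '%s [%s] | %s [%s]\n' \
    "$unset_count" "$unset_join" "$empty_count" "$empty_join"
# prints: 3 [x y z] | 1 [xyz]
```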

To correctly restore IFS, consider setting saved_IFS only if IFS is actually set to something.

unset saved_IFS
[ -n "${IFS+set}" ] && saved_IFS=$IFS

The parameter substitution ${IFS+set} expands to the string set only if IFS is set, even if it is set to an empty string. If IFS is unset, it expands to an empty string, which means that the -n test would be false and saved_IFS would remain unset.
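A quick way to convince yourself, using a throwaway variable v (hypothetical, as are the r_* names). The last-but-one case shows why the form without a colon is the one to use here: ${v:+set} treats an empty value like an unset one.

```shell
unset v
r_unset=${v+set}          # v is unset          -> empty
v=
r_empty=${v+set}          # v is set but empty  -> "set"
r_colon=${v:+set}         # colon form          -> empty
v=x
r_value=${v+set}          # v has a value       -> "set"

printf '[%s] [%s] [%s] [%s]\n' "$r_unset" "$r_empty" "$r_colon" "$r_value"
# prints: [] [set] [] [set]
```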

Now, saved_IFS is unset if IFS was initially unset, or it has the value that IFS had, and you can set whatever value you want for IFS and run your code.

When restoring IFS, you do a similar thing:

unset IFS
[ -n "${saved_IFS+set}" ] && { IFS=$saved_IFS; unset saved_IFS; }

The final unset saved_IFS isn't really necessary, but it may be good to clean up old variables from the environment.
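Put together, a round trip might look like the sketch below. It starts from an unset IFS; change the first line to an assignment to exercise the other branch. The names joined and after are only there for the demonstration.

```shell
unset IFS                 # initial state for this demonstration

# Save.
unset saved_IFS
[ -n "${IFS+set}" ] && saved_IFS=$IFS

# Use the altered value.
IFS=,
set -- a b c
joined="$*"               # a,b,c

# Restore.
unset IFS
[ -n "${saved_IFS+set}" ] && { IFS=$saved_IFS; unset saved_IFS; }

if [ -n "${IFS+set}" ]; then after='set'; else after='unset'; fi
printf 'joined=%s, IFS is %s afterwards\n' "$joined" "$after"
# prints: joined=a,b,c, IFS is unset afterwards
```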


An alternative way of doing this, suggested by LL3 in comments (now deleted), relies on prefixing the unset command by :, a built-in utility that does nothing, effectively commenting out the unset, when it's not needed:

saved_IFS=$IFS
${IFS+':'} unset saved_IFS

This sets saved_IFS to the value of $IFS, but then unsets it if IFS was unset.

Then set IFS to your value and run your commands. Then restore with

IFS=$saved_IFS
${saved_IFS+':'} unset IFS

(possibly followed by unset saved_IFS if you want to clean up that variable too).

Note that : must be quoted, as above, or escaped as \:, so that it isn't modified by $IFS containing : (the unquoted parameter substitution invokes field splitting, after all).
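As a sanity check, here is a sketch of the whole sequence with IFS deliberately set to a colon beforehand, which is the case the quoting protects against. The variable restored is a hypothetical helper.

```shell
IFS=:                          # a value that contains ":"

saved_IFS=$IFS
${IFS+':'} unset saved_IFS     # runs ": unset saved_IFS", a no-op

IFS=' '
# ... commands using the altered IFS would run here ...

IFS=$saved_IFS
${saved_IFS+':'} unset IFS     # again a no-op, since saved_IFS is set
unset saved_IFS

restored=$IFS
unset IFS                      # leave the shell with default splitting
printf 'restored IFS: [%s]\n' "$restored"   # prints: restored IFS: [:]
```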

Kusalananda
  • 333,661
  • 5
    Note that those kinds of approaches are not re-entrant in that for instance, in between the setting and restoring, you can't call a function that uses the same approach. – Stéphane Chazelas Mar 19 '21 at 18:55
  • 2
    Your $IFS+: approach reminds me of https://groups.google.com/g/comp.unix.shell/c/25QYE-0toQA/m/uFy1F0lEamAJ :-) – Stéphane Chazelas Mar 19 '21 at 19:04
  • 1
    https://groups.google.com/g/comp.unix.shell/c/00mMle2zpgc/m/L9D42gpVg8QJ is probably where it was invented. You'll notice Laura Fairhead participated in that thread who coined a few shell idiom pearls. – Stéphane Chazelas Mar 19 '21 at 19:12
  • 2
    The change from ${IFS+:} to ${IFS:+':'} would have been as a work around for older versions of zsh, where in sh emulation ${IFS+:} would have expanded to two empty strings if $IFS contained : (: undergoing IFS-splitting) – Stéphane Chazelas Mar 19 '21 at 19:17
7

Inside a bash function, you can use local IFS=$'\n' or whatever to shadow the global (or parent function's local) value of IFS while inside the scope of this function. Further assignment to IFS will still be modifying your local version.

In bash,

It is an error to use local when not within a function.

So this doesn't help if you're not writing a function, or if you're using a shell without local (or an equivalent), but if you are writing one (and you know which IFS values you want at all points until it returns), this is an easy and good solution.

A function doesn't involve a subshell as long as you define it with
foo(){ ...; } instead of foo() ( ... ).
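A small sketch of the idea, assuming a shell that has local (bash and dash do; it is not POSIX). The function name join_by_colon and the variables joined and outer_IFS are made up for the example.

```shell
# "local" is not POSIX; this assumes bash, dash or a similar shell.
join_by_colon() {
    local IFS=:
    joined="$*"           # global on purpose, so the caller can see it
}

IFS=' '
join_by_colon a b c
outer_IFS=$IFS            # still a single space after the call
printf '%s [%s]\n' "$joined" "$outer_IFS"   # prints: a:b:c [ ]
```

Because the function body runs in the current shell, the assignment to joined survives the call, while the change to IFS does not.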

Peter Cordes
  • 6,466
  • 2
    local isn't POSIX, but Bash/Dash/Busybox do have it. Ksh is a problem here, though. – ilkkachu Mar 20 '21 at 10:01
  • @ilkkachu: Oh, I missed the part of the question that was asking for shell-agnostic / POSIX. Even so, I wanted to post for future readers who come across this question without that limitation, because it's enough nicer that it's worth knowing about. – Peter Cordes Mar 20 '21 at 10:12
  • oh I think local is an excellent solution, when possible and available. At first, I also thought it was quite portable even, since Dash and Busybox support it too. But then remembered ksh treats them differently. With Bash and others, nested functions may be an issue too, since a function called from one setting local IFS will see the "local" value set in the upper function (e.g. a="main"; bar() { echo "$a"; }; foo() { local a="foo"; bar; }; foo prints foo, not main. That's the part that's different in ksh, plus you need function foo { ... } there.) So, something to note at least. – ilkkachu Mar 20 '21 at 11:18
  • -1 If you have the option of writing a bash script, you also have the option of using a scripting language less terrible than shell. – zwol Mar 21 '21 at 00:58
  • @zwol: you can use functions interactively, like foo(){ IFS=... ; your code here; }; foo. I actually do this sometimes if I want to be able to up-arrow / edit, with the thing I want to edit being a function arg without having to move the cursor far back to where it's embedded. (But the function body I want can also change less frequently, so I don't just put it in a file. In my specific use-case, I have a wrapper function around mpv to play 30 minutes or one file, or a couple files, of an audiobook that's either one big file, or one file per CD, or w/e. I listen while falling asleep) – Peter Cordes Mar 21 '21 at 02:00
  • 1
    @zwol, yeah, that argument could be used for about half the questions and answers on this site. Also, given that checkbashisms exists, not every script author seems to have gotten that memo. (sure, it's better nowadays, but, still.) – ilkkachu Mar 21 '21 at 08:34
  • @zwol Are you saying that if a question specifically asks how to do something using a shell script – no, let’s go further, this question specifically asks how to do something that only makes sense in a shell – we should downvote answers that involve shell scripts? – Brian Drake Mar 21 '21 at 08:39
  • 3
    @BrianDrake: No, zwol seems to be arguing that the only reason to write a shell script is portability, which means only using POSIX sh features. (And only features that aren't known to be buggy on any important shells, see zwol's answer). With that mindset, there's never a reason to write bash-only scripts. (This is of course flawed logic; e.g. autocomplete scripts are very shell-specific, and for performance and other reasons are written in the shell's own language.) – Peter Cordes Mar 21 '21 at 08:45
  • @zwol The shell is not a terrible scripting language for getting simple things done. It's a terrible generic language. My approach to shell scripting is to use POSIX syntax until a specific shell's features clearly make things easier. In this simple case of saving and restoring a shell variable, I fail to see that any shell has an exclusive feature that makes this much easier than any other POSIX-like shell. This is partly why I'm asking for a POSIX solution. Having said that, using local is a good solution, albeit not POSIX. local is often implemented, in one way or another. – Kusalananda Mar 21 '21 at 12:13
  • (cont.) The issue with local is that since it's not POSIX, one has to also mention how it can be used and not used. What are the failure conditions? What corner cases does this not work in? Do all shells that have local, or typeset or similar, deal with it the same way? – Kusalananda Mar 21 '21 at 12:16
  • 2
    @PeterCordes Yes, my position is that the only time one should ever write a bash-specific script is to extend the bash interactive environment (and similarly for e.g. zsh, fish). – zwol Mar 21 '21 at 14:11
  • @Kusalananda Bourne shell is a terrible language, period. – zwol Mar 21 '21 at 14:11
4

In sufficiently old shells, unset either doesn't exist at all or is unusably buggy (comments in Autoconf's source code say that unset IFS may crash the process). Kusalananda's answer cannot be used with such shells.

If you have to worry about shells this old, your best bet is to set IFS to a space, a tab, and a newline, in that order, as early as possible:

# There is a hard tab between the second pair of single quotes.
IFS=' ''    ''
'

This setting has the same effect as an unset IFS, but it can be safely saved and restored with the second construct from the question:

saved_IFS="$IFS"; IFS='my value here'
my commands here
IFS="$saved_IFS"

(Double-quoting the right hand side of variable=$othervariable is technically not necessary, but it makes life easier for everyone who might have to read your shell script in the future if you don't make them remember that.)
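A literal tab is easy to lose when a script is copied or re-indented. The sketch below builds the same three-character value with printf instead, then does the save and restore. Note that it uses $(...) and ${IFS%x}, which are POSIX features, so it is only an option for POSIX shells and not for the pre-POSIX shells discussed above; default_IFS, joined and restored are hypothetical names.

```shell
# Build space + tab + newline. The trailing "x" keeps command
# substitution from stripping the final newline; it is removed again.
IFS=$(printf ' \t\nx')
IFS=${IFS%x}
default_IFS=$IFS

saved_IFS="$IFS"; IFS=:
set -- a b c
joined="$*"               # a:b:c
IFS="$saved_IFS"

if [ "$IFS" = "$default_IFS" ]; then restored=yes; else restored=no; fi
printf '%s, default restored: %s\n' "$joined" "$restored"
# prints: a:b:c, default restored: yes
```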

zwol
  • 7,177
  • +1 Simple, shell agnostic and double-quotes the variable expansions (which the question and other answers failed to do). I suggest you add an explanation about that last point. – Brian Drake Mar 21 '21 at 03:50
  • @BrianDrake, note that foo=$bar is one of the cases where double-quoting is not necessary. (bar e.g. some earlier buggy cases with Certain Shells.) – ilkkachu Mar 21 '21 at 08:43
  • Can you mention a shell that does not have unset? Would this be a POSIX shell? – Kusalananda Mar 21 '21 at 12:18
  • 3
    @Kusalananda POSIX does require unset. The problem is, /bin/sh on several of the most popular surviving proprietary Unixes isn't POSIX compliant -- its behavior was intentionally frozen without the changes required by Unix95. And since /bin/sh is the only shell that is guaranteed to exist, and the one run by system and similar... – zwol Mar 21 '21 at 14:06
  • @BrianDrake I am not sure what you meant by "add an explanation about that last point" but I've done my best to address what I think you are asking for. – zwol Mar 21 '21 at 14:08
  • You're talking about Solaris 10? – Kusalananda Mar 21 '21 at 14:14
  • @Kusalananda Among others. – zwol Mar 21 '21 at 20:45
  • 1
    @zwol Sorry, but I'm intrigued. What other current popular commercial Unix contains an original Bourne shell? The Korn shell playing the role of sh on AIX has no issue with its unset AFAIK. Only the old SunOS sh on Solaris is documented to not be able to unset IFS (or PATH, or MAILCHECK or the prompt variables). macOS sh is bash, so there should be no issue there. – Kusalananda Mar 21 '21 at 20:52
  • @zwol At the time, I thought that the quoting was necessary. But now I see that word splitting is not performed in assignments. – Brian Drake Mar 22 '21 at 11:56
  • @Kusalananda I may be out of date, but I thought AIX still relegated the Unix95-compliant shell environment to a non-default path, like Solaris does with /usr/xpg4. And I'm not sure HP-UX or Digital Unix qualify as "current" -- it's no longer my job to know if anyone supports those anymore -- but they definitely did the same thing last I used them (which was about seven years ago now). – zwol Mar 22 '21 at 15:28
  • I based my AIX comment on the sh(1) manual from AIX 7.1. Tru64 reached end of life in 2012. – Kusalananda Mar 22 '21 at 15:33
0

In Bash, I'd do it this way:

[ -v IFS ] && oldIFS="$IFS" || unset oldIFS

IFS=something
some commands

[ -v oldIFS ] && IFS="$oldIFS" || unset IFS

or this way:

[ "${IFS+set}" ] && oldIFS="$IFS" || unset oldIFS

IFS=something
some commands

[ "${oldIFS+set}" ] && IFS="$oldIFS" || unset IFS

Pourko
  • 1,844
  • Did you mean [[ instead of [? Your answer mentions Bash and according to the manpage [(1) on my system, there is no -v test. – Brian Drake Mar 21 '21 at 04:22
  • @Brian Drake: Did you try it? I never had a reason to look into [[. – Pourko Mar 21 '21 at 04:26
  • From reading the bash manpage more carefully, it turns out that it has its own version of [, which supports the same tests as [[. I do not understand why there are two forms, nor how POSIX-compatible either of them is. – Brian Drake Mar 21 '21 at 04:39
  • Anyway, both [ and [[ work in bash --posix. Perhaps my new question should be: Why did you mention Bash at all? The question asked for a shell-agnostic answer. – Brian Drake Mar 21 '21 at 04:49
  • @BrianDrake: [[ is part of the shell grammar so it can affect how var expansion works inside it, making it more robust for some cases with vars whose value is -n for example, and allowing other features; I forget all the details of why it's better. [ is "just" a command. – Peter Cordes Mar 21 '21 at 08:19
  • 3
    @BrianDrake Running bash in POSIX mode does not disable all non-POSIX features. The fact that [[ works in POSIX mode in bash does not mean that [[ is a POSIX feature. The fact that it's not mentioned in the POSIX standard (other than "causing unspecified result") means it's not a POSIX feature. It's allowed to be interpreted (in an unspecified way) by a shell running in POSIX mode. – Kusalananda Mar 21 '21 at 08:32
  • @BrianDrake the man page probably describes the external version of [. It probably doesn't have -v because it would not be nearly as useful in an external program than in a shell builtin. In Bash, use help test. Also see e.g. How do I know if the man page I'm looking at is the correct one? – ilkkachu Mar 21 '21 at 08:38
0

copy

The initial goal is to copy a variable (a) to another (b).
Doing a simple b=$a works if a is set (either a "" or a value), but if a is unset, b needs to be unset as well. If not, b will be set to "".

An unset IFS works differently than a null IFS (in bash):

                             $' \t\n'      unset         null("")
Split Expansions             default       default       no splitting
join arguments with "$*"     "$1c$2c..."   "$1 $2 ..."   "$1$2"

So, we need two steps, copy the value and unset the copied variable (if needed). A variable copy from a to b could be done in several ways:

if [ -n "${a+set}" ]; then b="$a"; else unset b; fi
[ -n "${a+set}" ] && b="$a" || unset b
[ "${a+set}" ] && b="$a" || unset b

${a+'false'} && unset b || b=$a

Then, for IFS, we can copy it to oldIFS, change the value of IFS as needed, and restore it after use:

${IFS+'false'} && unset oldIFS || oldIFS=$IFS

IFS='new value'

${oldIFS+'false'} && unset IFS || IFS=$oldIFS

function(s)

The only way to improve this is to use a function, and yes, a function would be able to copy two vars:

copyIFS    () { ${IFS+'false'} && unset oldIFS || oldIFS=$IFS; }

provided that the names of the variables to modify are known before writing the function as the function must access such variables at the global scope. No local possible, no use of declare/typeset.

It is not possible in sh to create a function for copyvars var1 var2 (with var1 and var2 variable). That would require the use of named vars.

The restore function (using the swapped variable names) is:

restoreIFS () { ${oldIFS+'false'} && unset IFS || IFS=$oldIFS; }

Defining those two functions, we can do:

copyIFS
IFS='a new value'
restoreIFS

Probably simpler, less prone to mistakes.
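For comparison, here is a variation on the two functions written with plain if statements, which some may find easier to audit than the && and || chains. The names save_IFS, restore_IFS, from_unset and from_empty are hypothetical. The sketch checks that both an unset and an empty IFS survive the round trip.

```shell
save_IFS() {
    if [ -n "${IFS+set}" ]; then oldIFS=$IFS; else unset oldIFS; fi
}
restore_IFS() {
    if [ -n "${oldIFS+set}" ]; then IFS=$oldIFS; else unset IFS; fi
}

# Case 1: IFS starts out unset.
unset IFS
save_IFS; IFS=:; restore_IFS
if [ -n "${IFS+set}" ]; then from_unset='set'; else from_unset='unset'; fi

# Case 2: IFS starts out set but empty.
IFS=
save_IFS; IFS=:; restore_IFS
if [ -n "${IFS+set}" ]; then from_empty="set to [$IFS]"; else from_empty='unset'; fi

unset IFS                 # leave the shell with default splitting
printf '%s / %s\n' "$from_unset" "$from_empty"
# prints: unset / set to []
```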

0

Not an expert, but in zsh you can also use an anonymous function.

myArray=($'\1', $'\1')
printf "before: "
typeset -p IFS
function {
    local IFS=$'\0' joinedArray=${(j::)myArray}
    printf "during: "
    typeset -p IFS
}

printf "after: "
typeset -p IFS

This prints:

before: typeset IFS=$' \t\n\C-@'
during: typeset IFS=$'\C-@'
after: typeset IFS=$' \t\n\C-@'

So the value of IFS is restored. I'm guessing this is probably more lightweight than a subshell.

xdhmoore
  • 145