58

I know that a custom IFS value can be set for the scope of a single command/built-in. Is there a way to set a custom IFS value for a single statement? Apparently not: as the example below shows, the global IFS value is changed when this is attempted.

#check environment IFS value, it is space-tab-newline
printf "%s" "$IFS" | od -bc
0000000 040 011 012
             \t  \n
0000003
#invoke built-in with custom IFS
IFS=$'\n' read -r -d '' -a arr <<< "$str"
#environment IFS value remains unchanged as seen below
printf "%s" "$IFS" | od -bc
0000000 040 011 012
             \t  \n
0000003

#now attempt to set IFS for a single statement
IFS=$'\n' a=($str)
#BUT environment IFS value is overwritten as seen below
printf "%s" "$IFS" | od -bc
0000000 012
         \n
0000001
iruvar
  • 16,725

7 Answers

55

In some shells (including bash):

IFS=: command eval 'p=($PATH)'

(with bash, you can omit the command if not in sh/POSIX emulation). But beware that when using unquoted variables, you also generally need to set -f, and there's no local scope for that in most shells.
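As a rough sketch of the same idea in portable form, the snippet below splits into the positional parameters instead of a bash array, and turns globbing off around the unquoted expansion. The variable name dirs and its value are made up for illustration, and the set +f at the end assumes noglob was off to begin with. As the paragraph above says, this relies on the shell scoping the assignment to command eval, which not every shell does.

```shell
dirs='/usr/bin:/bin:/opt/my tools/bin'   # hypothetical sample data
ifs_before=$IFS                          # kept only to verify the scope afterwards

set -f                                   # no globbing on the unquoted expansion
IFS=: command eval 'set -- $dirs'        # IFS=: is in effect only while eval runs
set +f                                   # assumes noglob was off beforehand

count=$#
third=$3
printf 'fields: %s, third: %s\n' "$count" "$third"
```

The embedded space in the third element survives because only : is in IFS while the expansion is split.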

With zsh, you can do:

(){ local IFS=:; p=($=PATH); }

$=PATH is to force word splitting which is not done by default in zsh (globbing upon variable expansion is not done either so you don't need set -f unless in sh emulation).

However, in zsh, you'd rather use $path which is an array tied to $PATH, or to split with arbitrary delimiters: p=(${(s[:])PATH}) or p=("${(s[:]@)PATH}") to preserve empty elements.

(){...} (or function {...}) are called anonymous functions and are typically used to set a local scope. With other shells that support local scope in functions, you could do something similar with:

e() { eval "$@"; }
e 'local IFS=:; p=($PATH)'

To implement a local scope for variables and options in POSIX shells, you can also use the functions provided at https://github.com/stephane-chazelas/misc-scripts/blob/master/locvar.sh. Then you can use it as:

. /path/to/locvar.sh
var=3,2,2
call eval 'locvar IFS; locopt -f; IFS=,; set -- $var; a=$1 b=$2 c=$3'

(By the way, splitting $PATH the way it is done above is incorrect in every shell except zsh: in other shells, IFS is a field delimiter, not a field separator, so a trailing empty element, as in PATH=/bin:, is dropped.)

IFS=$'\n' a=($str)

Is just two assignments, one after the other just like a=1 b=2.

A note of explanation on var=value cmd:

In:

var=value cmd arg

The shell executes /path/to/cmd in a new process and passes cmd and arg in argv[] and var=value in envp[]. That's not really a variable assignment, but more passing environment variables to the executed command. In the Bourne or Korn shell, with set -k, you can even write it cmd var=value arg.

Now, that doesn't apply to builtins or functions which are not executed. In the Bourne shell, in var=value some-builtin, var ends up being set afterwards, just like with var=value alone. That means for instance that the behaviour of var=value echo foo (which is not useful) varies depending on whether echo is builtin or not.

POSIX and/or ksh changed this so that the Bourne behaviour applies only to a category of builtins called special builtins. eval is a special builtin; read is not. For a non-special builtin, var=value builtin sets var only for the duration of the builtin, which makes it behave like an external command.
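A small sketch of that difference, using read as the regular builtin and : as the special one (the variable names are arbitrary). The value left behind after the special builtin is deliberately not stated, since it depends on the shell and on whether it runs in POSIX mode.

```shell
a=0
# read is a regular builtin: the prefix assignment lasts only for this call.
a=1 read -r line <<EOF
hello
EOF
after_read=$a

a=0
# : is a special builtin: whether a=1 survives depends on the shell and its mode.
a=1 :
after_colon=$a

printf 'after read: %s\nafter colon: %s\n' "$after_read" "$after_colon"
```

Running it under more than one shell is a quick way to see which behaviour a given shell implements.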

The command command can be used to remove the special attribute of those special builtins. What POSIX overlooked though is that for the eval and . builtins, that would mean that shells would have to implement a variable stack (even though it doesn't specify the local or typeset scope limiting commands), because you could do:

a=0; a=1 command eval 'a=2 command eval echo \$a; echo $a'; echo $a

Or even:

a=1 command eval myfunction

with myfunction being a function using or setting $a and potentially calling command eval.

That really was an oversight, because ksh (which the spec is mostly based on) didn't implement it (and AT&T ksh and zsh still don't). Nowadays, apart from those two, most shells implement it. Behaviour still varies among shells in things like:

a=0; a=1 command eval a=2; echo "$a"

Using local on shells that support it is a more reliable way to implement local scope.

  • 1
    Weirdly, IFS=: command eval … sets IFS only for the duration of the eval, as mandated by POSIX, in dash, pdksh and bash, but not in ksh 93u. It's unusual to see ksh being the odd-non-compliant-one-out. – Gilles 'SO- stop being evil' Sep 24 '13 at 21:29
19

Standard save-and-restore taken from "The Unix Programming Environment" by Kernighan and Pike:

#!/bin/sh
old_IFS=$IFS
IFS="something_new"
some_program_or_builtin
IFS=${old_IFS}
msw
  • 10,593
  • 4
    thank you and +1. Yes I am aware of this option, but I would like to know if there is a "cleaner" option if you know what i mean – iruvar Sep 24 '13 at 16:46
  • 1
    You could jam it onto one line with semi-colons, but I don't think that's cleaner. It might be nice if everything you wanted to express had special syntactic support, but then we'd probably have to learn carpentry or sumptin instead of coding ;) – msw Sep 24 '13 at 16:49
  • Correct me if I'm wrong, but shouldn't you be able to modify the "invoke builtin with custom IFS" example from your previous statement? What is it about IFS=$'\n' read -r -d '' -a arr <<< "$str" that doesn't do what you want? – The Spooniest Sep 24 '13 at 16:54
  • 12
    That fails to restore $IFS correctly if it was previously unset. – Stéphane Chazelas Sep 24 '13 at 17:17
  • 3
    If it's unset, Bash treats it as $'\t\n'' ', as explained here: http://wiki.bash-hackers.org/syntax/expansion/wordsplit#internal_field_separator_ifs – davide Mar 15 '15 at 01:36
  • 2
    @davide, that would be $' \t\n'. space has to be first as that's used for "$*". Note that it's the same in all Bourne-like shells. – Stéphane Chazelas May 12 '15 at 10:00
15

This snippet from the question:

IFS=$'\n' a=($str)

is interpreted as two separate global variable assignments evaluated from left to right, and is equivalent to:

IFS=$'\n'; a=($str)

or

IFS=$'\n'
a=($str)

This explains both why the global IFS was modified, and why the word-splitting of $str into array elements was performed using the new value of IFS.
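To make the point concrete, here is a stripped-down sketch that uses a scalar assignment in place of the array (the names copy and result are made up). The whole thing runs inside a command substitution so that the changed IFS cannot leak into the rest of the script.

```shell
result=$(
  # No command word follows the assignments, so both persist in this shell.
  IFS=: copy=one:two
  if [ "$IFS" = : ]; then
    echo 'IFS changed'
  else
    echo 'IFS unchanged'
  fi
)
printf '%s\n' "$result"
```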

You might be tempted to use a subshell to limit the effect of the IFS modification like this:

str="value 0:value 1"
a=( old values )
( # Following code runs in a subshell
 IFS=":"
 a=($str)
 printf 'Subshell IFS: %q\n' "${IFS}"
 echo "Subshell: a[0]='${a[0]}' a[1]='${a[1]}'"
)
printf 'Parent IFS: %q\n' "${IFS}"
echo "Parent: a[0]='${a[0]}' a[1]='${a[1]}'"

but you will quickly notice that the modification of a is also limited to the subshell:

Subshell IFS: :
Subshell: a[0]='value 0' a[1]='value 1'
Parent IFS: $' \t\n'
Parent: a[0]='old' a[1]='values'

Next, you would be tempted to save/restore IFS using the solution from this previous answer by @msw or to try and use a local IFS inside a function as suggested by @helpermethod. But pretty soon, you notice you are in all sorts of trouble, especially if you are a library author who needs to be robust against misbehaving invoking scripts:

  • What if IFS was initially unset?
  • What if we are running with set -u (a.k.a set -o nounset)?
  • What if IFS was made read-only via declare -r IFS?
  • What if I need the save/restore mechanism to work with recursion and/or asynchronous execution (such as a trap handler)?
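Two of these failure modes are easy to reproduce. The sketch below runs each attempt in a subshell so that the surrounding script is unaffected; the variable names are illustrative only, and the error messages are discarded because their wording differs between shells.

```shell
# Saving an unset IFS under set -u: the expansion of $IFS is an error.
if ( unset IFS; set -u; old_IFS=$IFS ) 2>/dev/null; then
  nounset_save=ok
else
  nounset_save=failed
fi

# Assigning to a read-only IFS is rejected.
if ( readonly IFS; IFS=: ) 2>/dev/null; then
  readonly_assign=ok
else
  readonly_assign=failed
fi

printf 'save under nounset: %s\nassign to readonly: %s\n' \
  "$nounset_save" "$readonly_assign"
```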

Please don't save/restore IFS. Instead, stick to temporary modifications:

  • To limit the variable modification to a single command, built-in or function invocation, use IFS="value" command.

    • To read into multiple variables by splitting on a specific character (: used below as example), use:

        IFS=":" read -r var1 var2 <<< "$str"
      
    • To read into an array use (do this instead of array_var=( $str )):

        IFS=":" read -r -a array_var <<< "$str"
      
  • Limit the effects of modifying the variable to a subshell.

    • To output an array's elements separated by comma:

        (IFS=","; echo "${array[*]}")
      
    • To capture that into a string:

        csv="$(IFS=","; echo "${array[*]}")"
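For shells without arrays or here-strings, the same two techniques can be sketched with a here-document and the positional parameters. This is a portable variant rather than the bash code above, and the sample string and variable names are made up.

```shell
str='alice:x:1000'      # hypothetical sample record
ifs_before=$IFS         # kept only to verify that nothing leaks

# Technique 1: the IFS=: prefix applies to this one read call only.
IFS=: read -r user pass uid <<EOF
$str
EOF

# Technique 2: change IFS inside a command substitution (a subshell);
# "$*" joins the positional parameters on the first character of IFS.
set -- "$user" "$pass" "$uid"
csv=$(IFS=,; printf '%s' "$*")

printf '%s\n' "$csv"
```

In both steps the parent shell's IFS is never assigned, so there is nothing to restore.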
      
sls
  • 351
  • 3
  • 5
11

Put your script into a function and invoke that function, passing the command-line arguments to it. Because IFS is declared local, changes to it don't affect the global IFS.

main() {
  local IFS='/'

  # the rest goes here
}

main "$@"
helpermethod
  • 1,982
7

For this command:

IFS=$'\n' a=($str)

There is an alternative solution: to give the first assignment (IFS=$'\n') a command to execute (a function):

$ split(){ a=( $str ); }
$ IFS=$'\n' split

That puts IFS in the environment of the call to split, but the new value is not retained in the current environment.

This also avoids the always risky use of eval.

4

The proposed answer from @helpermethod is certainly an interesting approach. But it's also a bit of a trap, because in bash a local variable is visible not only in the function that declares it but also in every function called from it. Therefore, setting IFS in main() will result in that value persisting in functions called from main(). Here's an example:

#!/usr/bin/env bash
#
func() {
  # local IFS='\'

  local args=${@}
  echo -n "$FUNCNAME A"
  for ((i=0; i<${#args[@]}; i++)); do
    printf "[%s]: %s" "${i}" "${args[$i]}"
  done
  echo

  local f_args=( $(echo "${args[0]}") )
  echo -n "$FUNCNAME B"
  for ((i=0; i<${#f_args[@]}; i++)); do
    printf "[%s]: %s" "${i}" "${f_args[$i]}  "
  done
  echo
}

main() {
  local IFS='/'

  # the rest goes here
  local args=${@}
  echo -n "$FUNCNAME A"
  for ((i=0; i<${#args[@]}; i++)); do
    printf "[%s]: %s" "${i}" "${args[$i]}"
  done
  echo

  local m_args=( $(echo "${args[0]}") )
  echo -n "$FUNCNAME B"
  for ((i=0; i<${#m_args[@]}; i++)); do
    printf "[%s]: %s" "${i}" "${m_args[$i]}  "
  done
  echo

  func "${m_args[*]}"
}

main "$@"

And the output...

main A[0]: ick/blick/flick
main B[0]: ick  [1]: blick  [2]: flick
func A[0]: ick/blick/flick
func B[0]: ick  [1]: blick  [2]: flick

If IFS declared in main() wasn't still in scope in func(), then the array would not have been properly parsed in func() B. Uncomment the first line in func() and you get this output:

main A[0]: ick/blick/flick
main B[0]: ick  [1]: blick  [2]: flick
func A[0]: ick/blick/flick
func B[0]: ick/blick/flick

Which is what you should get if IFS had gone out of scope.
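The same effect can be shown in a few lines. This is a minimal sketch with made-up function names; it assumes a shell that provides local, which is not part of POSIX.

```shell
show_ifs() { printf '%s' "$IFS"; }      # declares nothing, just reads IFS
caller() { local IFS=/; show_ifs; }     # local IFS, then calls another function

seen_in_callee=$(caller)                # what show_ifs saw when called from caller
seen_at_top=$(show_ifs)                 # what show_ifs sees when called directly

if [ "$seen_in_callee" = / ]; then
  echo "callee saw the caller's local IFS"
fi
```

Called directly, show_ifs sees the global value; called from caller, it sees the / that caller declared local.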

A far better solution, IMHO, is to forgo changing or relying on IFS at the global/local level. Instead, spawn a subshell and fiddle with IFS in there. For instance, if you were to call func() in main() as follows, passing the array as a string with a backslash as the field separator:

func $(IFS='\'; echo "${m_args[*]}")

...that change to IFS will not be reflected within func(). The array will be passed as a string:

ick\blick\flick

...but inside of func() the IFS will still be "/" (as set in main()) unless changed locally in func().

More information about isolating changes to IFS can be viewed at the following links:

How do I convert a bash array variable to a string delimited with newlines?

Bash string to array with IFS

Hints and Tips for general shell script programing -- See "NOTE the use of sub-shells..."

1

The most straightforward solution is to take a copy of the original $IFS, as in e.g. the answer of msw. However, this solution does not distinguish between an unset IFS and an IFS set to the empty string, which matters for many applications. Here is a more general solution which captures this distinction:

# Functions taking care of IFS
set_IFS(){
    if [ -z "${IFS+x}" ]; then
        IFS_ori="__unset__"
    else
        IFS_ori="$IFS"
    fi
    IFS="$1"
}
reset_IFS(){
    if [ "${IFS_ori}" = "__unset__" ]; then
        unset IFS
    else
        IFS="${IFS_ori}"
    fi
}

# Example of use
set_IFS "something_new"
some_program_or_builtin
reset_IFS
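The test that set_IFS relies on can be tried in isolation. The helper below is a hypothetical sketch, not part of the functions above; each state is set up inside a command substitution so the calling shell's IFS is left alone.

```shell
describe_ifs() {
  if [ -z "${IFS+x}" ]; then      # ${IFS+x} expands to nothing only when IFS is unset
    echo unset
  elif [ -z "$IFS" ]; then        # set, but to the empty string
    echo empty
  else
    echo non-empty
  fi
}

state_unset=$(unset IFS; describe_ifs)
state_empty=$(IFS=; describe_ifs)
state_default=$(describe_ifs)

printf '%s %s %s\n' "$state_unset" "$state_empty" "$state_default"
```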
jmd_dk
  • 141