
I am trying to pass a "var name" to a function, have the function transform the value held by the variable with that "var name", and then be able to reference the transformed object by its original "var name".

For example, let's say I have a function that converts a delimited list into an array, and I have a delimited list named 'animal_list'. I want to convert that list to an array by passing the list name into the function, and then reference the resulting array as 'animal_list'.

Code Example:

function delim_to_array() {
  local list=$1
  local delim=$2
  local oifs=$IFS;

  IFS="$delim";
  temp_array=($list);
  IFS=$oifs;

  # Now I have the list converted to an array but it's 
  # named temp_array. I want to reference it by its 
  # original name.
}

# ----------------------------------------------------

animal_list="anaconda, bison, cougar, dingo"
delim_to_array ${animal_list} ","

# After this point I want to be able to deal with animal_list as an array.
for animal in "${animal_list[@]}"; do 
  echo "NAME: $animal"
done

# And reuse this in several places to convert lists to arrays
people_list="alvin|baron|caleb|doug"
delim_to_array ${people_list} "|"

# Now I want to treat people_list as an array
for person in "${people_list[@]}"; do 
  echo "NAME: $person"
done
  • So you want call-by-reference in bash, where you can pass a variable name for the function to put the result in, instead of printing it to stdout like the usual calling convention. – Peter Cordes Aug 21 '15 at 16:18
  • exactly, yes, thanks. Call by Reference is the feature I should have referenced in my question – mackdoyle Aug 22 '15 at 20:06

4 Answers


Description

Understanding this will take some effort. Be patient. The solution will work correctly in bash. Some "bashisms" are needed.

First: We need to use the "Indirect" access to a variable ${!variable}. If $variable contains the string animal_name, the "Parameter Expansion": ${!variable} will expand to the contents of $animal_name.
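A minimal sketch of indirect expansion on its own, before any eval is involved; `variable` is just an illustrative name, and nothing here modifies `animal_list` yet:

```shell
#!/usr/bin/env bash
animal_list="anaconda, bison, cougar, dingo"
variable=animal_list        # holds the *name* of another variable, as a string
echo "$variable"            # animal_list
echo "${!variable}"         # anaconda, bison, cougar, dingo  (contents of $animal_list)
```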

Let's see that idea in action. I have retained the names and values you used where possible, to make it easier for you to understand:

#!/bin/bash

function delim_to_array() {
    local VarName=$1

    local IFS="$2";
    printf "inside  IFS=<%s>\n" "$IFS"

    echo "inside  var    $VarName"
    echo "inside  list = ${!VarName}"

    echo a\=\(${!VarName}\)
    eval a\=\(${!VarName}\)
    printf "in  <%s> " "${a[@]}"; echo

    eval $VarName\=\(${!VarName}\)
}

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","

printf "out <%s> " "${animal_list[@]}"; echo
printf "outside IFS=<%s>\n" "$IFS"

# Now we can use animal_list as an array
for animal in "${animal_list[@]}"; do
    echo "NAME: $animal"
done

If that complete script is executed (let's assume it's named so-setvar.sh), you should see:

$ ./so-setvar.sh
inside  IFS=<,>
inside  var    animal_list
inside  list = anaconda, bison, cougar, dingo
a=(anaconda  bison  cougar  dingo)
in  <anaconda> in  <bison> in  <cougar> in  <dingo> 
out <anaconda> out <bison> out <cougar> out <dingo> 
outside IFS=< 
>
NAME: anaconda
NAME: bison
NAME: cougar
NAME: dingo

Understand that "inside" means "inside the function", and "outside" the opposite.

The value inside $VarName is the name of the var: animal_list, as a string.

The value of ${!VarName} is shown to be the list: anaconda, bison, cougar, dingo

Now, to show how the solution is constructed, there is a line with echo:

echo a\=\(${!VarName}\)

which shows what the following line with eval executes:

a=(anaconda  bison  cougar  dingo)

Once that is evaluated, the variable a is an array with the animal list. In this instance, the var a is used to show exactly how the eval affects it.

Then the value of each element of a is printed as in <val>, and the same is done outside the function as out <val>.
That is shown in these two lines:

in  <anaconda> in  <bison> in  <cougar> in  <dingo>
out <anaconda> out <bison> out <cougar> out <dingo>

Note that the real change was executed in the last eval of the function.
That's it, done. The var now has an array of values.

In fact, the core of the function is one line: eval $VarName\=\(${!VarName}\)

Also, the value of IFS is set as local to the function which makes it return to the value it had before executing the function without any additional work. Thanks to Peter Cordes for the comment on the original idea.

That ends the explanation; I hope it's clear.


Real Function

If we remove all the unneeded lines, keeping only the core eval and the local IFS variable, the function shrinks to its minimal form:

delim_to_array() {
    local IFS="${2:-$' :|'}"
    eval $1\=\(${!1}\);
}

Setting IFS as a local variable also allows us to give the function a "default" value: whenever the separator is not passed to the function as the second argument, the local IFS takes the "default" value. I felt that the default should be space ( ) (which is always a useful splitting value), colon (:), and vertical bar (|); any of those three will split the values. Of course, the default could be set to any other values that fit your needs.
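A short usage sketch of the minimal eval version with the default separators. `mixed_list` is a made-up variable, and its value deliberately avoids $, `, *, braces and similar characters, which eval would interpret:

```shell
#!/usr/bin/env bash
# The minimal eval-based version from the text.
delim_to_array() {
    local IFS="${2:-$' :|'}"
    eval $1\=\(${!1}\);
}

mixed_list="alpha beta:gamma|delta"
delim_to_array mixed_list              # no 2nd argument: split on space, colon or bar
printf '<%s>' "${mixed_list[@]}"; echo
# <alpha><beta><gamma><delta>
```

Because IFS is `local`, the caller's IFS is untouched after the call.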

Edit to use read:

To reduce the risk of exposing unquoted values to eval, we can use read instead:

delim_to_array() {
    local IFS="${2:-$' :|'}"
    # eval $1\=\(${!1}\);
    read -ra "$1" <<<"${!1}"
}

test="fail-test"; a="fail-test"

animal_list='bison, a space, {1..3},~/,${a},$a,$((2+2)),$(echo "fail"),./*,*,*'

delim_to_array "animal_list" ","
printf "<%s>" "${animal_list[@]}"; echo

$ so-setvar.sh
<bison>< a space>< {1..3}><~/><${a}><$a><$((2+2))><$(echo "fail")><./*><*><*>

Most of the values set above for the var animal_list would be altered (expanded or executed) by eval,
but they pass through read without problems.

  • Note: It is perfectly safe to try the eval option with this code, as the values of the vars have been set to plain text values just before calling the function; even if they really are expanded and executed, the results are harmless text. Ill-named files are not a problem either: pathname expansion is the last expansion, so no variable expansion is re-executed over its results. Again, with the code as is, this is in no way a validation for general use of eval.
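To see the difference on a single value, here is a sketch that runs both variants on the same input. `delim_eval`, `delim_read`, `via_eval` and `via_read` are illustrative names; the function bodies are the two versions shown above:

```shell
#!/usr/bin/env bash
delim_eval() { local IFS="${2:-$' :|'}"; eval $1\=\(${!1}\); }
delim_read() { local IFS="${2:-$' :|'}"; read -ra "$1" <<<"${!1}"; }

via_eval='x,$((2+2)),y'
via_read='x,$((2+2)),y'
delim_eval via_eval ","
delim_read via_read ","
printf '<%s>' "${via_eval[@]}"; echo    # <x><4><y>         eval ran the arithmetic
printf '<%s>' "${via_read[@]}"; echo    # <x><$((2+2))><y>  read kept the text as is
```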

Example

To show what this function does and how it works, I rewrote the code you posted using it:

#!/bin/bash

delim_to_array() {
        local IFS="${2:-$' :|'}"
        # printf "inside  IFS=<%s>\n" "$IFS"
        # eval $1\=\(${!1}\);
        read -ra "$1" <<<"${!1}";
}

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","
printf "NAME: %s\t " "${animal_list[@]}"; echo

people_list="alvin|baron|caleb|doug"
delim_to_array "people_list"
printf "NAME: %s\t " "${people_list[@]}"; echo

$ ./so-setvar.sh
NAME: anaconda   NAME:  bison    NAME:  cougar   NAME:  dingo    
NAME: alvin      NAME: baron     NAME: caleb     NAME: doug      

As you can see, IFS is set only inside the function; it is not changed permanently, and therefore it does not need to be reset to its old value. Additionally, the second call (for people_list) takes advantage of the default value of IFS, so there is no need to pass a second argument.


« Here be Dragons » ¯\_(ツ)_/¯


Warnings 01:

As the (eval) function was constructed, there is one place in which the var is exposed unquoted to shell parsing. That is what gets the "word splitting" done using the IFS value. But it also exposes the values of the vars (unless some quoting prevents it) to "brace expansion", "tilde expansion", "parameter, variable and arithmetic expansion", "command substitution", and "pathname expansion", in that order; and to process substitution <() >() on systems that support it.

An example of each (except the last) is contained in this simple echo (be careful):

 a=failed; echo {1..3} ~/ ${a} $a $((2+2)) $(ls) ./*

That is, any string that contains any of {~$`<> or ?*[], or that could match a file name, is a potential problem.

If you are sure that the variables do not contain such problematic values, then you are safe. If there is the potential to have such values, the ways to answer your question are more complex and need more (even longer) descriptions and explanations. Using read is an alternative.

Warnings 02:

Yes, read comes with its own share of «dragons».

  • Always use the -r option; it is very hard for me to think of a case where it is not needed.
  • read gets only one line. Multi-line input needs special care, even when using the -d option, or the whole input will end up in one variable.
  • If the IFS value contains a space, leading and trailing spaces will be removed. The complete description should include some detail about the tab, but I'll skip it.
  • Do not pipe (|) data into read. If you do, read will run in a subshell, and variables set in a subshell do not persist after returning to the parent shell. There are some workarounds, but, again, I'll skip the details.
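A small sketch of the last two pitfalls, assuming bash with default shell options (in particular `lastpipe` off, which is the default); the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Piping into read: read runs in a subshell, so the array is gone afterwards.
unset piped
printf 'a,b,c\n' | IFS=, read -ra piped
echo "${#piped[@]}"                   # 0

# A here-string keeps read in the current shell.
IFS=, read -ra here <<<'a,b,c'
echo "${#here[@]}"                    # 3

# read stops at the first newline: only the first line is split.
IFS=, read -ra first <<<$'a,b\nc,d'
printf '<%s>' "${first[@]}"; echo     # <a><b>
```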

I didn't mean to include the warnings and problems of read, but by popular request, I had to include them, sorry.


The Bash FAQ has a whole entry about calling by reference / indirection.

In the simple case, there is a better alternative to the eval suggested by other answers, one that makes the quoting much easier: printf -v.

func() {  # set the caller's simple non-array variable
    local retvar=$1
    printf -v "$retvar"  '%s ' "${@:2}"  # concat all the remaining args
}

Bash-completion (the code that runs when you hit tab) has switched over to printf -v instead of eval for its internal functions, because it's more readable and probably faster.
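A usage sketch of `func` above, plus one pitfall worth knowing: because `retvar` is a local variable, a caller that passes the name `retvar` itself ends up writing to the function's local copy, which disappears on return. `greeting` is an illustrative name:

```shell
#!/usr/bin/env bash
func() {  # set the caller's simple non-array variable
    local retvar=$1
    printf -v "$retvar" '%s ' "${@:2}"  # concat all the remaining args
}

func greeting hello world
printf '<%s>\n' "$greeting"          # <hello world >  (trailing space from the '%s ' format)

func retvar oops                     # the name collides with func's own local
printf '<%s>\n' "${retvar-unset}"    # <unset>: printf -v wrote to the local copy
```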

For returning arrays, the Bash FAQ suggests using read -a to read into sequential array indices of an array variable:

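# Bash
aref=realarray
# \n is added to IFS because with -d '' the here-string's trailing newline is
# kept, and would otherwise end up in the last array element
IFS=$' \n' read -d '' -ra "$aref" <<<'words go into array elements'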

Bash 4.3 introduced a feature that makes call-by-reference massively more convenient: namerefs. Bash 4.3 is still new-ish (released in 2014).

func () { # return an array in a var named by the caller
    typeset -n ref1=$1   # ref1 is a nameref variable.
    shift   # remove the var name from the positional parameters
    echo "${!ref1} = $ref1"  # prints the name and contents of the real variable
    ref1=( "foo" "bar" "$@" )  # sets the caller's variable.
}

Note that the wording of the bash man page is slightly confusing. It says the -n attribute can't be applied to array variables. This means you can't have an array of references, but you can have a reference to an array.
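Applied to the question, a nameref version of `delim_to_array` might look like the following sketch (bash 4.3 or later assumed). It reads into a temporary local array and then assigns the result through the nameref, so no eval is needed. The `_dta_` prefix is an arbitrary choice to avoid clashing with the caller's variable name; a caller variable named `_dta_ref` or `_dta_tmp` would not work:

```shell
#!/usr/bin/env bash
delim_to_array() {
    local -n _dta_ref=$1              # nameref: _dta_ref now aliases the caller's variable
    local IFS="${2:-$' :|'}"          # local IFS: restored automatically on return
    local -a _dta_tmp
    read -ra _dta_tmp <<<"$_dta_ref"  # split the current string value, no eval
    _dta_ref=("${_dta_tmp[@]}")       # replace the caller's string with the array
}

animal_list="anaconda,bison,cougar,dingo"
delim_to_array animal_list ","
people_list="alvin|baron|caleb|doug"
delim_to_array people_list            # the default separators include '|'
printf 'NAME: %s\n' "${animal_list[@]}" "${people_list[@]}"
```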

Peter Cordes
  • @BinaryZebra: Yeah, typeset -n lets you leave out the eval, and just use any normal way of setting array elements, which makes it a LOT easier to see that you got the quoting right, and aren't interpreting data as code. (e.g. if there was a $(rm -rf /*) (or accidental non-malicious shell meta-characters) in something that you accidentally let the shell eval). I didn't want to go into details about splitting in my answer; your answer covered that nicely, and I upvoted it for that. Like you said, here be dragons :P. The part of this question I found interesting was the indirection. – Peter Cordes Aug 21 '15 at 23:52

You cannot change the variable (or array in this case) inside the function because you pass only its content - function doesn't know which variable has been passed.
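A quick way to see what the question's original call actually hands the function; `show_args` is a throwaway helper for illustration. The unquoted `${animal_list}` is not only replaced by its content, it is also split on the default IFS into several arguments, so the name `animal_list` never reaches the function:

```shell
#!/usr/bin/env bash
show_args() { printf '<%s>\n' "$@"; }   # print each received argument on its own line

animal_list="anaconda, bison, cougar, dingo"
show_args ${animal_list} ","            # same call style as: delim_to_array ${animal_list} ","
```

With the default IFS this prints `<anaconda,>`, `<bison,>`, `<cougar,>`, `<dingo>` and `<,>`, one per line.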

As a workaround, you can pass the name of the variable and, inside the function, evaluate it to get the content.

#!/bin/bash 

function delim_to_array() {
  local list=$1
  local delim=$2
  local oifs=$IFS;

  IFS="$delim"
  temp_array=($(eval echo '"${'"$list"'}"'))
  IFS=$oifs;

  eval "$list=("${temp_array[@]}")"            
}                                             

animal_list="anaconda, bison, cougar, dingo"
delim_to_array "animal_list" ","
printf "NAME: %s\n" "${animal_list[@]}"

people_list="alvin|baron|caleb|doug"
delim_to_array "people_list" "|"
printf "NAME: %s\n" "${people_list[@]}"

Pay close attention to the quotes in the lines where eval is used: part of the expression needs to be in single quotes, the other part in double quotes. Additionally, I've replaced the for loop with the simpler printf command for the final printing.

Output:

NAME: anaconda
NAME: bison
NAME: cougar
NAME: dingo
NAME: alvin
NAME: baron
NAME: caleb
NAME: doug
jimmij
  • Are you sure you need an eval to IFS-split a string into an array? And if you do, couldn't you combine that eval with the one for doing indirection? (Also note that you can avoid this messy quoting with read -a, see my answer.) Also, local IFS=$2 would avoid the save/restore. – Peter Cordes Aug 21 '15 at 21:53
function delim_to_array() {
  local list=$1
  local delim=$2
  local oifs=$IFS;

  IFS="$delim";
  temp_array=($list);
  IFS=$oifs;
}

So I think you're skipping over a very simple detail with this function: it's always easier if the callee performs only the repetitive processing and the caller calls the shots. In that function you've got the callee doing all the calling; it shouldn't have to handle those names in that way.

isName()
    case   "${1##[0-9]*}"   in
    (${IFS:+*}|*[!_[:alnum:]]*)
    IFS= "${IFS:+isName}" "$1"|| ! :
    esac  2>/dev/null

setSplit(){
   isName "$1" ||
   set "" "setSplit(): bad name: '$1'"
   eval   "shift; set -f
           ${1:?"$2"}=(\$*)
           set +f -$-"
}

That safely validates the array name, produces meaningful error output on stderr, and halts as appropriate when called with invalid arguments. Its error output looks like:

bash: 1: setSplit(): bad name: 'arr@yname'

...where bash is the shell's current $0 and arr@yname was setSplit()'s first argument when I called it and it wrote that message.

It's also two functions, so the caller can dynamically redefine the test for isName() at its discretion without any modification to the setSplit() function at all.

It also safely disables shell filename-generation globs to prevent their inadvertent expansion while splitting, as might otherwise happen by default if any arguments contained any of the chars [*?. Before returning, it restores any shell options it might have changed to the state in which it found them; that is, you can call it with shell filename globbing enabled or disabled, and it will not affect that setting either way beyond its return.
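For readers who only need bash, here is a simplified sketch in the spirit of `setSplit()`; it is not the code above, and `split_into` is a hypothetical name. It validates the target name with a `[[ =~ ]]` regex instead of `isName()`, disables globbing with `set -f` while the unquoted expansion happens, and restores the previous state by checking `$-`. The caller supplies IFS, here as a prefix assignment on the call:

```shell
#!/usr/bin/env bash
# split_into NAME ARG...: split the remaining arguments on the caller's IFS into array NAME.
split_into() {
    local _si_re='^[[:alpha:]_][[:alnum:]_]*$'
    if [[ ! $1 =~ $_si_re ]]; then
        printf 'split_into(): bad name: %s\n' "$1" >&2
        return 1
    fi
    local _si_opts=$-                  # current option letters; contains f if globbing is already off
    set -f                             # no pathname expansion while splitting
    eval "$1=(\${@:2})"                # only the validated name is pasted into code; the data is expanded by eval
    [[ $_si_opts == *f* ]] || set +f   # re-enable globbing only if it was on before
}

IFS=, split_into arr 'a,*,b'           # the caller sets IFS for this one call
printf '<%s>' "${arr[@]}"; echo        # <a><*><b>
```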

There's a key thing missing here, though: $IFS is not configured. The isName() function implements a workaround for the rather alarming bash bug of applying $IFS to the contents of POSIX bracket expressions in case patterns (seriously: what the hell?), with a single self-recursive call that nullifies its local $IFS when the global value isn't already null. But that's entirely orthogonal to the array splitting; otherwise setSplit() does nothing with $IFS. And that's just as it should be. You don't need to do it like that.

The caller should set that:

IFS=aBc setSplit arrayname 'xyzam*oBabc' x y z
printf '<%q>\n' "$IFS" "${arrayname[@]}"

<$' \t\n'>
<xyz>
<m\*o>
<''>
<b>
<''>
<x>
<y>
<z>

The above works in a bash shell by setting the $IFS value local to the function called.
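A minimal sketch of that bash behavior (outside POSIX mode), with `show_ifs` as an illustrative helper; it assumes IFS starts at its default value:

```shell
#!/usr/bin/env bash
show_ifs() { printf 'inside: <%s>\n' "$IFS"; }

IFS=, show_ifs                                            # inside: <,>
[[ $IFS == $' \t\n' ]] && echo "outside: IFS is still the default"
```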

POSIXly:

IFS=aBc command eval "setSplit arrayname 'xyzam*oBabc' x y z"

...would serve the same purpose. The difference lies in bash's break with the standard with regard to perpetuating the environment for special builtins and functions; the standard otherwise specifies that variables set on their command lines should affect the current shell environment (which might be preferred, in that you can therefore have it either way).

Whatever your preference, the point is that the caller calls the shots here, and the callee just shoots.

mikeserv