5

I made an associative array as follows. To give a few details, the keys refer to specific files because I will be using this array in the context of a larger script (where the directory containing the files will be a getopts argument).

declare -A BAMREADS
echo "BAMREADS array is initialized"

BAMREADS[../data/file1.bam]=33285268
BAMREADS[../data/file2.bam]=28777698
BAMREADS[../data/file3.bam]=22388955

echo ${BAMREADS[@]}  # Output: 22388955 33285268 28777698
echo ${!BAMREADS[@]} # Output: ../data/file1.bam ../data/file2.bam ../data/file3.bam

So far, this array seems to behave as I expect. Now, I want to build another associative array based on this array. To be specific: my second array will have the same keys as my first one but I want to divide the values by a variable called $MIN.

I am not sure which of the following strategies is best and I can't seem to make either work.

Strategy 1: copy the array and modify the array?

MIN=33285268

declare -A BRAMFRACS
echo "BAMFRACS array is initialized"
BAMFRACS=("${BAMREADS[@]}")

echo ${BAMFRACS[@]}  # Output: 22388955 33285268 28777698
echo ${!BAMFRACS[@]} # Output: 0 1 2

This is not what I want for the keys. Even if it works, I would then need to perform the operation I mentioned on all the values.

Strategy 2: build the second array when looping through the first.

MIN=33285268

declare -A BRAMFRACS
echo "BAMFRACS array is initialized"

for i in $(ls $BAMFILES/*bam)
do
    echo $i
    echo ${BAMREADS[$i]}
    BAMFRACS[$i] = ${BAMREADS[$i]} 
done

echo ${BAMFRACS[@]}
echo ${!BAMFRACS[@]}


When I run this, I get the following error, which I am unsure how to solve:

../data/file1.bam
33285268
script.bash: line 108: BAMFRACS[../data/file1.bam]: No such file or directory
../data/file2.bam
28777698
script.bash: line 108: BAMFRACS[../data/file2.bam]: No such file or directory
../data/file3.bam
22388955
script.bash: line 108: BAMFRACS[../data/file3.bam]: No such file or directory

Thanks

mf94
  • 219

4 Answers

9

To answer the more general question about copying associative arrays.

The bash maintainers made the unfortunate decision to copy the ksh93 API rather than the zsh one when they introduced their own associative arrays in 4.0.

ksh93/bash do support setting an associative array as a whole, but it's with the:

hash=([k1]=v1 [k2]=v2)

syntax. While with zsh, it's

hash=(k1 v1 k2 v2)

(zsh later also added support for the ksh93 ([k]=v...) syntax, for compatibility).

What that means though is that with ksh93 and bash, it's very tricky to create a hash that way from an arbitrary list of keys and values.

With the zsh syntax, you just need to pass the list as alternating keys and values. For instance, to copy one associative array (h1) into another (h2):

h2=("${(@kv)h1}")

Or from a CSV with two columns:

IFS=$'\n,'; h=($(<file.csv))

Or from arrays of key and values:

h=("${(@)keys:^values}")

With the ksh93/bash syntax, while there's "${!h[@]}" and "${h[@]}" to expand to the list of keys and values (like "${(@k)h}" and "${(@v)h}" in zsh), there's no operator to expand to both keys and values in the [key]=value syntax expected by h=(...) (the "${(@kv)h}" in zsh).
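As a small illustration of the two expansions that bash does offer, here is a minimal sketch. The array h and its contents are made up for the example, and the order in which bash lists the elements is not guaranteed.

```shell
#!/usr/bin/env bash
# Hypothetical sample hash, not taken from the question.
typeset -A h=([k1]=v1 [k2]=v2)

# "${!h[@]}" expands to the keys only, one word per key.
printf 'key: %s\n' "${!h[@]}"

# "${h[@]}" expands to the values only, one word per value.
printf 'value: %s\n' "${h[@]}"
```

Neither expansion yields the [key]=value words that h=(...) expects, which is why a whole-array copy needs either a loop or the typeset -p output.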

A trick you can use in those shells to copy associative arrays though (other than copying elements in a loop), is to use the output of typeset -p.

For instance, the equivalent of zsh's h2=("${(@kv)h1}") to copy h1 into h2 could be done in ksh93 or bash with:

h1_definition=$(typeset -p h1) &&
  eval "typeset -A h2=${h1_definition#*=}"

Which with bash you can shorten to:

h1_definition=$(typeset -p h1) &&
  typeset -A h2="${h1_definition#*=}"

(In bash, as in ksh93, typeset -A h=value is normally short for typeset -A h=([0]=value). In bash, however, if value starts with ( and ends with ), its content is interpreted as a compound associative assignment, as if passed to eval, even if the parentheses are quoted or are the result of some expansion.)

In the end, it's about as easy to use the loop instead:

for k in "${!h1[@]}"; do h2[$k]=${h1[$k]}; done
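To compare the two approaches side by side, here is a minimal sketch for bash. The sample keys and values are hypothetical and contain spaces to exercise the quoting. Note that the target of the loop has to be declared as an associative array first, otherwise bash treats it as an indexed array.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the spaces exercise the quoting.
typeset -A h1=([k1]='v 1' ['key two']='v2')

# Copy via the typeset -p output (the eval form from above).
h1_definition=$(typeset -p h1) &&
  eval "typeset -A h2=${h1_definition#*=}"

# Copy via a plain loop; h3 must be declared associative beforehand.
typeset -A h3
for k in "${!h1[@]}"; do h3[$k]=${h1[$k]}; done

# Report, key by key, whether both copies match the original.
for k in "${!h1[@]}"; do
  if [[ ${h2[$k]} == "${h1[$k]}" && ${h3[$k]} == "${h1[$k]}" ]]; then
    printf 'ok: %s\n' "$k"
  else
    printf 'mismatch: %s\n' "$k"
  fi
done
```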
  • "which with bash (≥4.4) you can shorten to" `typeset -A h2="${h1_definition#*=}"`. I was confused why this wasn't working for me on CentOS 7 (Bash 4.2.x) so I tried 3.x through 5.x by [pulling images from Docker](https://gist.github.com/ernstki/b782cc7f2a29ec01c1f4355f2dd312cc), and with earlier Bashes you'll either get an error or just a literal string assigned to `[0]` instead (but no error). – Kevin E Mar 14 '21 at 05:18
  • Any particular reason why using typeset instead of declare? Judging from help typeset it seems declare is the "real" command. – MestreLion Sep 10 '21 at 14:52
  • 1
    @MestreLion, typeset is the name ksh chose nearly 40 years ago and is supported by all Bourne-like shells that have variable types including bash, so that's the one I'm used to. bash decided to call it declare for some reason, but has always supported typeset as an alias and will likely support it forever, like other shells. There have been more shells recently adding declare as an alias for typeset for compatibility with bash, but it will take more than that to change my habits :-) – Stéphane Chazelas Sep 10 '21 at 14:57
8

Build the new array from the old:

MIN=33285268

declare -A BRAMFRACS
for key in "${!BAMREADS[@]}"; do
    BRAMFRACS[$key]=$(( ${BAMREADS[$key]} / MIN ))
done

Comments on your code:

  • Your first suggested code does not work because it copies only the values from the associative array to the new array. The values automatically get the keys 0, 1 and 2, but the original keys are not copied. You need to copy the array key by key as I have shown above. This way you assign the wanted value to the correct key.

  • Your second suggested code contains a syntax error in that it has spaces around = in an assignment. This is where the errors that you see come from. variable = value is interpreted as "the command variable executed with the operands = and value".

  • If you wish to iterate over a set of pathnames, don't use ls. Instead just do for pathname in "$BAMFILES"/*bam; do.

  • Quote your variable expansions.

  • Consider using printf instead of echo to output variable data.
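One caveat with the loop at the top of this answer: bash's $(( ... )) arithmetic is integer-only, so a division by MIN truncates each fraction to a whole number. If fractional values are wanted, an external tool can do the division. The following is a minimal sketch, assuming awk is available and that four decimal places are enough (both are assumptions, not something the question specifies). It uses the sample data from the question and the array name BAMFRACS from the question's echo lines.

```shell
#!/usr/bin/env bash
# Sample data copied from the question.
declare -A BAMREADS=(
    [../data/file1.bam]=33285268
    [../data/file2.bam]=28777698
    [../data/file3.bam]=22388955
)
MIN=33285268

declare -A BAMFRACS
for key in "${!BAMREADS[@]}"; do
    # awk performs the floating-point division that $(( ... )) cannot;
    # LC_ALL=C keeps "." as the decimal separator.
    BAMFRACS[$key]=$(LC_ALL=C awk -v n="${BAMREADS[$key]}" -v d="$MIN" \
        'BEGIN { printf "%.4f", n / d }')
done

# Quoted expansions and printf, as suggested in the list above.
for key in "${!BAMFRACS[@]}"; do
    printf '%s\t%s\n' "$key" "${BAMFRACS[$key]}"
done
```

The values are stored as strings, so any later arithmetic on them would again need awk or a similar tool.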


Kusalananda
  • 333,661
0

The following bash assigns the values from associative array AA2 (which may be unset) into another associative array AA1 (which must be declared -A).

LIST="$(declare -p AA2 2>/dev/null)"
[[ "$LIST" ]] && AA1+=${LIST#*=}
  • declare -p echoes the value of a variable as a declare statement which can be directly passed into the interpreter without any word splitting problems.
  • ${LIST#*=} removes the equals sign and everything before it.
  • The hiding of an error (2>/dev/null) during declare and the non-zero length test ([[ "$LIST" ]]) allow AA2 to be unset.
  • Unexpected results (not errors) will happen if AA1 or AA2 are not associative arrays.
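The pieces described in these bullets can be inspected on their own. Below is a minimal sketch, with a made-up AA2 and a hypothetical helper variable DEFINITION. The exact text printed by declare -p differs between bash versions, so no output is shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sample array.
declare -A AA2=([a]=1 [b]=2)

# declare -p prints a declare statement that recreates the variable.
LIST="$(declare -p AA2 2>/dev/null)"
printf 'statement:  %s\n' "$LIST"

# Stripping everything up to the first "=" leaves only the (...) part.
DEFINITION=${LIST#*=}
printf 'definition: %s\n' "$DEFINITION"

# With AA2 unset, declare -p fails quietly and LIST ends up empty,
# so the [[ "$LIST" ]] test skips the copy.
unset AA2
LIST="$(declare -p AA2 2>/dev/null)"
[[ "$LIST" ]] || printf 'AA2 is unset; nothing to copy\n'
```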
Paul
  • 210
-3

this should do it (can also add additional key-value):

declare -A origDict=( [keya]=value_a [keyb]=value_b [keyc]=value_c )
declare -a newDict=( echo ${origDict[*]} [keynew]=new_value )
u28ds02
  • 1
  • 1
  • Mostly works, but you lose the original keys, as newDict is declared as just an ordinary (numeric-indexed) array in this case. If you run declare -p newDict afterward, you will see: declare -a newDict='([0]="new_value" [1]="value_c" [2]="value_b" [3]="value_a")'. Thanks for your answer, though! – Kevin E Mar 14 '21 at 00:25
  • if newDict is an ordinary array instead of an associative one, then this fails to answer the original question – MestreLion Sep 10 '21 at 14:54