Can a string be used as index in array of arrays in gawk?

Question

Let's look at this file:

9=foo 3=bar 84=baz 30=bin 71=bon
9=goo 3=gar 84=gaz 30=gin 71=gon
9=soo 3=sar 84=saz 30=sin 71=son

Running this gawk line:

gawk '
{
    split($0,arr)
    for(i=1;i<=length(arr);i++){
        eq=index(arr[i],"=")
        num=substr(arr[i],eq+1)
        val=substr(arr[i],0,eq-1)
        printf "%s=%s ", num,val
        arr2[i][num] = val
    }
    printf ORS
}
END{
    print "---\n",arr2[2][9]}
' newfile.txt

What I expect to get is goo because the first index of the array is the second line, and the second index is the number before the = sign.

examples:

arr2[1][3] = bar
arr2[1][71] = bon
arr2[3][30] = sin

so on..

Can anyone tell me why it's not working and if it's even possible?

gawk version GNU Awk 4.1.1, API: 1.1

Thanks.

Why not gaz instead of goo, because it gets overwritten in the same line? Anyhow, what is the actual output? — Philippos, Mar 07 '17 at 11:32
awk is processing each line in the main body, but BEGIN is before line by line processing, and END is after all line by line processing. Your for(i in arr) is processing the array of each line and overwriting in arr2, then the END statement is going to be processing the last, overwritten values of the last line processed — bsd, Mar 07 '17 at 11:35
Please [edit] your question and i) explain what your script is supposed to be doing and ii) explain what it is actually doing. — terdon, Mar 07 '17 at 12:00

score 0 · Accepted Answer · answered Mar 07 '17 at 12:20

Yes, it is possible. The problem with your script, however, is that it isn't doing what (I think) you think it's doing. First of all, you are using i as the index of the first-level array:

arr2[i][num] = val

Here i is the position of the field within the current line (a number from 1 to the length of arr), not the line number. Every line therefore writes into the same arr2[i], and an entry is overwritten whenever two lines have the same string value in the same field.

Now, the reason you're getting a blank line as output (I assume that's what you're getting, you haven't actually said) is that you have the key and the value swapped in your array. You have:

arr2[i][num] = val

So, for example:

arr2[1][soo]=9

What you seem to be expecting is the other way around:

arr2[1][9]=soo

So, what you need is:

arr2[i][val]=num

If we also change the array definition to use NR as the primary index instead of i to avoid collisions, we get:

gawk '
{
    split($0,arr)
    for(i=1;i<=length(arr);i++){
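        eq=index(arr[i],"=")
        num=substr(arr[i],eq+1)      # text after the "=" (e.g. foo)
        val=substr(arr[i],1,eq-1)    # number before the "=" (e.g. 9); substr positions start at 1
        arr2[NR][val] = num          # line number first, then the number as the key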
    }
}
END{
  for(i in arr2){
    for (num in arr2[i]){
      printf "arr2[%s][%s]=%s\n", i, num, arr2[i][num]
    }
  }
}
' newfile.txt
arr2[1][3]=bar
arr2[1][9]=foo
arr2[1][30]=bin
arr2[1][71]=bon
arr2[1][84]=baz
arr2[2][3]=gar
arr2[2][9]=goo
arr2[2][30]=gin
arr2[2][71]=gon
arr2[2][84]=gaz
arr2[3][3]=sar
arr2[3][9]=soo
arr2[3][30]=sin
arr2[3][71]=son
arr2[3][84]=saz

As you can see, arr2[2][9] is now goo as expected. The whole thing is a bit too complex though. You could simplify it to:

$ gawk -F'[ =]' '{
      for(i=1;i<=NF;i+=2){
          arr2[NR][$(i)]=$(i+1)
      }
  } END{print arr2[2][9]}' newfile.txt
goo

That's great thanks, don't know why I had i instead of NR. — Moshe, Mar 07 '17 at 12:28
@Moshe that was only a problem with your first example, before the edit, where you had repeated values. Your main problem was using the num as a value when you thought you were using it as a key. — terdon, Mar 07 '17 at 12:30
