1

I am using bash with jq to parse data returned from https://ipinfo.io/json into an associative array. I found a nice example that almost does the job at https://gist.github.com/awesome/b3f65084c70264e87be3e72ee8abd0e5

Whilst that code is able to parse most of the data, it fails when values contain multi-word strings. I suspect the issue has to do with putting quotes in the right place, but I don't know where. I've looked at the jq documentation, and have a general idea, but the details have me stumped. I'm having a little difficulty understanding the interactions between the jq pipes, templates and the reductions. (This is the first time I'm using jq, though I'm pretty solid on regex.)

My version of the code is:

locationResult=$(curl -s 'https://ipinfo.io/json')
arrayAsString=$(echo "$locationResult" | jq --raw-output '. | to_entries | map("[\(.key)]=\(.value)") | reduce .[] as $item ("associativeArray=("; . + $item + " ") + ")"')
declare -A "$arrayAsString"
echo ${associativeArray[org]}

For my location, org returns a multi-word company name. This causes declare -A "$arrayAsString" to generate warnings/errors, and echo ${associativeArray[org]} to print only the first word of the org field.

I have tried quoting the jq result as per the question "Assigning jq output to bash array when json value contains spaces", but that didn't work.

Any help here would be appreciated.

gone
  • 217

4 Answers

2

You could always do:

typeset -A ipinfo
while IFS= read -rd '' key && IFS= read -rd '' value; do
  ipinfo[$key]=$value
done < <(
  set -o pipefail
  curl -s https://ipinfo.io/json |
    jq -j '
      to_entries[] |
      [.key, .value | tostring] |
      map(gsub("\u0000"; "") + "\u0000") |
      add'
)

wait "$!" || exit # if curl or jq failed. Needs bash 4.4 or newer.

That is, get jq to output the keys and values NUL-delimited, so they can be reliably and safely retrieved with IFS= read -rd ''. The values are converted to strings, and any NUL characters are removed from both keys and values, since bash can't store NULs in its variables.

That would allow arbitrary keys and values except those containing NUL characters, and empty keys (an unfortunate limitation of bash's associative arrays). The day ipinfo.io adds an element with an empty key, that script will break, so you may want to explicitly exclude the members with empty keys. Also note that we convert values to strings as bash (contrary to ksh93) doesn't have support for complex / recursive data structures.
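To see the reading side in isolation, here is a minimal sketch that needs neither curl nor jq. The emit_pairs function and its sample data are made up for illustration; it stands in for the pipeline above by printing NUL-terminated keys and values. The loop also skips empty keys, which is one way to do the exclusion mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the curl | jq pipeline: emits key, value, key,
# value, ... with every item terminated by a NUL byte.
emit_pairs() {
  printf '%s\0' \
    ip   'W.X.Y.Z' \
    org  'Some Multi Word Org' \
    ''   'value stored under an empty key' \
    city $'Line one\nLine two'
}

declare -A ipinfo
while IFS= read -rd '' key && IFS= read -rd '' value; do
  [[ -n $key ]] || continue  # bash rejects "" as an associative array subscript
  ipinfo[$key]=$value
done < <(emit_pairs)

printf 'org=%s\n' "${ipinfo[org]}"
printf 'entries=%s\n' "${#ipinfo[@]}"
```

Because the delimiter is NUL rather than whitespace or newline, values with spaces and even embedded newlines should come through unchanged.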

With zsh instead of bash:

typeset -A ipinfo
ipinfo=(
  ${(0)"$(
    set -o pipefail
    curl -s https://ipinfo.io/json |
      jq -j '
        to_entries |
          map([.key, .value | tostring]) |
          flatten |
          map(gsub("\u0000"; "")) |
          join("\u0000")'
  )"}
) || exit

typeset -p ipinfo

zsh does support empty keys and has a proper way to safely assign associative arrays as a whole from the list of keys and values. It does support storing NULs in its variables but since here we're using NULs as delimiters, we still need to remove them from the values.

Ksh93 (the shell bash copied most of its API from) does have support for complex data structures, and the ksh93v- beta release even had experimental support for parsing json into them, but that was still very buggy. That json support was removed in ksh2020 (now abandoned), which was based on it, and the ksh93 versions still being maintained are based on ksh93u+ instead, so they don't have it either; you'd need to implement the parsing by hand. ksh93 doesn't support storing NULs in its variables either, though it has some helpers to convert from/to base64 encoding which could be leveraged here.


Note that your approach introduces an arbitrary command execution vulnerability. You don't want to get the shell parser exposed to some random data coming from the internet. Even when using jq's @sh (which is designed to encode values in a format suitable for input as sh code), it's all too easy to overlook issues when data is not of the expected type. Things like eval "$untrusteddata" or typeset / declare ... "$untrusteddata" should ring very loud alarm bells.
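A small, harmless illustration of that concern, as a sketch only: the untrusted string below is made up and plays the role of the text built from the JSON, with a printf standing in for whatever command an attacker might embed in a value.

```shell
#!/usr/bin/env bash
# Hypothetical hostile input: the "org" value carries a command substitution.
# A harmless printf stands in for an attacker-chosen command.
untrusted='demo=([org]="$(printf injected)")'

# Same pattern as in the question: hand the string to declare -A.
declare -A "$untrusted"

# If the shell evaluated the command substitution while assigning, this shows
# the command's output instead of the literal text $(printf injected).
printf '%s\n' "${demo[org]}"
```

The value that ends up in the array is whatever the embedded command printed, which means the command was run by the shell doing the assignment.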

Here, I'd use a proper programming language rather than a shell, especially bash.

  • Why do you need wait? I can't notice anything running in the background. – aviro Dec 14 '21 at 08:28
  • 3
    @aviro, the <(...) process substitution runs in the background. Using wait lets you retrieve its exit status and exit the script if curl or jq fails (see also the pipefail option). – Stéphane Chazelas Dec 14 '21 at 10:00
  • it doesn't work, because it would wait for the entire command, and not just for the curl or jq, but since the entire command doesn't run in the background, by the time your loop sub process finishes, the pid won't exist anymore. It would write something such as: line 15: wait: pid 10083 is not a child of this shell – aviro Dec 14 '21 at 12:45
  • 1
    @aviro, no $! contains the pid of the process substitution. However, you need bash 4.4 or newer to be able to wait on it. I've added a note about that. – Stéphane Chazelas Dec 14 '21 at 12:55
  • @StéphaneChazelas: Do you have a reference for some of the syntax you've used; e.g. to_entries[] and [.key, .value? – gone Dec 15 '21 at 02:58
  • @StéphaneChazelas: Thanks. I'm working here to understand this issue better. In the mean time, the NULL key problem can be fixed by inserting del(."") | into the pipeline before to_entries and after . |. – gone Dec 15 '21 at 03:08
  • @gone, yes, or you could add a fixed prefix to the key like . and use "${ipinfo[.ip]}" instead of "${ipinfo[ip]}" (or switch to a better designed shell, or again to a proper programming language). – Stéphane Chazelas Dec 15 '21 at 05:48
1

In your map function in jq, you need to enclose the key and the value in single quotes ('). In your case the space appeared only in the value, but quoting the key as well will also handle keys that include spaces.

You either need to add '"'"' before and after the key and the value:

arrayAsString=$(echo "$locationResult" | jq --raw-output '. | to_entries | map("['"'"'\(.key)'"'"']='"'"'\(.value)'"'"'") | reduce .[] as $item ("associativeArray=("; . + $item + " ") + ")"')

Or add the dollar sign $ before the first single quote of your jq command, and then embed the key and value in \'.

arrayAsString=$(echo "$locationResult" | jq --raw-output $'. | to_entries | map("[\'\(.key)\']=\'\(.value)\'") | reduce .[] as $item ("associativeArray=("; . + $item + " ") + ")"')

Please note that if either the key or the value includes single quotes itself, this workaround won't work. Except for that, it should be fine.

aviro
  • 5,532
  • That would still not help if the keys or values contained quotes or backticks or $s and would still introduce an arbitrary command execution vulnerability. – Stéphane Chazelas Dec 14 '21 at 11:43
  • @StéphaneChazelas right, I missed that. I updated my answer to use single quotes instead of double quotes. Thanks for the comment. – aviro Dec 14 '21 at 12:26
  • that just moves the problem, then you have problems with key/values that contain single quotes and still an ACE vulnerability. I hint at using @sh in my answer, but even then, you'd need to consider what may happen if some values are not scalar (arrays / objects) or contain non-UTF8 or the locale's encoding is not UTF-8. – Stéphane Chazelas Dec 14 '21 at 12:39
  • @StéphaneChazelas there's a disclaimer about the single quotes issue at the end of my answer about this. Anyway, I wasn't attempting to write a generic solution for ANY type of json, only for the specific example of the OP. – aviro Dec 14 '21 at 12:51
1

I'd like to start with a thank you to @StéphaneChazelas, who answered my question with a solution that works fine, and inspired me to think further outside my box.
The concern about null keys can be addressed by inserting del(."") | into the parsing pipeline just before to_entries[].

Though @StéphaneChazelas's solution worked, I wanted a cleaner solution that did not require loops and threads and, most importantly, only looks at the input once. After a bunch of experimentation, I have been able to produce the following code, which achieves the same result in what I believe to be a more efficient manner. (This code is influenced by the ideas of both @StéphaneChazelas and github's awesome, the author of the example I linked in my OP.)

parsed=$(echo "$theInput" | jq --raw-output ' . | del(."") | to_entries | map("[\(.key)]=\u0022\(.value)\u0022") | reduce .[] as $item ("locationData=("; . + $item + " ") + ")"')
echo -e "parsed=\n$parsed"
declare -A "$parsed"
echo -e "locationData[org]=\n${locationData[org]}"
echo -e "locationData[city]=\n${locationData[city]}"
echo -e "locationData[\"loc coord\"]=\n${locationData['loc coord']}\n\n"

Here, echo "$theInput" can be replaced with the curl call inline. I've taken the approach of feeding the curl output into a bash variable before using the variable in the above code. The above code works using the following sample data, which has been 'butchered' to make it as 'bad' as possible.

  1. There is an entry with a null key.
  2. There is a key with no corresponding value.
  3. There is a multi-worded string containing [, ], ' and escaped quotes \".
  4. There is a key comprised of two words separated by a space (as per @aviro's suggestion).
  5. The only issue that I've found thus far is that values must not have unescaped quotes in them, which causes a runtime error.
theInput=$(cat <<EOF
{
  "ip": "W.X.Y.Z",
  "": "123.456.789.ABC",
  "city": "Some City",
  "town": "",
  "region": "My State",
  "country": "My Country",
  "loc coord": "0.000,0.0000",
  "org": "Company's [Short] \"Corporation\" Name",
  "postal": "12345",
  "timezone": "Some/Timezone",
  "readme": "https://ipinfo.io/missingauth"
}
EOF
)

Please feel free to poke holes in my assertions and methodology.

gone
  • 217
1

I would probably do it something like this:

declare -A arr
eval "$(
    curl -s https://ipinfo.io/json |
    jq -r 'to_entries[] | @sh "arr[\(.key|tostring)]=\(.value|tostring)"'
)"

This quotes both the key and value for the shell using @sh, and both are explicitly converted to strings using tostring, which stringifies any values that are of an unexpected type. The jq expression spits out shell code and the shell evaluates it, creating the arr associative array.
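To try this without hitting the network, here is a sketch that feeds jq a made-up sample document instead of the ipinfo.io response. The sample variable and its contents are assumptions for illustration; the jq filter is the same one as above.

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for the ipinfo.io response. It has a
# value with quotes, brackets and a $(...) sequence, a key with a space, and
# a non-string value.
sample=$(cat <<'EOF'
{"org": "Company's [Short] \"Corp\" $(id) Name", "loc coord": "0.0,0.0", "n": 42}
EOF
)

declare -A arr
eval "$(
    printf '%s\n' "$sample" |
    jq -r 'to_entries[] | @sh "arr[\(.key|tostring)]=\(.value|tostring)"'
)"

printf '%s\n' "${arr[org]}"
printf '%s\n' "${arr[loc coord]}"
```

With @sh wrapping each interpolated string in single quotes, the $(id) in the sample should be stored as literal text rather than run, and the key with a space should work as a subscript.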

Stéphane Chazelas mentioned alarm bells in his answer, but I can't really find a case where this breaks, assuming the input is valid JSON.

Kusalananda
  • 333,661