I have a file with all types of brackets ({}, [] and ()), nested, and opened and closed appropriately. I would like to return the content of the square brackets that follow the string "text": (up to the matching closing bracket). The content of the file looks like this:

....

{ "text": [ { "string1": ["hello", "world"], "string2": ["foo", "bar"] }, { "string1": ["alpha", "beta"], "string2": ["cat", "dog"] } ], "unwanted": [ { "stuff": ["nonesense"] } ] } .... and so on

I would like to return

{
    "string1": ["hello", "world"],
    "string2": ["foo", "bar"]
},
{
    "string1": ["alpha", "beta"],
    "string2": ["cat", "dog"]
}

The file is JSON and has a similar structure throughout. Specifically, I want only the contents of the square brackets that follow the "text" key.

SKPS
  • That's not valid JSON – Chris Davies May 13 '22 at 16:22
  • You can't parse nested parentheses with regular expressions (they're not a "regular" language in the technical sense). Then again, Perl supports regexes that aren't limited to being "regular", so you could hack it up with a recent-enough Perl. Or, if it's good enough, ignore the brackets and nesting and just look for the lines that say text: [ and ]. Then again, if it's actually proper JSON and the example is just off, then you should use a JSON parser. But that's something only you know, and you'll have to decide whether heuristics are OK or whether you need an exact parse. – ilkkachu May 13 '22 at 16:45
  • @ilkkachu Thanks. – SKPS May 13 '22 at 16:54
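
ilkkachu's line-based heuristic could be sketched in awk roughly as follows. This is a minimal sketch, assuming the file is pretty-printed so that "text": [ sits on its own line and the first later line that begins with ] closes that array; the function name text_lines is hypothetical. It ignores nesting entirely, so it will not work on one-line input like the sample in the question.

```shell
#!/bin/sh
# text_lines (hypothetical name): print the lines between a line containing
# "text": [ and the next line that starts with ], ignoring nesting entirely.
# Heuristic only: assumes pretty-printed input, one element per line.
text_lines() {
  awk '
    /"text" *: *\[/ { on = 1; next }   # start printing after the opening line
    on && /^ *\]/   { exit }           # stop at the first line that begins with ]
    on                                 # print everything in between
  ' "$@"
}

# Usage: text_lines file.json   (or pipe the JSON in on stdin)
```

Because it only looks at line starts, a reformatted file or a closing bracket at the start of an unrelated line will silently give the wrong answer, which is the trade-off ilkkachu mentions.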

1 Answer


What you've offered isn't valid JSON. Here it is with the expression bracketed, the other errors fixed, and a counter-example added:

{
    "text": [
        {
            "string1": ["hello", "world"],
            "string2": ["foo", "bar"]
        },
        {
            "string1": ["alpha", "beta"],
            "string2": ["cat", "dog"]
        }
    ],
    "unwanted": [
        {
            "stuff": ["nonesense"]
        }
    ]
}

You can parse this with a JSON parser such as jq. For example, this prints each element of the text array:

jq -c '.text[]'

{"string1":["hello","world"],"string2":["foo","bar"]}
{"string1":["alpha","beta"],"string2":["cat","dog"]}

Or

jq '.text[]'

{
  "string1": [
    "hello",
    "world"
  ],
  "string2": [
    "foo",
    "bar"
  ]
}
{
  "string1": [
    "alpha",
    "beta"
  ],
  "string2": [
    "cat",
    "dog"
  ]
}

These are the same JSON values, just laid out differently: -c gives compact one-line output, while the default pretty-prints.

Chris Davies
  • Thanks. Is there a way to achieve the same without jq? Like a conventional string match method? – SKPS May 13 '22 at 16:48
  • @SKPS this is not a conventional string and you are not asking for a string match, so that really isn't the right approach. For example, what would happen if string2 contained "he[llo"? That's perfectly valid, but it would break any naive approach to parsing the data. You could write a little script that keeps track of the open and closed brackets, but is it really worth it? – terdon May 13 '22 at 16:56
  • @terdon Thanks. I realize it would be more complicated than I thought. I will proceed using jq. – SKPS May 13 '22 at 17:07
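
terdon's "little script that keeps track of the open and closed brackets" might look something like the following awk sketch. It assumes the first quoted "text" in the input is the key you want, that the whole input fits in memory, and that the JSON is well-formed; the function name extract_text_array is hypothetical. Unlike a plain string match, it skips brackets inside double-quoted strings (including \" escapes), so a value like "he[llo" does not break it, but it is still not a real JSON parser.

```shell
#!/bin/sh
# extract_text_array (hypothetical name): read JSON on stdin and print what lies
# between the [ that follows the first "text" key and its matching ].
# Tracks nesting depth and skips brackets inside double-quoted strings.
# Exits 1 if the key or the matching bracket is not found.
extract_text_array() {
  awk '
    { buf = buf $0 "\n" }                      # slurp all input into one string
    END {
      p = index(buf, "\"text\"")               # first occurrence of the key
      if (p == 0) exit 1
      rest = substr(buf, p + 6)                # everything after the key
      q = index(rest, "[")                     # opening bracket of the array
      if (q == 0) exit 1
      s = substr(rest, q + 1)
      depth = 1; instr = 0; esc = 0
      n = length(s)
      for (i = 1; i <= n; i++) {
        c = substr(s, i, 1)
        if (instr) {                           # inside a string literal
          if (esc) esc = 0                     # this char was escaped
          else if (c == "\\") esc = 1          # next char is escaped
          else if (c == "\"") instr = 0        # string ends
        }
        else if (c == "\"") instr = 1          # string starts
        else if (c == "[" || c == "{") depth++
        else if (c == "]" || c == "}") {
          depth--
          if (depth == 0) { print substr(s, 1, i - 1); exit 0 }
        }
      }
      exit 1                                   # no matching bracket found
    }'
}
```

For example, extract_text_array < file.json prints the inside of the text array with its original spacing. That this much bookkeeping is needed just to skip quoted brackets is a fair argument for letting jq do the parsing.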