Assuming that your JSON document is well-formed, like the following modified variant of the text that you present:
{
  "data": [
    {
      "name": "name1",
      "date": [
        {
          "date1": "aaa",
          "date2": "bbb"
        },
        {
          "date1": "ccc",
          "date2": "ddd"
        },
        {
          "date1": "eee",
          "date2": "fff"
        }
      ],
      "var": "ggg"
    },
    {
      "name": "name2",
      "date": [
        {
          "date1": "hhh",
          "date2": "iii"
        },
        {
          "date1": "jjj",
          "date2": "kkk"
        }
      ],
      "var": "lll"
    }
  ]
}
We could then do the following:
$ jq -r '[ "name", "date", "var" ], (.data[] | [.name, ([.date[].date1] | join(" ")), .var]) | @csv' file
"name","date","var"
"name1","aaa ccc eee","ggg"
"name2","hhh jjj","lll"
The jq expression, with a bit more air added to show the structure:
[ "name", "date", "var" ],
(
.data[] |
[
.name,
( [ .date[].date1 ] | join(" ") ),
.var
]
) |
@csv
This first creates the CSV header as an array of strings. It then iterates over the top-level data array, pulling out the values of the name and var keys into an array. It also picks out all values of the date1 sub-keys in the date array and concatenates them with spaces as delimiters.
The CSV header array, as well as the array constructed for each data element, is then formatted for output using @csv, which quotes each string and delimits the elements with commas.
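As a side note (not something asked for in the question), if tab-separated output would be easier for a downstream tool to consume, the same expression should work with @tsv in place of @csv, which would print the same three columns unquoted and separated by tab characters:
$ jq -r '[ "name", "date", "var" ], (.data[] | [.name, ([.date[].date1] | join(" ")), .var]) | @tsv' file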
Another variant:
box% jq -r '.data[].date |= ( [.[].date1] | join(" ") ) | [ (.data[0] | keys[]) ], (.data[]| [.[]]) | @csv' file
"date","name","var"
"name1","aaa ccc eee","ggg"
"name2","hhh jjj","lll"
This preprocesses the data so that the part of the expression after the first pipe gets the following input:
{
  "data": [
    {
      "name": "name1",
      "date": "aaa ccc eee",
      "var": "ggg"
    },
    {
      "name": "name2",
      "date": "hhh jjj",
      "var": "lll"
    }
  ]
}
The code after the first pipe then only has to extract the keys for the CSV header and then collect the values, without having to worry about what the keys are called.
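If you want to verify that intermediate document yourself, the preprocessing step can be run on its own (just a quick sanity check, not part of the final command); it should print more or less the document shown above:
$ jq '.data[].date |= ( [.[].date1] | join(" ") )' file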
Again, the jq expression with a bit more air inserted for illustration:
.data[].date |= ( [ .[].date1 ] | join(" ") ) |
[ (.data[0] | keys[]) ],
( .data[] | [.[]] ) |
@csv
The funky-looking .data[] | [.[]] creates an array of the values of all keys in each element of data.
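To see what [.[]] does in isolation, here is a tiny throwaway example using jq's -n option to run without any input (the object literal is made-up data purely for illustration):
$ jq -nc '{ "name": "name1", "date": "aaa ccc eee", "var": "ggg" } | [.[]]'
["name1","aaa ccc eee","ggg"]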
If there is a worry that .[] applied to a non-array might pull out the values in an order which is different from what keys produces, then one could use
.data[].date |= ( [ .[].date1 ] | join(" ")) |
(.data[0] | keys) as $keys |
$keys,
( .data[]| [ .[$keys[]] ] ) |
@csv
I.e., pull out the header keys into the variable $keys, and use that both for creating the CSV header and for pulling out the data from the elements in the data array. The columns may come out in an arbitrary order, but at least the header and the rest of the data are in the same order, and a CSV parser would have no issue pulling out the columns by name.
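If you additionally want the columns to come out in the same order as in the input document, recent jq releases also provide keys_unsorted, which returns the key names in their original order rather than sorted. Swapping it in for keys (a small variation on the expression above) should give the name, date, var column order from the input:
$ jq -r '.data[].date |= ( [ .[].date1 ] | join(" ") ) |
    (.data[0] | keys_unsorted) as $keys |
    $keys,
    ( .data[] | [ .[$keys[]] ] ) | @csv' file
"name","date","var"
"name1","aaa ccc eee","ggg"
"name2","hhh jjj","lll"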