I have the JSON file below:

{ data : [
  {
   "name" : "name1"
   "date" : [
     {
      "date1" : "aaa",
      "date2" : "bbb"
     },
     {
      "date1" : "ccc",
      "date2" : "ddd"
     },
     {
      "date1" : "eee",
      "date2" : "fff"
     },
     "var" : "ggg"
 },
{
   "name" : "name2"
   "date" : [
     {
      "date1" : "hhh",
      "date2" : "iii"
     },
     {
      "date1" : "jjj",
      "date2" : "kkk"
     },
     "var" : "lll"
  }
 ]
}

I would like to have a CSV file in this format:

name, date, var
name1, aaa ccc eee, ggg
name2, hhh jjj, lll

Is this possible by using only jq?

rt1111

1 Answer


Assuming that your JSON document is well-formed, like the following modified variant of the text that you posted:

{
  "data": [
    {
      "name": "name1",
      "date": [
        {
          "date1": "aaa",
          "date2": "bbb"
        },
        {
          "date1": "ccc",
          "date2": "ddd"
        },
        {
          "date1": "eee",
          "date2": "fff"
        }
      ],
      "var": "ggg"
    },
    {
      "name": "name2",
      "date": [
        {
          "date1": "hhh",
          "date2": "iii"
        },
        {
          "date1": "jjj",
          "date2": "kkk"
        }
      ],
      "var": "lll"
    }
  ]
}
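
As an aside, jq itself can tell you whether a document is well formed at all: jq empty parses its input, produces no output, and exits with a non-zero status and a parse error message when the input is not valid JSON. With the corrected document:

$ jq empty file && echo OK
OK

The document as originally posted would fail this check.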

We could then do the following:

$ jq -r '[ "name", "date", "var" ], (.data[] | [.name, ([.date[].date1] | join(" ")), .var]) | @csv' file
"name","date","var"
"name1","aaa ccc eee","ggg"
"name2","hhh jjj","lll"

The jq expression, with a bit more air to show the structure:

[ "name", "date", "var" ],
(
    .data[] |
    [
      .name,
      ( [ .date[].date1 ] | join(" ") ),
      .var
    ]
) |
@csv

This first creates the CSV header as an array of strings. It then iterates over the top-level data array, pulling out the values of the name and var keys into an array. It also picks out all values for the date1 sub-keys in the date array, and concatenates them with spaces as delimiters.
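
To see the intermediate arrays that are later turned into CSV rows, you can drop the trailing @csv (using -c for compact output instead of -r):

$ jq -c '[ "name", "date", "var" ], (.data[] | [.name, ([.date[].date1] | join(" ")), .var])' file
["name","date","var"]
["name1","aaa ccc eee","ggg"]
["name2","hhh jjj","lll"]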

The CSV header array and the array constructed for each data element are then processed for output using @csv, which quotes each string and delimits the elements with commas.
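
To illustrate the @csv step in isolation, here it is applied to a single array containing strings and a number; strings are quoted, numbers are left bare:

$ jq -rn '["a,b", "c", 1] | @csv'
"a,b","c",1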


Another variant:

$ jq -r '.data[].date |= ( [.[].date1] | join(" ") ) | [ (.data[0] | keys[]) ], (.data[]| [.[]]) | @csv' file
"date","name","var"
"name1","aaa ccc eee","ggg"
"name2","hhh jjj","lll"

This preprocesses the data so that the part of the expression after the first pipe gets the following input:

{
  "data": [
    {
      "name": "name1",
      "date": "aaa ccc eee",
      "var": "ggg"
    },
    {
      "name": "name2",
      "date": "hhh jjj",
      "var": "lll"
    }
  ]
}

The code after the first pipe then only has to extract the keys for the CSV header and then collect the values, without having to worry about what the keys are called.
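
To reproduce this intermediate document yourself, run only the part of the expression before the first pipe:

$ jq '.data[].date |= ( [.[].date1] | join(" ") )' file

This prints exactly the document shown above.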

Again, the jq expression with a bit more air inserted for illustration:

.data[].date |= ( [ .[].date1 ] | join(" ") ) |
[ (.data[0] | keys[]) ],
( .data[] | [.[]] ) |
@csv

The funky-looking .data[] | [.[]] creates an array of the values of all keys in each element of data.
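
A minimal demonstration of that construct applied to a single object:

$ jq -cn '{ "name": "name1", "date": "aaa ccc eee", "var": "ggg" } | [.[]]'
["name1","aaa ccc eee","ggg"]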

If there is a worry that .[] applied to an object might pull out the values in an order different from what keys produces (keys sorts the key names, whereas .[] follows the key order of the input document; note how the header and the rows disagree in the output above), then one could use

.data[].date |= ( [ .[].date1 ] | join(" ")) |
(.data[0] | keys) as $keys |
$keys,
( .data[]| [ .[$keys[]] ] ) |
@csv

I.e., pull out the header keys into the variable $keys, and use that both for creating the CSV header and for pulling the data out of the elements in the data array. The columns may not come out in the input's order (keys sorts them), but at least the header and the rest of the data are in the same order, and a CSV parser would have no issue pulling out the columns by name.
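
With the sample document above, this produces a header and data rows that agree on the (sorted) column order:

$ jq -r '.data[].date |= ( [.[].date1] | join(" ") ) | (.data[0] | keys) as $keys | $keys, ( .data[]| [ .[$keys[]] ] ) | @csv' file
"date","name","var"
"aaa ccc eee","name1","ggg"
"hhh jjj","name2","lll"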

Kusalananda