95

I have the JSON file below, with the data stored as columns enumerated by rank in an array:

{
  "data": [
    {
      "displayName": "First Name",
      "rank": 1,
      "value": "VALUE"
    },
    {
      "displayName": "Last Name",
      "rank": 2,
      "value": "VALUE"
    },
    {
      "displayName": "Position",
      "rank": 3,
      "value": "VALUE"
    },
    {
      "displayName": "Company Name",
      "rank": 4,
      "value": "VALUE"
    },
    {
      "displayName": "Country",
      "rank": 5,
      "value": "VALUE"
    }
  ]
}

I would like to have a CSV file in this format, where the headers come from the value of each column's displayName and the data in the column is the value of its singular value key:

First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE

Is this possible by using only jq? I don't have any programming skills.

Kusalananda
  • 333,661
Kerim
  • 951

8 Answers

83

jq has a filter, @csv, for converting an array to a CSV string. This filter takes into account most of the complexities associated with the CSV format, notably commas embedded in fields. (jq 1.5 has a similar filter, @tsv, for generating tab-separated-value files.)

Of course, if the headers and values are all guaranteed to be free of commas and double quotation marks, then there may be no need to use the @csv filter. Otherwise, it would probably be better to use it.

For example, if the 'Company Name' were 'Smith, Smith and Smith', and if the other values were as shown below, invoking jq with the "-r" option would produce valid CSV:

$ jq -r '.data | map(.displayName), map(.value) | @csv' so.json2csv.json
"First Name","Last Name","Position","Company Name","Country"
"John (""Johnnie"")","Doe","Director, Planning and Posterity","Smith, Smith and Smith","Transylvania"
peak
  • 990
55

I prefer to make each record a row in my CSV.

jq -r '.data | map([.displayName, .rank, .value] | join(", ")) | join("\n")'

Given the data in the question, this outputs

First Name, 1, VALUE
Last Name, 2, VALUE
Position, 3, VALUE
Company Name, 4, VALUE
Country, 5, VALUE
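
If the fields could contain commas or double quotation marks, a variant of the same per-record layout (a sketch) can use the @csv filter instead of join, which quotes the string fields:

jq -r '.data[] | [.displayName, .rank, .value] | @csv'

Given the data in the question, this prints lines like "First Name",1,"VALUE".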
Kusalananda
  • 333,661
35

Given just this file, you can do something like:

<testfile jq -r '.data | map(.displayName), map(.value) | join(", ")'

The . operator selects a field from an object/hash. Thus, we start with .data, which returns the array with the data in it. We then map over the array twice, first selecting the displayName, then selecting the value, giving us two arrays with just the values of those keys. For each array, we join the elements with ", " forming two lines. The -r argument tells jq to not quote the resulting strings.

If your actual file is longer (i.e., has entries for more than one person), you will likely need something a bit more complicated.
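
For instance, a sketch assuming a hypothetical people.json holding a top-level array of objects shaped like the one in the question (one per person) could print a single header followed by one row per person:

jq -r '(.[0].data | map(.displayName)), (.[].data | map(.value)) | join(", ")' people.json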

slm
  • 369,824
Steven D
  • 46,160
  • It is not working for me. In a related topic, the answer http://stackoverflow.com/questions/32960857/how-to-convert-arbirtrary-simple-json-to-csv-using-jq is both working and very well explained! – herve Jun 29 '16 at 15:22
  • This will list all .displayName fields comma separated and after that it will list all the .value fields in a comma separated list. The aim is to have a line that has <.displayname>, <.value> for every line – karatedog Jan 27 '22 at 11:09
13

I've found jq hard to wrap my head around. Here's some Ruby:

ruby -rjson -rcsv -e '
  data = JSON.parse(File.read "file.json")
  data["data"].collect {|item| [item["displayName"], item["value"]]}
              .transpose
              .each {|row| puts row.to_csv}
'
First Name,Last Name,Position,Company Name,Country
VALUE,VALUE,VALUE,VALUE,VALUE

The Ruby JSON parser barfed on the trailing comma before the closing bracket.

glenn jackman
  • 85,964
3

Since you tagged this python, and assuming the JSON file is named x.json:

import os, json
with open('x.json') as f:
    x = json.load(f)
    print('{}{}{}'.format(', '.join(y['displayName'] for y in x['data']), os.linesep,
                          ', '.join(y['value'] for y in x['data'])))
First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE
iruvar
  • 16,725
2

This:

INPUT | jq -r '[.[][].displayName], [.[][].value] | join(", ")'

...got me...

First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE

How it works in a nutshell:

  1. I traversed through to the third level of data objects using the empty [] index field form and .dot notation.
  2. Once deep enough I specified the data fields I wanted by name like .[][].displayName.
  3. I assured that my desired fields were self-associated by returning them as separate array objects like [.[][].displayName], [.[][].value]
  4. And then piped those objects to the join(", ") function to be joined as separate entities.

In truth, [.[][].displayName] is merely another way of writing .data | map(.displayName), but this form is a little more specific in that it spells out the depth at which the desired data is retrieved.
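
For example, on the question's input these two commands produce the same array (a sketch, assuming data is the only top-level key):

jq '[.[][].displayName]' file.json
jq '.data | map(.displayName)' file.json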

Kusalananda
  • 333,661
mikeserv
  • 58,310
2

Old question, but it seems like all the answers recommend indexing by displayName, which to me doesn't make sense: CSV is not an indexed format, it's a list of rows, so it's illogical that you should have to index by anything. The only exception is https://unix.stackexchange.com/a/673753/488462, which would be my preference in practice, except that it uses csvkit while the question explicitly asks about jq.

So here's a jq-only way in two steps:

  • Construct a header: cat data.json | jq -r '.[0] | keys_unsorted | @csv' > data.csv
  • Transform the values: cat data.json | jq -r '.[] | map(values) | @csv' >> data.csv

Note the -r option: without it, jq would emit JSON-encoded strings rather than raw CSV lines.

You can combine these into one command with tee, or construct a complex jq query that does both things (see the sketch below), if you want, but I think this version is already good enough.
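
For instance, a single jq invocation combining the two steps (a sketch, under the same assumption of an unwrapped data.json) might look like:

jq -r '(.[0] | keys_unsorted), (.[] | map(values)) | @csv' data.json > data.csv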

Note that you have to use keys_unsorted because keys will sort your keys (while values doesn't) resulting in mismatched columns. Also, while usually people want both steps, technically the CSV format does not require a header - it is just an optional convenience (and not always desirable). So in those cases you may omit the header.

Lastly, your example input is needlessly wrapped in an object called data. I assume you unpack this with a trivial command like cat input.json | jq '.data' > data.json, since it isn't really relevant to what's being asked (transforming to CSV).

Haterind
  • 169
0

Assuming that the rank is included in the data to specify the correct ordering of the columns in case they happen to be listed out of order, we may use that rank value to construct an array of headers and values to be converted to CSV, like so:

jq -r '
    reduce .data[] as $col ([]; .[$col.rank - 1] = [$col.displayName, $col.value]) |
    transpose[] | @csv' file

The data after the reduce is equivalent to this:

[
   [ "First Name", "VALUE" ],
   [ "Last Name", "VALUE" ],
   [ "Position", "VALUE" ],
   [ "Company Name", "VALUE" ],
   [ "Country", "VALUE" ]
]

Transposing this and feeding it through the @csv output filter gives us

"First Name","Last Name","Position","Company Name","Country"
"VALUE","VALUE","VALUE","VALUE","VALUE"

Ignoring the rank value, we may simplify the expression by replacing the reduce with a simple map:

jq -r '.data | map([.displayName, .value]) | transpose[] | @csv' file
Kusalananda
  • 333,661