95

I have the JSON file below, with the data stored as columns enumerated by rank in an array:

{
  "data": [
    {
      "displayName": "First Name",
      "rank": 1,
      "value": "VALUE"
    },
    {
      "displayName": "Last Name",
      "rank": 2,
      "value": "VALUE"
    },
    {
      "displayName": "Position",
      "rank": 3,
      "value": "VALUE"
    },
    {
      "displayName": "Company Name",
      "rank": 4,
      "value": "VALUE"
    },
    {
      "displayName": "Country",
      "rank": 5,
      "value": "VALUE"
    }
  ]
}

I would like to have a CSV file in this format, where the headers come from the value of each column's displayName and the data in the column is the value of its singular value key:

First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE

Is this possible by using only jq? I don't have any programming skills.

Kusalananda
  • 333,661
Kerim
  • 951

8 Answers

83

jq has a filter, @csv, for converting an array to a CSV string. This filter takes into account most of the complexities associated with the CSV format, notably commas embedded in fields. (jq 1.5 has a similar filter, @tsv, for generating tab-separated-value files.)

Of course, if the headers and values are all guaranteed to be free of commas and double quotation marks, then there may be no need to use the @csv filter. Otherwise, it would probably be better to use it.

For example, if the 'Company Name' were 'Smith, Smith and Smith', and if the other values were as shown below, invoking jq with the "-r" option would produce valid CSV:

$ jq -r '.data | map(.displayName), map(.value) | @csv' so.json2csv.json
"First Name","Last Name","Position","Company Name","Country"
"John (""Johnnie"")","Doe","Director, Planning and Posterity","Smith, Smith and Smith","Transylvania"
peak
  • 990
55

I prefer to make each record a row in my CSV.

jq -r '.data | map([.displayName, .rank, .value] | join(", ")) | join("\n")'

Given the data in the question, this outputs

First Name, 1, VALUE
Last Name, 2, VALUE
Position, 3, VALUE
Company Name, 4, VALUE
Country, 5, VALUE
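
If the fields could contain commas or double quotation marks, a variant of the same per-record layout (a sketch) can use the @csv filter instead of join, which quotes the string fields:

jq -r '.data[] | [.displayName, .rank, .value] | @csv'

Given the data in the question, this prints lines like "First Name",1,"VALUE".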
Kusalananda
  • 333,661
35

Given just this file, you can do something like:

<testfile jq -r '.data | map(.displayName), map(.value) | join(", ")'

The . operator selects a field from an object/hash. Thus, we start with .data, which returns the array with the data in it. We then map over the array twice, first selecting the displayName, then selecting the value, giving us two arrays with just the values of those keys. For each array, we join the elements with ", " forming two lines. The -r argument tells jq to not quote the resulting strings.

If your actual file is longer (i.e., has entries for more than one person), you will likely need something a bit more complicated.
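
For instance, a sketch assuming a hypothetical people.json holding a top-level array of objects shaped like the one in the question (one per person) could print a single header followed by one row per person:

jq -r '(.[0].data | map(.displayName)), (.[].data | map(.value)) | join(", ")' people.json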

slm
  • 369,824
Steven D
  • 46,160
  • It is not working for me. In a related topic, the answer http://stackoverflow.com/questions/32960857/how-to-convert-arbirtrary-simple-json-to-csv-using-jq is both working and very well explained! – herve Jun 29 '16 at 15:22
  • This will list all .displayName fields comma separated and after that it will list all the .value fields in a comma separated list. The aim is to have a line that has <.displayname>, <.value> for every line – karatedog Jan 27 '22 at 11:09
13

I've found jq hard to wrap my head around. Here's some Ruby:

ruby -rjson -rcsv -e '
  data = JSON.parse(File.read "file.json")
  data["data"].collect {|item| [item["displayName"], item["value"]]}
              .transpose
              .each {|row| puts row.to_csv}
'
First Name,Last Name,Position,Company Name,Country
VALUE,VALUE,VALUE,VALUE,VALUE

The Ruby JSON parser barfed on the trailing comma before the closing bracket.

glenn jackman
  • 85,964
3

Since you tagged this python, and assuming the JSON file is named x.json:

import os, json
with open('x.json') as f:
    x = json.load(f)
    print('{}{}{}'.format(', '.join(y['displayName'] for y in x['data']), os.linesep,
                          ', '.join(y['value'] for y in x['data'])))
First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE
iruvar
  • 16,725
2

This:

INPUT | jq -r '[.[][].displayName], [.[][].value] | join(", ")'

...got me...

First Name, Last Name, Position, Company Name, Country
VALUE, VALUE, VALUE, VALUE, VALUE

How it works in a nutshell:

  1. I traversed through to the third level of data objects using the empty [] index field form and .dot notation.
  2. Once deep enough I specified the data fields I wanted by name like .[][].displayName.
  3. I assured that my desired fields were self-associated by returning them as separate array objects like [.[][].displayName], [.[][].value]
  4. And then piped those objects to the join(", ") function to be joined as separate entities.

In truth, [.[][].displayName] is merely another way of writing .data | map(.displayName), but this form is a little more specific in that it spells out the depth at which the desired data is retrieved.
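
For example, on the question's input these two commands produce the same array (a sketch, assuming data is the only top-level key):

jq '[.[][].displayName]' file.json
jq '.data | map(.displayName)' file.json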

Kusalananda
  • 333,661
mikeserv
  • 58,310
2

Old question, but it seems like all the answers recommend indexing by displayName, which to me doesn't make sense: CSV is not an indexed format, it's a list of rows, so it's illogical that you should have to index by anything. The only exception is https://unix.stackexchange.com/a/673753/488462, which would be my preference in practice, except that it uses csvkit while the question explicitly asks about jq.

So here's a jq-only way in two steps:

  • Construct a header: cat data.json | jq -r '.[0] | keys_unsorted | @csv' > data.csv
  • Transform the values: cat data.json | jq -r '.[] | map(values) | @csv' >> data.csv

Note the -r option: without it, jq would emit JSON-encoded strings rather than raw CSV lines.

You can combine these into one command with tee, or construct a complex jq query that does both things (see the sketch below), if you want, but I think this version is already good enough.
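
For instance, a single jq invocation combining the two steps (a sketch, under the same assumption of an unwrapped data.json) might look like:

jq -r '(.[0] | keys_unsorted), (.[] | map(values)) | @csv' data.json > data.csv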

Note that you have to use keys_unsorted because keys will sort your keys (while values doesn't) resulting in mismatched columns. Also, while usually people want both steps, technically the CSV format does not require a header - it is just an optional convenience (and not always desirable). So in those cases you may omit the header.

Lastly, your example input is needlessly wrapped in an object called data. I assume you unpack this with a trivial command like cat input.json | jq '.data' > data.json, since it isn't really relevant to what's being asked (transforming to CSV).

Haterind
  • 169
0

Assuming that the rank is included in the data to specify the correct ordering of the columns in case they happen to be listed out of order, we may use that rank value to construct an array of headers and values to be converted to CSV, like so:

jq -r '
    reduce .data[] as $col ([]; .[$col.rank - 1] = [$col.displayName, $col.value]) |
    transpose[] | @csv' file

The data after the reduce is equivalent to this:

[
   [ "First Name", "VALUE" ],
   [ "Last Name", "VALUE" ],
   [ "Position", "VALUE" ],
   [ "Company Name", "VALUE" ],
   [ "Country", "VALUE" ]
]

Transposing this and feeding it through the @csv output filter gives us

"First Name","Last Name","Position","Company Name","Country"
"VALUE","VALUE","VALUE","VALUE","VALUE"

Ignoring the rank value, we may simplify the expression by replacing the reduce with a simple map:

jq -r '.data | map([.displayName, .value]) | transpose[] | @csv' file
Kusalananda
  • 333,661