JSON cannot directly represent arbitrary file paths (which are sequences of non-zero bytes) in its strings, which are sequences of Unicode characters. Also note that the output of `find` is not reliably post-processable unless you use `-print0`.
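A quick illustration of why `-print0` matters, using a hypothetical `/tmp/demo` directory (the path and file name below are made up for the demonstration):

```shell
# Create a file whose name contains a newline (hypothetical example).
mkdir -p /tmp/demo
: > $'/tmp/demo/a\nb'

# Without -print0, the newline inside the name is indistinguishable
# from the newline separating records, so one file shows up as two
# bogus paths:
find /tmp/demo -type f

# With -print0, records are NUL-delimited, and NUL is the one byte a
# path can never contain:
find /tmp/demo -type f -print0 | od -c
```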
For instance a file path might be `$'/home/St\xc3\xa9phane\nChazelas/ISO-8859-1/R\xe9sum\xe9'` (here using `ksh93`-style `$'...'` notation to represent byte values), with the `é` UTF-8 encoded in `Stéphane`, and ISO-8859-1 encoded in `Résumé`.
JSON cannot represent that file path unless you use some encoding. That could be for instance URI encoding:
{ "path": "/home/St%C3%A9phane\nChazelas/ISO-8859-1/R%E9sum%E9" }
Another approach could be to interpret the path as if it were ISO-8859-1 encoded (or any single-byte charset where any byte value can make a valid character¹):
{ "path": "/home/Stéphane\nChazelas/ISO-8859-1/Résumé" }
`jq` has some support for doing URI encoding, but AFAIK it cannot be fed non-UTF-8 input, nor does it have any support for encoding conversion.
On a GNU system, for the second approach where file paths are considered to be ISO-8859-1 encoded you may however be able to do something like:
find ~ -type d -print0 |         # NUL-delimited: the only safe separator
  iconv -f iso-8859-1 -t utf-8 | # reinterpret each byte as an ISO-8859-1 character
  tr '\0\n' '\n\0' |             # swap NULs and newlines: one record per line
  jq -Rc '{"path": gsub("\u0000"; "\n"), "type": "directory"}'
Which on our example above gives:
{"path":"/home/Stéphane\nChazelas/ISO-8859-1/Résumé","type":"directory"}
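To recover the original byte string from that JSON, the steps can be reversed: `jq -j` emits the decoded string as raw UTF-8 without a trailing newline, and `iconv` maps it back to the original bytes (a sketch; `od` is only there to make the bytes visible):

```shell
# Round trip: JSON string -> UTF-8 text -> original ISO-8859-1 bytes.
jq -j '.path' <<'EOF' | iconv -f utf-8 -t iso-8859-1 | od -An -c
{"path":"/home/Stéphane\nChazelas/ISO-8859-1/Résumé","type":"directory"}
EOF
```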
¹ though ISO-8859-1 specifically is an obvious choice as its code points match those of Unicode. So if your JSON string contains a U+00E9 character for instance, you know it corresponds to the 0xE9 byte. You could add the `-a` option to `jq` for non-ASCII characters to be represented as `\uXXXX` instead.
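For instance, feeding `jq` a UTF-8-encoded `é`:

```shell
# --ascii-output (-a) escapes every non-ASCII character, so the
# resulting JSON is pure ASCII regardless of the terminal's charset:
printf '"\303\251"' | jq -a .
# → "\u00e9"
```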