4

According to the JSON specification, forward slashes don't have to be escaped with a backslash but they can be.
I have a JSON file in which all forward slashes in string values are escaped for compatibility reasons (but not those inside the keys):

{
  "proto://some/path": "\/\/some\/path"
}

However, jq automatically removes these backslashes:

$ echo '{"proto://some/path":"\/\/some\/path"}' | jq -c .
{"proto://some/path":"//some/path"}

I need the output to be {"proto://some/path":"\/\/some\/path"}

How can I tell jq to NOT change any string values and keep these backslashes?
Alternatively, is there a way to re-add these backslashes only to the values after it went through jq?

YourMJK
  • 151
  • Are object keys and values the only places where strings are used? No array of strings for instance? – Stéphane Chazelas Feb 26 '23 at 20:08
  • @StéphaneChazelas The structure of the JSON is arbitrarily nested and it has arrays of strings as well which also need to keep the escaped slash, yes. Good point. – YourMJK Feb 26 '23 at 21:09
  • 4
    Would whatever application that uses this understand the JSON string a\\/b (encoded) as a\/b (decoded) and be able to use it correctly? If not, then there is no point in talking about JSON or using jq and instead treat the document as text and use awk to do whatever you need to do since the thing that is reading the document clearly isn't a JSON parser. The thing with the string a\/b is that to a JSON application, that backslash does not exist. – Kusalananda Feb 26 '23 at 21:16
  • 1
    I've updated my answer to escape /s in strings except those used as object keys. – Stéphane Chazelas Feb 26 '23 at 21:27
  • @Kusalananda It seems like it does indeed also work without the backslash in the programs I tested. However, since this is a somewhat standardized file format, I didn't want to change the encoding in case it breaks something somewhere. I only used jq to sort the keys with --sort-keys since the tool that generated these files produced a non-deterministic order which messed up my git diffs. – YourMJK Feb 26 '23 at 21:47
  • @StéphaneChazelas provided (as usual!) a good and flexible answer, but I am curious as to why you need to have \/ instead of / in your data in the first place? / is not a special character in regex. One possible reason would be, for example, if you use that to create awk code with /pattern/ and don't want any '/' inside the pattern to be interpreted as the end of the /.../. But you could replace it with $0 ~ "pattern" (or $0 ~ pattern_in_variable)? – Olivier Dulac Feb 27 '23 at 08:15
  • @OlivierDulac see comments to Kusalananda's answer. – Stéphane Chazelas Feb 27 '23 at 09:15
  • @StéphaneChazelas thanks, your SO link (https://stackoverflow.com/a/1580682) and its answers are very informative! – Olivier Dulac Feb 27 '23 at 14:15

2 Answers

4

I'd be surprised if you could. jq decodes its input, performs its operations and encodes the resulting object as JSON. By the time it encodes those strings for output, the information that there were \s in front of the /s has long been lost. The same would happen if those /s were initially written \u002f. You'll find that jq also reformats 1.0 as 1, 1e2 as 100, INF as 1.7976931348623157e+308, etc. for the same reason.

However, JSON is a relatively easy file format to process reliably by hand using for instance perl regexps.

To add back the \s before each / in all strings except those that are object keys, you could do:

jq... |
  perl -0777 -pe '
    s{"(?:\\.|.)*?"(\s*:)?}{
      $1 ? $& : $& =~ s{/}{\\/}gr
    }ge'

Which should work correctly even if you have strings embedding "s and \s (like {"key": "//\"//\\"}).

As an alternative to jq, you could use the JSON::PP perl module, which can be told to escape slashes (though it would do so in all strings, object keys included):

$ json_pp -json_opt escape_slash < your-file
{"proto:\/\/some\/path":"\/\/some\/path"}

If you're already familiar with perl, the learning curve would be less steep than having to learn the jq syntax.

In any case, while the JSON format allows /s to be escaped as \/ (or \u002f, as for any character), they don't need to be. From what I read online, that's allowed so that one can embed a JSON string containing </ in an HTML <script> tag by writing it "<\/whatever". That's why some JSON encoders encode / as \/, as that makes their output more portable. But if that JSON is not meant to be embedded as is in HTML, it probably doesn't matter. And if it were, you'd likely want to have that encoding everywhere, including in object keys.
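
If escaping everywhere, keys included, is what's wanted, a string-aware regexp is probably not needed for jq's output. In JSON text a / can only occur inside a string, and jq never prints \/ itself, so replacing every / should be enough. A minimal sketch under those assumptions (the function name escape_all_slashes is invented here, and the input is assumed to come straight from jq):

```shell
# Hypothetical helper (name invented for this sketch): escape every "/" as "\/",
# object keys included.
# Assumption: the input is jq output, which contains no "\/" sequences of its
# own, so no slash gets escaped twice.
escape_all_slashes() {
  sed 's|/|\\/|g'
}

printf '%s\n' '{"proto://some/path":"//some/path"}' | escape_all_slashes
```

This should print {"proto:\/\/some\/path":"\/\/some\/path"}. Feeding it JSON that already contains \/ would turn those into \\/ and change the decoded strings, so it is only meant to be run on freshly re-encoded output.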

2

I'm assuming that some process that isn't aware of the need for encoding strings in JSON has inserted the literal string they think they need to store, without encoding it. Since the backslashes are escaping characters that don't need escaping, and since no backslashes are escaping the literal backslashes themselves, they will appear to "disappear" when jq is used to extract and decode the string or when it processes the document for other reasons.

In short, the forward slashes don't have to be escaped (escaping them is effectively a no-op), but backslashes need to be if you want to keep them as literal backslashes.

The following will change each / into \\/ (which is how you write \/ in a JSON string) in all string values in your document, recursively. Remember, when the jq expression is processing the data, the jq parser has already removed the backslashes.

jq 'walk(if type == "string" then gsub("/"; "\\/") else . end)' file

For the given example document, this would generate

{
  "proto://some/path": "\\/\\/some\\/path"
}

Extracting and decoding the encoded string value from the modified document would give you \/\/some\/path:

$ jq 'walk(if type == "string" then gsub("/"; "\\/") else . end)' file | jq -r '."proto://some/path"'
\/\/some\/path

You would arrive at the same JSON document if you created it from scratch like so:

$ jq -n --arg 'proto://some/path' '\/\/some\/path' '$ARGS.named'
{
  "proto://some/path": "\\/\\/some\\/path"
}
Kusalananda
  • 333,661
  • 1
    My understanding is that the OP wants the JSON output to have /s in those strings escaped as \/ (allowed in JSON, see JSON: why are forward slashes escaped?) – Stéphane Chazelas Feb 26 '23 at 20:23
  • @StéphaneChazelas I'm thinking a process that is not JSON-aware has inserted the literal string into the JSON document without encoding it. This causes the backslashes to "disappear" when the string is extracted with jq. – Kusalananda Feb 26 '23 at 20:26
  • 1
    \/, \u002f and / are 3 valid encodings of a / in JSON strings. printf '%s\n' '["/", "\/", "\u002f"]' | jq -c outputs ["/","/","/"], the OP would like the original encoding to be preserved. – Stéphane Chazelas Feb 26 '23 at 20:31
  • @StéphaneChazelas My point is that the string probably wasn't JSON-encoded when inserted into the document (possibly through injecting a shell variable directly into a text string). I'm assuming they want the \/ to be retained as \/ when extracting it from the document. – Kusalananda Feb 26 '23 at 20:34
  • That's also speculation on my part, but I don't think so. The "compatibility reasons" suggests those / were intentionally escaped as \/ as is apparently common. See the SO link I gave above. – Stéphane Chazelas Feb 26 '23 at 20:39
  • @StéphaneChazelas The user absolutely escapes the slash for a reason, but it's likely that this is for whatever application they need to finally use that string in. To store it in a JSON string, it has to be additionally encoded, which they (or someone) forgot or did not think about. – Kusalananda Feb 26 '23 at 20:43
  • See for instance: php -r 'echo json_encode("/");'. As mentioned there, a reason to want /s to be escaped is when the json is embedded in HTML <script> tags, though here it rather looks like it's about feeding that json to some software that expects escaped slashes. – Stéphane Chazelas Feb 26 '23 at 20:44
  • @StéphaneChazelas Related on-point comment elsewhere: How to remove backslash on json_encode() function? – Kusalananda Feb 26 '23 at 20:51
  • To add some context: These are JSON files generated as part of Apple's Xcode DocC documentation archive format, which is viewable in a browser. I don't know exactly how it works but I think it's something along the lines of what @StéphaneChazelas mentioned: the JSON is probably read by JavaScript and embedded in the HTML. – YourMJK Feb 26 '23 at 21:05
  • Thanks for your answer. Unfortunately, it doesn't solve my problem: I do indeed need the output JSON to have "\/", not "\\/". I also tried gsub("/"; "\/") but that produces just "/". – YourMJK Feb 26 '23 at 21:15
  • 1
    @YourMJK Ok, so whatever is reading this is not a JSON parser, which means you might just as well drop the use of jq and use awk or whatever other text processor to work with this data. – Kusalananda Feb 26 '23 at 21:19
  • @Kusalananda Hence the second part of my question, whether this would alternatively be possible using a different tool if jq can't do it. I couldn't figure out how to only target the values and not the keys for the replacing using text processors. – YourMJK Feb 26 '23 at 21:38
  • FYI, PHP's json_encode() escapes forward slash by default (you have to use the JSON_UNESCAPED_SLASHES flag to disable it). So it's probably not true that this JSON wasn't properly JSON-encoded. – Barmar Feb 27 '23 at 16:06