0

I have a notes.json file inside each directory. The following command counts the number of lines in each notes.json file and returns the output sorted by each file's line count.

find . -name notes.json | xargs wc -l | sort -nr

It returns

789 ./D/notes.json
789 ./F/notes.json
574 ./A/notes.json
519 ./G/notes.json

Now I would like to apply a search pattern to the contents of each notes.json and return the number of matching lines for each notes.json file, sorted.

I tried find . -name notes.json | xargs cat | jq '.text1[]' | wc -l. However, I receive only one value, i.e. the total number of lines matched for text1[] across all notes.json files. Of course, this is because cat concatenates all the files, so the pattern match is counted over everything together. Is there a way to output the line counts (returned from matching the pattern) for each notes.json file, sorted?

SKPS

3 Answers

0
  • Not sorting the output:

Trying this for my own file tree with thousands of json files:

$ find . -type f -name "*[0-9].json" \
    -exec bash -c 'printf "%4d %s\n" $(jq ".bbx_basic[]" "$1" | wc -l) "$1"' bashscript {} ';'
[Example output in my tests]
 130 ./Images/Training_set/00000845.json
  13 ./Images/Training_set/00005869.json
  13 ./Images/Training_set/00000991.json
  26 ./Images/Training_set/00005631.json
1013 ./Images/Training_set/00001737.json
...
 410 ./Annot_txt/Coco_en_2017/instances_val2017.json
   0 ./Annot_txt/Coco_en_2017/instances_val2017.json

This restricts the search to regular files whose names match my pattern *[0-9].json. In your case you would want to execute the following command:

 $ find . -type f -name "notes.json" \
     -exec bash -c 'printf "%6d %s\n" $(jq ".text1[]" "$1" | wc -l) "$1"' bashscript {} ';'
  • Sorting the output:
    • The modified command persists the unsorted output in an intermediate file (named outfile, created in the current directory) in case you want to do more with it than just sort it and send it to standard output. You could equally write it under /tmp/ if you prefer.
    • Reverse sorting is on the first (numeric) field, and thus should not depend on the specified locale.
    • Sorting starts when the find job exits, no matter what its exit status is.

Code:

$ find . -type f -name "notes.json" \
    -exec bash -c 'printf "%6d %s\n" $(jq ".text1[]" "$1" | wc -l) "$1" 2>/dev/null >> outfile' bashscript {} ';'; sort -k1,1nr outfile

The above can be made more robust and nimble at the same time, with:

$ find . -type f -name "notes.json" -exec sh -c '
    for file do
      printf "%6d %s\n" $(jq ".text1[]" "$file" 2>/dev/null | wc -l) "$file"
    done' sh {} + >> outfile; sort -k1,1nr outfile

The result is the same, but the improvements, per @StéphaneChazelas' suggestions, consist in:

  • making it more portable by using sh instead of bash,
  • minimizing the number of shells spawned by find ... -exec sh -c '...' by adopting a batch treatment (+) instead of a file-by-file treatment (\;) of find's results (see the small illustration after this list),
  • minimizing the number of file descriptor (output) openings by redirecting the output of find as a whole instead of doing so file-by-file as before.
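
A quick way to see the batching difference yourself (this pair of commands is not part of the original answer, just an illustration you can run on any tree): with + each spawned sh receives a whole batch of file names, so $# reports the batch size, whereas with ';' every sh receives exactly one name.

# batch treatment: a few shells, each handling many files
$ find . -type f -name "notes.json" -exec sh -c 'echo "one sh for $# file(s)"' sh {} +
# file-by-file treatment: one shell per file, $# is always 1
$ find . -type f -name "notes.json" -exec sh -c 'echo "one sh for $# file(s)"' sh {} ';'
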
Cbhihe
  • I also need the folder containing the notes.json, and I need the sorting done by number of lines. Overall, the output needs to be in the format shown in the question. – SKPS May 16 '22 at 14:57
  • Why the & wait $!? – Stéphane Chazelas May 18 '22 at 07:09
  • @StéphaneChazelas: It is a remnant of previous logic. Now that appending to outfile is done batch-wise, it became superfluous. There is no possible race condition with -exec any longer and sorting takes place on the final outfile when writing to FD ends, not (potentially) piece-wise on outfile to which find results are appended file-by-file. There was another reason for not using && instead. In A && B, B only ever executes if the exit status of A is 0. Because find may sometimes produce "access denied" permission errors, sort wouldn't execute. & wait $!; solved that. – Cbhihe May 18 '22 at 08:34
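
For context, here is a hedged reconstruction of the earlier, file-by-file shape this comment refers to (hypothetical, not quoted from any revision of the answer): find is sent to the background, wait $! blocks until it terminates, and sort then runs regardless of find's exit status, which && would not guarantee.

# hypothetical earlier version: '& wait $!' instead of '&&' before the sort
$ find . -type f -name "notes.json" \
    -exec bash -c 'printf "%6d %s\n" $(jq ".text1[]" "$1" | wc -l) "$1" 2>/dev/null >> outfile' bashscript {} ';' & wait $!; sort -k1,1nr outfile
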
0

Is there any logical reason why you would need that? One can write many records on a single line, because line breaks are not significant in JSON. I'd rather say that you've chosen the wrong weapon for dealing with JSON. Therefore I'd suggest using the PHP CLI, e.g. to count the items in a JSON array. Current versions come with built-in JSON support by default. The PHP functions to use would be file_get_contents(), json_decode() and sizeof().
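
As a minimal sketch of that approach (not part of the original answer): the PHP CLI's -r flag runs a snippet straight from the shell, so counting the items of the question's "text1" array in a single file could look like the line below. The path ./A/notes.json is only an example taken from the question's output.

# decode one file and count the elements of its "text1" array
$ php -r 'echo sizeof(json_decode(file_get_contents($argv[1]), true)["text1"]), "\n";' ./A/notes.json

To get per-file, sorted output you would still wrap such a call in one of the find loops from the other answer.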

-2

Unfortunately, I could not come up with a one-liner for this requirement. The following script solved the problem:

#!/bin/bash
declare -a arr
arr=()
# collect "<match count> <file name>" for every notes.json
# (word-splitting the find output assumes paths without whitespace)
for i in $(find . -name notes.json)
do
  arr+=("$(jq '.text1[]' "$i" | wc -l) $i")
done
printf '%s\n' "${arr[@]}" | sort -nr > out.txt
SKPS