I tried this on my own file tree, which holds thousands of JSON files:
$ find . -type f -name "*[0-9].json" \
-exec bash -c 'printf "%4d %s\n" $(jq ".bbx_basic[]" "$1" | wc -l) "$1"' bashscript {} ';'
[Example output in my tests]
130 ./Images/Training_set/00000845.json
13 ./Images/Training_set/00005869.json
13 ./Images/Training_set/00000991.json
26 ./Images/Training_set/00005631.json
1013 ./Images/Training_set/00001737.json
...
410 ./Annot_txt/Coco_en_2017/instances_val2017.json
0 ./Annot_txt/Coco_en_2017/instances_val2017.json
This restricts the search to regular files whose names match my pattern *[0-9].json. In your case, you would want to execute the following command:
$ find . -type f -name "notes.json" \
-exec bash -c 'printf "%6d %s\n" $(jq ".text1[]" "$1" | wc -l) "$1"' bashscript {} ';'
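One thing worth checking before relying on these numbers: the pipeline counts the lines that jq prints, not the array elements themselves. The sketch below uses a made-up sample file (its content is an assumption, not taken from the question) to compare the line count with jq's default pretty-printing, with the compact -c option, and with the length filter. It skips itself if jq is not installed.

```shell
# Hypothetical sample: an array "text1" holding two small objects.
if command -v jq >/dev/null 2>&1; then
    sample=$(mktemp)
    printf '%s\n' '{"text1":[{"x":1},{"x":2}]}' > "$sample"

    # Default output pretty-prints every object over several lines.
    pretty_lines=$(jq '.text1[]' "$sample" | wc -l | tr -d ' ')
    # -c prints each element on a single line.
    compact_lines=$(jq -c '.text1[]' "$sample" | wc -l | tr -d ' ')
    # length reports the number of array elements directly.
    element_count=$(jq '.text1 | length' "$sample")

    echo "pretty: $pretty_lines  compact: $compact_lines  length: $element_count"
    rm -f "$sample"
else
    echo "jq not found, skipping the comparison" >&2
fi
```

If the array only holds strings or numbers, the three figures agree. If it holds objects or nested arrays, adding -c to the jq call (or using length) is probably closer to what you want.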
- Sorting the output:
- The modified command persists the unsorted output in an intermediate file (named outfile) in case you want to do more with it than just sort it and send it to standard output. As written, outfile is created in the directory you run find from; point the redirection at /tmp/outfile if you would rather keep it in your /tmp/ directory.
- Reverse sorting is on the first (numeric) field, and thus should not depend on the specified locale.
- Sorting starts when the find job exits, no matter what its exit status is, because the two commands are separated by ; rather than &&.
Code:
$ find . -type f -name "notes.json" \
-exec bash -c 'printf "%6d %s\n" $(jq ".text1[]" "$1" | wc -l) "$1" 2>/dev/null >> outfile' bashscript {} ';'; sort -k1,1nr outfile
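To see what the sort key does on its own, here is a small sketch that feeds a few made-up lines (the file names are hypothetical) in the same "%6d %s" layout to sort -k1,1nr. The n modifier skips the leading blanks and compares the first field numerically; r reverses the order.

```shell
# Three sample lines in the same layout as the find output above.
sorted=$(printf '%6d %s\n' 13 ./b.json 130 ./a.json 26 ./c.json | sort -k1,1nr)

# Largest count comes first.
printf '%s\n' "$sorted"
```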
The above can be made more robust and nimble at the same time, with:
$ find . -type f -name "notes.json" -exec sh -c '
    for file do
      printf "%6d %s\n" $(jq ".text1[]" "$file" 2>/dev/null | wc -l) "$file"
    done' sh {} + >> outfile; sort -k1,1nr outfile
The result is the same, but it includes the following improvements, suggested by @StéphaneChazelas:
- making it more portable by using sh instead of bash,
- minimizing the number of shells spawned by find ... -exec sh -c '...' by processing find's results in batches (+) instead of file by file (\;),
- minimizing the number of times the output file is opened by redirecting the output of find as a whole instead of redirecting file by file as before.
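The effect of the batch form can be checked with a small experiment. This is only a sketch on a throwaway directory with three empty files (the names are made up): each shell started by find prints one line, so counting the lines shows how many shells were spawned.

```shell
# Scratch directory with three hypothetical files.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.json" "$tmpdir/b.json" "$tmpdir/c.json"

# ';' starts one shell per file found.
per_file=$(find "$tmpdir" -type f -name '*.json' \
    -exec sh -c 'echo started' sh {} ';' | grep -c started)

# '+' passes as many files as possible to each shell.
batched=$(find "$tmpdir" -type f -name '*.json' \
    -exec sh -c 'echo started' sh {} + | grep -c started)

echo "shells with ';': $per_file, shells with '+': $batched"
rm -rf "$tmpdir"
```

With thousands of files the '+' form may still start more than one shell, since find splits the list to stay within the argument-length limit, but the number stays far below one per file.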
789 ./D/notes.json
etc.) to be sorted by line numbers? Is that what you want? – tansy May 17 '22 at 14:01