I want to see how many lines exist in each file that has been found using the find command.

I know I can use wc -l to find the number of lines in a single file. But this does not work when piped from the output of find:

find -type f -name package.json | wc -l

This returns the count of the found files. I want to return the count of lines of each found file.

2 Answers

10

wc takes the list of files whose bytes/chars/words/lines to count as arguments.

When called with no argument, it reports those bytes/chars/words/lines in its stdin. So if you pipe find to wc -l, you get the number of newline characters in the output of find: the number of found files plus the number of newline characters in their paths.
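A minimal sketch of that difference, run against a hypothetical scratch tree (the directory names, file contents, and the piped and per_file variable names are made up for illustration). One directory name deliberately contains a newline:

```shell
# Hypothetical scratch tree; all names here are illustrative only.
tmp=$(mktemp -d)
nl='
'
mkdir -p "$tmp/a" "$tmp/b${nl}c"
printf '{\n}\n' > "$tmp/a/package.json"                 # 2 lines
printf '{\n"x": 1\n}\n' > "$tmp/b${nl}c/package.json"   # 3 lines

# wc reads find's output on stdin, so it counts newlines in the path list:
# 2 files + 1 newline embedded in a directory name = 3.
piped=$(find "$tmp" -type f -name package.json | wc -l)

# Here the paths are passed as arguments, so wc opens each file.
per_file=$(find "$tmp" -type f -name package.json -exec wc -l {} +)

printf 'lines in the output of find: %s\n' "$((piped))"
printf '%s\n' "$per_file"
rm -rf "$tmp"
```

The first figure describes the list of paths, not the contents of any file.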

The GNU implementation of wc can also take the list of files NUL-delimited from a file with the --files0-from option, where it treats - as meaning stdin (not the file called -), so you can do:

find . -name package.json -type f -print0 |
  wc -l --files0-from=-

With any standard find or wc implementation, you could get find to pass the list of file paths as arguments to wc with:

find . -name package.json -type f -exec wc -l {} +

But if there's a large number of matching files, that could end up running wc several times resulting in several occurrences of a total line.

wc prints the total line when given at least 2 files to process, so to skip the total line, you could do:

find . -name package.json -type f -exec wc -l {} ';'

Though that would be very inefficient as forking a process and executing a command for each file is quite expensive.
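The two forms can be compared on a hypothetical two-file tree (the paths and variable names below are illustrative). This sketch sets LC_ALL=C so that the summary line is labelled total whatever the user's locale:

```shell
export LC_ALL=C  # keep the summary label as "total" regardless of locale
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf '{\n}\n' > "$tmp/a/package.json"          # 2 lines
printf '{\n"x": 1\n}\n' > "$tmp/b/package.json"  # 3 lines

# One wc invocation for both files: two counts plus a "total" line.
batched=$(find "$tmp" -name package.json -type f -exec wc -l {} +)

# One wc invocation per file: two counts, no "total" line.
one_by_one=$(find "$tmp" -name package.json -type f -exec wc -l {} ';')

batched_lines=$(printf '%s\n' "$batched" | wc -l)
one_by_one_lines=$(printf '%s\n' "$one_by_one" | wc -l)
last_line=$(printf '%s\n' "$batched" | tail -n 1)

printf '%s\n' "$batched" "--" "$one_by_one"
rm -rf "$tmp"
```

With only two files, find is expected to need a single batch, so exactly one total line shows up in the first output.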

If it's the total you're actually interested in, then you'd do:

find . -name package.json -type f -exec cat {} + | wc -l

Where we feed the concatenation of the contents of those files to wc.
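As a small check of that approach, here is a sketch on a hypothetical tree holding a 2-line and a 3-line file (names and contents are illustrative):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf '{\n}\n' > "$tmp/a/package.json"          # 2 lines
printf '{\n"x": 1\n}\n' > "$tmp/b/package.json"  # 3 lines

# cat may be run several times, but all of its output goes down one pipe,
# so the single wc at the end sees every line exactly once.
total=$(find "$tmp" -name package.json -type f -exec cat {} + | wc -l)

printf 'total lines: %s\n' "$((total))"
rm -rf "$tmp"
```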

With zsh and any wc, you could do:

wc -l -- **/package.json(D.)

(D for Dotglob to get hidden ones as well like find does and . to only include regular files as the equivalent of -type f).

That has the advantage of giving you a sorted list and avoiding the ./ prefix.

This time, if there are no or too many matching files, you'll get an error.

With GNU wc, you can avoid those by passing the glob expansion NUL-delimited to wc -l --files0-from=- with:

print -rNC1 -- **/package.json(ND.) | wc -l --files0-from=-

Also beware that in the JSON format, newline characters (which wc -l counts) are not significant, so I'm not sure that's a useful metric you're getting.

You could instead return, for instance, the number of entries in some object or array in those files with:

find . -name package.json -type f -exec \
  jq -r '[.devDependencies|length,input_filename]|@csv' {} +

(assuming the file paths are UTF-8 encoded text and here giving you the result in CSV format).

4

You can use xargs to pipe standard input into the argument vector where you need it:

find -type f -name package.json | xargs wc -l

Or simply let shell command substitution fill it in:

wc -l $(find -type f -name package.json)
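Both shorthands split the output of find into words before wc sees it. A sketch of the effect, assuming a hypothetical directory called my dir (all names and variables here are illustrative), with find -exec shown for comparison:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/my dir"
printf '{\n}\n' > "$tmp/my dir/package.json"   # 2 lines

# xargs splits its input on blanks, so wc is handed two paths
# that do not exist and reports a failure.
xargs_status=0
find "$tmp" -type f -name package.json | xargs wc -l 2>/dev/null ||
  xargs_status=$?

# find passes each path to wc as a single argument.
exec_status=0
exec_out=$(find "$tmp" -type f -name package.json -exec wc -l {} +) ||
  exec_status=$?

printf 'xargs: exit status %s\n' "$xargs_status"
printf '%s\n' "$exec_out"
rm -rf "$tmp"
```

An unquoted $(find ...) splits in a similar way, on the characters of $IFS, and then also applies filename generation to the resulting words.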
  • This doesn't appear to add anything that isn't already included in Stéphane's answer. It also has the drawback that the commands break if there are special characters in the filename (space or newline), although this does not apply to the specific file used here. – doneal24 Jan 06 '23 at 20:18
  • See also Why is looping over find's output bad practice? for more details as to why those two approaches are incorrect. – Stéphane Chazelas Jan 06 '23 at 21:13
  • @doneal24, that's not limited to space or newline. For xargs, there's also other whitespace characters, quotes and backslashes and with some xargs implementations non-text file names. For $(...) there are all the characters of $IFS (by default also includes TAB) and the wildcard ones. It may apply here as the directories those package.json files are found in may very well contain those characters. – Stéphane Chazelas Jan 06 '23 at 21:16
  • @StéphaneChazelas is the concern whitespace, or is there an additional problem? I have been doing this but with -print0 on find, and -0 on xargs. I have never found a problem yet. – ctrl-alt-delor Jan 07 '23 at 00:02
  • @StéphaneChazelas I wanted to give an indication of the problem, not an exhaustive list. Probably should have put ‘for instance’ in the comment. – doneal24 Jan 07 '23 at 04:13
  • @ctrl-alt-delor see find . -print0 | xargs -0 cmd vs find . -exec cmd {} +. Also, Roman is not using -0 here, so the problem is with a bunch of characters (and non-characters), not just space. – Stéphane Chazelas Jan 07 '23 at 10:55
  • i was aware of the ailments -print0 set out to heal and considered them unimportant for a quick typed command rather than an unattended script. i know that blanks god beware /home/hacker/\ /etc/shadow break functionality but since they are rare in $pkgdir/package.json trees i figured the obvious shorthands were missing first. i had not initially realized stéphane had consciously avoided them nor the find -exec + addition. unminding repetitiveness i had realized stéphane had already answered the question although silent about the quick hacks. **/package.json direction my favorite. – Roman Czyborra Jan 07 '23 at 11:24
  • I would clearly point out the limitations here and when you might want to use those. Rather than obvious shorthands, I would rather call them obsolete bad practice examples. I find it a shame that the GNU findutils doc still advertise those. – Stéphane Chazelas Jan 07 '23 at 14:30