I'm trying to sort a directory of files by LoC.

But sort appears to do nothing if the lines are piped in:

paths=`find ./src/ | egrep "\.(cpp|h)$"`
for path in $paths; do
wc -l $path | sort -n;
done

Results in something like this (pre-sorted by find, but the wc numbers are ignored):

50 /a/a.cpp
10 /a/a.h
200 /b/b.cpp
13 /b/b.h
...

If I use sort on a file instead of a pipe:

for path in $paths; do
wc -l $path >> test.txt;
done

sort -n test.txt

it does work:

10 /a/a.h
13 /b/b.h
50 /a/a.cpp
200 /b/b.cpp
...

Why does the pipe version not work?

3 Answers

You’re piping each individual wc’s output to sort, separately. If you move the pipe to handle the complete output of the loop, it should work:

paths=`find ./src/ | egrep "\.(cpp|h)$"`
for path in $paths; do
wc -l $path
done | sort -n
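
To see the difference in isolation, here is a minimal sketch that uses made-up counts in place of real wc output. The sample lines and the variable names (counts, per_line, whole_loop) are illustrative only and not taken from the question.

```shell
#!/bin/sh
# Made-up input standing in for the output of the wc calls.
counts='50 a.cpp
10 a.h
200 b.cpp'

# One sort per iteration: each sort sees a single line,
# so the original order comes out unchanged.
per_line=$(printf '%s\n' "$counts" | while IFS= read -r line; do
  printf '%s\n' "$line" | sort -n
done)

# One sort for the whole loop: sort sees every line and can reorder them.
whole_loop=$(printf '%s\n' "$counts" | while IFS= read -r line; do
  printf '%s\n' "$line"
done | sort -n)

printf 'per line:\n%s\n\nwhole loop:\n%s\n' "$per_line" "$whole_loop"
```

The first result keeps the input order, the second is ordered numerically, which mirrors the two behaviours described in the question.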

You should avoid looping over find’s output; you also don’t need to use egrep to filter find’s output. You can process all the above with

find ./src/ \( -name '*.cpp' -o -name '*.h' \) -exec wc -l {} \; | sort -n

or more efficiently, if you don’t mind having a “total” line, with

find ./src/ \( -name '*.cpp' -o -name '*.h' \) -exec wc -l {} + | sort -n

(This still won’t quite work if your filenames include newlines.)
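
If the "total" line gets in the way, one possible workaround is to filter it out before sorting. The sketch below is an assumption on my part, not part of the commands above: it builds a throwaway tree under a temporary directory (the file names are hypothetical) and drops any line whose second field is exactly "total". That is safe here only because every path printed by find starts with ./src/, and it shares the newline caveat.

```shell
#!/bin/sh
# Throwaway tree for the demonstration; all names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/src/a"
printf 'x\ny\nz\n' > "$demo/src/a/a.cpp"
printf 'x\n'       > "$demo/src/a/a.h"
printf 'x\ny\n'    > "$demo/src/a/notes.txt"   # ignored: wrong extension

# Count lines with one wc call per batch of files, drop the summary
# line(s) that wc adds, then sort numerically.
sorted=$(cd "$demo" &&
  find ./src/ \( -name '*.cpp' -o -name '*.h' \) -exec wc -l {} + |
  awk '$2 != "total"' |
  sort -n)

printf '%s\n' "$sorted"
rm -rf "$demo"
```

Because the filter runs before sort, it also copes with the case where find splits a very long file list across several wc invocations and more than one "total" line appears.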

Stephen Kitt

Your first loop sorts the single-line output of each wc -l individually and prints the results one after the other. Sorting one line at a time cannot change the order, so this doesn't work (and that's expected!).

Your second approach first aggregates all lines from all wc calls, and then sorts them: that's the right way to go. Whether or not there's a file in between is not the problem here – the problem is that in your first loop you're not actually sorting anything.

So,

( for path in $paths; do
wc -l $path
done ) | sort -n

should work.

Your find call is strange in that it uses egrep to filter the output instead of simply running find -type f '(' -iname '*.cpp' -o -iname '*.h' ')'. Filtering the way you do also matches directories whose names end in .cpp, which you'll sometimes find in e.g. CMake builds, and that leads to interesting results. However, I'd discourage you from using find here altogether, because file names with spaces (extremely common), newlines etc. will break the loop over its output, for no good reason.
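
As a small illustration of the space problem, the sketch below counts how many words a loop sees for a single file called "my file.h". The file name and the counter variables are made up for the demonstration, and it assumes the temporary directory path itself contains no spaces or glob characters.

```shell
#!/bin/sh
# Throwaway directory with one file whose name contains a space.
demo=$(mktemp -d)
printf 'x\n' > "$demo/my file.h"

# Unquoted expansion of find's output, as in the question's loop:
# the shell splits the single path at the space.
paths=$(find "$demo" -name '*.h')
split_count=0
for path in $paths; do
  split_count=$((split_count + 1))
done

# Glob expansion: each match stays one word, spaces or not.
glob_count=0
for path in "$demo"/*.h; do
  glob_count=$((glob_count + 1))
done

printf 'words from find output: %s, words from glob: %s\n' \
  "$split_count" "$glob_count"
rm -rf "$demo"
```

The loop over the unquoted variable runs more than once for that one file, so wc would be handed path fragments that don't exist, while the glob loop runs exactly once.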

Instead, use what your shell (which I guess is bash) gives you directly:

shopt -s nullglob ## don't fail on empty globs
shopt -s globstar

for path in **/*.{h,cpp} ; do wc -l "${path}" ; done | sort -n

We can make this even shorter, in fact:

shopt -s nullglob ## don't fail on empty globs
shopt -s globstar
wc -l **/*.{h,cpp} | sort -n

With GNU implementations of the find, wc and head utilities, assuming file paths don't contain newline characters:

{
  find . '(' -name '*.h' -o -name '*.cpp' ')' -print0
  printf '%s\0' /dev/null
} |
  wc -l --files0-from=- |
  head -n -2 | # remove up to 2 trailing lines to remove the /dev/null
               # and possibly "total" lines
  sort -n

Unlike the -exec wc -l {} + approach, this one guarantees that only one "total" line is output.

We still have the problem that wc only outputs a total line if it is passed more than one file. Here, we work around it by adding an extra /dev/null which we remove at the end.
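
The behaviour that makes the /dev/null padding necessary can be checked with a short sketch. The file names are hypothetical, and the two sed '$d' calls are only a portable stand-in I'm assuming for GNU head -n -2.

```shell
#!/bin/sh
# Throwaway files for the demonstration; names are hypothetical.
demo=$(mktemp -d)
printf 'x\n'    > "$demo/one.h"
printf 'x\ny\n' > "$demo/two.h"

single=$(wc -l "$demo/one.h")                  # one file: no total line
several=$(wc -l "$demo/one.h" "$demo/two.h")   # two files: total line added
padded=$(wc -l "$demo/one.h" /dev/null)        # padding forces a total line

# With the padding in place there are always exactly two trailing lines
# to drop: the /dev/null entry and the total.
trimmed=$(printf '%s\n' "$padded" | sed '$d' | sed '$d')

printf 'single:\n%s\n\nseveral:\n%s\n\npadded:\n%s\n\ntrimmed:\n%s\n' \
  "$single" "$several" "$padded" "$trimmed"
rm -rf "$demo"
```

Without the padding, a run that matched only one file would have no "total" line, and blindly removing trailing lines would throw away real data.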