I've added a git alias to give me the line counts of specific tracked files in my repository:

[alias]
lines = !lc() { git ls-files -z ${1} | xargs -0 wc -l; }; lc

However, wc -l reports multiple totals: once I have more than ~100k lines, it prints a total for those lines and then moves on. Here's an example:

<100k lines (desired output)

$ git lines \*.xslt
  46 packages/NUnit-2.5.10.11092/doc/files/Summary.xslt
 232 packages/NUnit-2.5.10.11092/samples/csharp/_UpgradeReport_Files/UpgradeReport.xslt
 278 total

>100k lines (had to pipe to grep "total")

$ git lines \*.cs | grep "total"
 123569 total
 107700 total
 134796 total
 111411 total
  44600 total

How do I get a true total from wc -l, not a series of subtotals?

Ehryk
  • According to http://stackoverflow.com/questions/2501402/why-does-the-wc-utility-generate-multiple-lines-with-total the problem is with xargs, not wc. I'm still interested in how to fix it, and I don't see a good solution in the answers. – Ehryk Jan 31 '14 at 19:49
  • Does your version of wc support the --files0-from option? Then you can do { git ls-files -z ${1} | wc -l --files0-from=- ; } – Mark Plotnick Jan 31 '14 at 20:40
  • @MarkPlotnick I think that deserves to be an answer. – terdon Jan 31 '14 at 20:59
  • Nope. wc: unrecognized option '--files0-from=-' – Ehryk Jan 31 '14 at 21:39

4 Answers

Try this, and apologies for being obvious:

cat *.cs | wc -l

or, with git:

git ls-files -z ${1} | xargs -0 cat | wc -l

If you actually want the output to look like wc output, with both individual counts and a sum, you could use awk to add up the individual lines:

git ls-files -z ${1} | xargs -0 wc -l |
awk '/^[[:space:]]*[[:digit:]]+[[:space:]]+total$/{next}
     {total+=$1;print}
     END {print total,"total"}'

That won't be lined up as nicely as wc does it, in case that matters to you. To do that, you'd need to read the entire input and save it, computing the total, and then use the total to compute the field width before using that field width to print a formatted output of the remembered lines. Like home renovation projects, awk scripts are never ever really finished.

(Note to enthusiastic editors: the regular expression in the first awk condition is in case there is a file whose name starts with "total" and a space; otherwise, the condition could have been the much simpler $2 == "total".)
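To show what that read-everything-then-format approach might look like, here is a minimal sketch. The function name align_wc_totals is hypothetical, and the sketch assumes each line of wc output is a count, one separator character, then the file name. It remembers every per-file line, skips the subtotals, and takes the column width from the number of digits in the grand total.

```shell
# align_wc_totals is a hypothetical helper name. It reads `wc -l` output on
# stdin, drops every "N total" subtotal line, and reprints the per-file lines
# right-aligned to the width of the grand total, followed by that total.
align_wc_totals() {
  awk '
    # Skip the subtotal line printed by each separate wc run.
    /^[ \t]*[0-9]+[ \t]+total$/ { next }
    {
      n++
      count[n] = $1                 # line count for this file
      total += $1
      file = $0
      # Strip the count and exactly one separator so names with spaces stay intact.
      sub(/^[ \t]*[0-9]+[ \t]/, "", file)
      name[n] = file
    }
    END {
      # Field width = number of digits in the grand total.
      fmt = "%" length(total "") "d %s\n"
      for (i = 1; i <= n; i++) printf fmt, count[i], name[i]
      printf fmt, total, "total"
    }'
}

# Usage, replacing the awk stage of the pipeline above:
# git ls-files -z ${1} | xargs -0 wc -l | align_wc_totals
```

Keeping the formatting in a separate function leaves the git and xargs part of the pipeline unchanged.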

rici
  • That does work, but it outputs only the total (git ls-files -z ${1} | xargs -0 cat | wc -l); I'm missing the per-file line counts that wc -l provides, as in my first example above. Any way to get the best of both worlds here? – Ehryk Jan 31 '14 at 21:41
  • Or, if that's too difficult, how about a switch: if the output would be split up, just give the total; if not, give the normal wc per-file output with a total? – Ehryk Jan 31 '14 at 22:01
  • @Ehryk: you could just do it twice, once the way you were doing it with grep -v to drop the total lines, and once the way I suggest to get the total total. Or you could try the awk solution in the edited answer. – rici Feb 01 '14 at 01:38
  • +1: "Like home renovation projects, awk scripts are never ever really finished." – Ehryk Feb 01 '14 at 07:57
  • That worked like a charm. My final result: git ls-files -z ${1} | xargs -0 wc -l | awk '/^[[:space:]]*[[:digit:]]+[[:space:]]+total$/{next} {total+=$1;print} END {print "\n Total:",total,"lines"}' – Ehryk Feb 01 '14 at 07:59
  • I've marked yours as the answer, but just out of curiosity: is there any way to group the output of ls-files by the subdirectories of the current directory, so that wc -l returns the total for a given file type for each subdirectory (no recursion), then the total total? – Ehryk Feb 01 '14 at 08:01
  • @ehryk: There is always a way of doing things if you can precisely define what the thing you want to do is, which is why awk scripts are never ever really finished. One important fact to help you with grouping: / is not a legal character in a filepath segment, so the last / in a path is always the end of the directory part. There is never any need to interpret quotes or escape characters. – rici Feb 02 '14 at 04:34
  • I don't really know where to begin. What I'd want is, for the input git lines \*.cs, output like this:

        481 /
        234 Directory1/
        5212 Directory2/
        Total: 5927 lines

    – Ehryk Feb 03 '14 at 07:55
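One way to get the grouping in that last comment, as a sketch: the hypothetical function group_wc_by_topdir reads wc -l output, credits each file to the part of its path up to and including the first /, credits files with no / to "/", and prints the sums in the order the directories are first seen, then the grand total. It assumes git ls-files prints paths relative to the current directory (its default when run from a subdirectory) and reads "no recursion" as "everything below Directory1 counts toward Directory1/". Matching the last / instead, which is where rici's hint points, would group by each file's own directory.

```shell
# group_wc_by_topdir is a hypothetical helper name. It reads `wc -l` output on
# stdin and sums the counts per top-level directory; files with no "/" in their
# path are reported under "/". Subtotal lines from separate wc runs are skipped.
group_wc_by_topdir() {
  awk '
    /^[ \t]*[0-9]+[ \t]+total$/ { next }
    {
      path = $0
      sub(/^[ \t]*[0-9]+[ \t]/, "", path)     # strip the count, keep the path
      slash = index(path, "/")                # first "/" closes the top-level name
      dir = slash ? substr(path, 1, slash) : "/"
      if (!(dir in sum)) order[++n] = dir     # remember first-seen order
      sum[dir] += $1
      total += $1
    }
    END {
      for (i = 1; i <= n; i++) printf "%d %s\n", sum[order[i]], order[i]
      printf "Total: %d lines\n", total
    }'
}

# Usage: git ls-files -z ${1} | xargs -0 wc -l | group_wc_by_topdir
```

Because it sums the per-file lines rather than wc's own totals, it also copes with a final xargs batch that contains a single file, for which wc prints no total line.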

If you're running Linux, your wc probably comes from GNU Coreutils and has a --files0-from option to read a file (or stdin) containing an arbitrarily long list of NUL-terminated names of files to count. The GNU Coreutils wc documentation says "This is useful when the list of file names is so long that it may exceed a command line length limitation. In such cases, running wc via xargs is undesirable because it splits the list into pieces and makes wc print a total for each sublist rather than for the entire list."

So try this:

lc() { git ls-files -z ${1} | wc -l --files0-from=- ; } 

Edit: Since your wc is from the last millennium and doesn't have that option, here is a more portable solution, assuming you have awk and do not have any files named "total". It will filter the output of wc, omitting any total lines and instead summing them up and printing out the grand total at the end.

One thing I do not know is whether the git alias implementation will have problems with the $1 and $2 inside single quotes, which need to be passed unchanged to awk.

lc() {
  git ls-files -z ${1} |
  xargs -0 wc -l |
  awk 'BEGIN { total=0; } { if (NF==2 && $2 == "total") total += $1; else print; } END { print total, "total"; }' ;
}
Mark Plotnick
  • I am not running Linux; it's in the Git Bash prompt of Git for Windows http://msysgit.github.io/ (msysgit). – Ehryk Jan 31 '14 at 21:43
  • OK. So the xargs and wc you're running are from Cygwin? Can you paste the output of wc --version? – Mark Plotnick Jan 31 '14 at 21:50
  • They're not from a full Cygwin install:

        $ wc --version
        wc (GNU textutils) 2.0
        Written by Paul Rubin and David MacKenzie.

        Copyright (C) 1999 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    – Ehryk Jan 31 '14 at 21:59
  • It's a full-on Windows executable: C:\Program Files (x86)\Git\bin\wc.exe – Ehryk Jan 31 '14 at 22:00
  • @Ehryk Msysgit is a port of the Linux tools, but it tends to have old versions, so it may not have --files0-from. – Gilles 'SO- stop being evil' Jan 31 '14 at 23:47

The problem is xargs, which splits the command into multiple runs, so wc reports a total for each run. You have a few options. You could keep things the way they are and parse the wc output:

git ls-files -z ${1} | xargs -0 wc -l | awk '/^[[:space:]]*[[:digit:]]+[[:space:]]+total$/{next} {k+=$1} END{print k,"total"}';

You could cat the files:

git ls-files -z ${1} | xargs -0 cat | wc -l

Or you could skip xargs altogether (adapted from here):

unset files i; while IFS= read -r -d $'\0' name; do 
 files[i++]="$name"; 
done < <(git ls-files -z ${1} ) && wc -l "${files[@]}"

That will break, though, if the argument list for wc grows beyond the system's ARG_MAX limit.
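The splitting described at the top of this answer can be reproduced on a small scale without a git repository. This sketch creates four throwaway files in a temporary directory and uses xargs -n 2, which caps each run at two arguments, so wc is invoked twice and prints two total lines; piping the same files through cat gives a single count. In real use xargs picks the batch size from a command-line length limit (related to, though not necessarily equal to, the value getconf ARG_MAX reports) rather than from -n.

```shell
# Reproduce the multiple-totals effect on a small scale: four throwaway files,
# and xargs -n 2 so that wc is run once for every two names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'a\nb\n'    > one.txt     # 2 lines
printf 'c\n'       > two.txt     # 1 line
printf 'd\ne\nf\n' > three.txt   # 3 lines
printf 'g\n'       > four.txt    # 1 line

# Two wc runs, so two separate "total" lines (3 and 4).
printf '%s\0' one.txt two.txt three.txt four.txt | xargs -0 -n 2 wc -l

# Concatenating instead yields one count: 7.
printf '%s\0' one.txt two.txt three.txt four.txt | xargs -0 cat | wc -l

# The system-wide limit on exec() arguments that xargs has to stay within.
getconf ARG_MAX
```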

terdon

j=0; for i in *.php *.js *.css; do [ -f "$i" ] && j=$((j + $(wc -l < "$i"))); done; echo "$j"