Sum in bash outside while read line

Question

I'm trying to come up with the sum of lines in .js files in a folder. I'm using this in bash:

sum=0 && find . | grep ".js" | while read -r f; do wc -l $f | awk '{print $1;}'; done;

putting the $sum += $1 inside the awk does not work. how am I supposed to do this?

P.S.: I'm aware this can be achieved much more easily using

find . -name '*.js' | xargs wc -l

I still want a solution to the above.

See why-is-using-a-shell-loop-to-process-text-considered-bad-practice. There are many other issues with your script too including testing the result of an assignment, unnecessary grep, not escaping the metachar in the grep, not anchoring it, not quoting variables, using the wrong quotes. You're asking for help to implement an approach you should never take. — Ed Morton, Oct 04 '19 at 16:45
"putting the $sum += $1 inside the awk does not work." -- What exactly did you put and where? What happened? What did you expect to happen? Please edit your question to show exactly what you tried to do. — ilkkachu, Oct 05 '19 at 08:23

pLumo · Answer 1 · 2019-10-04T12:09:42.443

11

Try this easy and super fast solution:

find . -type f -name "*.js" -exec cat {} + | wc -l

I tried some solutions with wc before, but they either have issues with, for example, newlines in file names, or they are slow.

edited Oct 04 '19 at 12:09

answered Oct 04 '19 at 11:54

I noticed that it is not really what OP is looking for regarding his last sentence (PS)... but I like the solution so much, I'll leave it here ;-) – pLumo Oct 04 '19 at 12:15
Or given all filenames fit in one exec, just -exec awk 'END{print NR}' {} + (one difference: includes any unterminated final lines of files, which -exec cat {} + | wc -l doesn't) – dave_thompson_085 Oct 05 '19 at 02:48
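
The difference mentioned in the comment above can be checked with a small experiment. This is only a sketch: the scratch directory and the file names a.js and b.js are made up for illustration.

```bash
#!/usr/bin/env bash
# Scratch directory with two hypothetical files (names are illustrative).
tmp=$(mktemp -d)
printf 'x\ny' > "$tmp/a.js"    # two lines, the last one has no trailing newline
printf 'z\n'  > "$tmp/b.js"    # one properly terminated line

# wc -l counts newline characters, so the unterminated line is not counted.
wc_count=$(( $(find "$tmp" -type f -name '*.js' -exec cat {} + | wc -l) ))

# awk counts records, so the unterminated line is counted as well.
awk_count=$(find "$tmp" -type f -name '*.js' -exec awk 'END { print NR }' {} +)

echo "cat | wc -l: $wc_count"
echo "awk NR:      $awk_count"

rm -rf "$tmp"
```

Which of the two numbers is "right" depends on whether an unterminated final line should count as a line.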

glenn jackman · Answer 2 · 2019-10-04T20:50:04.470

5

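bash executes each command of a pipeline in a separate subshell, so variables changed inside the loop are lost when the pipeline ends. The exception is the lastpipe shell option, which runs the last command of the pipeline in the current shell: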

# bash requires job control to be disabled for lastpipe setting
set +m
shopt -s lastpipe

declare -i sum=0
find . -name '*.js' -print0 | while IFS= read -d '' -r name; do
    (( sum += $(wc -l < "$name") ))   # redirect the file into wc for easier output
done
echo $sum

Process substitutions are handy for dealing with this subshell problem:

declare -i sum=0
while IFS= read -d '' -r name; do
    (( sum += $(wc -l < "$name") ))   # redirect the file into wc for easier output
done < <(
    find . -name '*.js' -print0
)
echo $sum

However, this makes the program flow harder to read.
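
The subshell effect itself can be seen in isolation. The following is a minimal sketch, assuming lastpipe is off (the bash default); the variable names piped and redirected are hypothetical, and three fixed input lines stand in for the output of find.

```bash
#!/usr/bin/env bash
# 1) Loop on the right-hand side of a pipe: without lastpipe it runs in a
#    subshell, so the increments are lost when the pipeline ends.
piped=0
printf '%s\n' a b c | while IFS= read -r line; do
    piped=$((piped + 1))
done

# 2) Loop fed by process substitution: it runs in the current shell,
#    so the variable keeps its value after the loop.
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done < <(printf '%s\n' a b c)

echo "after pipe:                 $piped"
echo "after process substitution: $redirected"
```

With lastpipe off, the first counter should still be 0 after the loop, while the second one holds the number of lines read.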

edited Oct 04 '19 at 20:50

answered Oct 04 '19 at 11:57

anyway to use this in a single line in bash? – d9ngle Oct 04 '19 at 12:15
Add some semicolons and remove newlines. Why does it need to be one line? – glenn jackman Oct 04 '19 at 13:24
Remember the IFS= or it'll fail for file names that start with spaces. Yes, I know it's unlikely but IFS= is something we should all use by default and remove when we need to (like quotes around variables) to avoid any surprises. – Ed Morton Oct 04 '19 at 16:51
Optionally, omit lastpipe and redirect the while loop's input? – D. Ben Knoble Oct 04 '19 at 20:22

markgraf · Answer 3 · 2019-10-06T08:58:00.177

5

You want awk to do the addition and show the result?

awk '{sum +=$1} END {print sum}' should do the trick.

In my library for bash-scripts I do:

$ find . -type f -name '*.bash' \
| while read -r f ; do wc -l "$f" ; done \
| awk '{sum +=$1} END {print sum}'

and get the result 522

edited Oct 06 '19 at 08:58

answered Oct 04 '19 at 13:54

like this? sum=0 && find . | grep ".js" | while read -r f; do wc -l $f | awk '{sum +=$1} END {print sum}'; done; doesn't work – d9ngle Oct 05 '19 at 03:50
The awk-command goes outside the loop. Updated my answer to give an example. – markgraf Oct 06 '19 at 08:58
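
Why the position of awk matters can be sketched with two throwaway files. The scratch directory, the file names, and the variable names inside and outside are made up for this example.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '1\n2\n'    > "$tmp/a.js"   # 2 lines
printf '1\n2\n3\n' > "$tmp/b.js"   # 3 lines

# awk inside the loop: a new awk starts for every file, so each one
# prints the count of a single file and no grand total appears.
inside=$(find "$tmp" -type f -name '*.js' |
    while IFS= read -r f; do wc -l "$f" | awk '{sum += $1} END {print sum}'; done)

# awk outside the loop: one awk reads the output of all wc calls.
outside=$(find "$tmp" -type f -name '*.js' |
    while IFS= read -r f; do wc -l "$f"; done |
    awk '{sum += $1} END {print sum}')

printf 'inside the loop:\n%s\n' "$inside"
printf 'outside the loop: %s\n' "$outside"
rm -rf "$tmp"
```

The first variant prints one number per file; only the second variant adds them up.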

score 1 · Answer 4 · answered Oct 04 '19 at 22:49

Abstract

To count lines in a directory:

shopt -s globstar;                      # valid for bash
set -- ./**/*".js"; cat "$@" | wc -l    # for files under `./` directory

To sum outside a while read loop:

shopt -s globstar;                                 # valid for bash
set -- ./**/*".js"                                 # for files under `./` directory
wc -l "$@" | awk '{sum+=$1} END {print sum-=$1}'   # calculate the sum in awk

But why would you recalculate the sum when wc -l already prints a total on the last line?

wc -l "$@" | tail -n 1
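
One caveat: wc prints the total line only when it is given more than one file, so subtracting the last line gives 0 for a single file. The sketch below shows one possible guard; the function names count_lines and unguarded and the scratch files are hypothetical.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '1\n2\n'    > "$tmp/a.js"   # 2 lines
printf '1\n2\n3\n' > "$tmp/b.js"   # 3 lines

# Subtract the last line only when wc printed more than one line,
# because only then is the last line the "total" line.
count_lines() {
    wc -l "$@" | awk '{sum += $1; last = $1} END {print (NR > 1 ? sum - last : sum)}'
}

# The unguarded form assumes the last line is always the total.
unguarded() {
    wc -l "$@" | awk '{sum += $1} END {print sum - $1}'
}

two_files=$(count_lines "$tmp"/*.js)         # total line present
one_file=$(count_lines "$tmp/a.js")          # no total line
one_file_unguarded=$(unguarded "$tmp/a.js")  # subtracts the only line

echo "$two_files $one_file $one_file_unguarded"
rm -rf "$tmp"
```

The guard only covers a single wc invocation; if the file list is split over several wc runs (as xargs may do), each run prints its own total.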

Detail

There are several elements that may be improved:

The | awk '{print $1;}' stage, which selects only the first field, is not necessary if you run wc -l <"$f" instead of wc -l $f. The redirection (<) makes wc receive the file on its standard input, so it has no file name to print. This reduces the script to:

find . | grep ".js" | while read -r f; do wc -l <"$f"; done

There is no need for a grep call if find does the selection:

find . -name '*.js' | while read -r f; do wc -l <"$f"; done

Note that read, used without IFS=, removes leading and trailing blanks from file names.

And actually, find could execute the command for each file (implicit loop):

find . -name '*.js' -exec sh -c 'wc -l <"$1"' foo '{}' \;

It is even possible to make one single global call to wc instead of one per file:

find . -name '*.js' -exec sh -c 'cat "$@" | wc -l' foo '{}' +

But having to call the shell again just to process each file name safely (without issues with spaces, tabs, newlines or glob characters such as *, ? and [) suggests that we may as well solve this directly in the shell, provided we do not need find's special handling of links:

set -- *.js; cat "$@" | wc -l # for the present directory

Or

shopt -s globstar;                      # valid for bash
set -- ./**/*".js"; cat "$@" | wc -l    # for files under `./` directory
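
The earlier remark that read removes leading and trailing blanks from file names can be checked in isolation. This is a small sketch with a made-up file name; the variable names stripped and kept are hypothetical.

```bash
#!/usr/bin/env bash
# A hypothetical file name with leading and trailing blanks.
name='  spaced name.js  '

# Plain read splits on IFS and drops leading and trailing blanks.
stripped=$(printf '%s\n' "$name" | { read -r f; printf '%s' "[$f]"; })

# An empty IFS for the duration of read keeps the line intact.
kept=$(printf '%s\n' "$name" | { IFS= read -r f; printf '%s' "[$f]"; })

echo "read -r:      $stripped"
echo "IFS= read -r: $kept"
```

The brackets only make the blanks visible in the output. Inner blanks survive in both cases; only the leading and trailing ones differ.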

Sum outside a while read loop

The question in the title concerns this part of your pipeline:

while read -r f; do wc -l $f …

Assuming the list of files is in the argument list ($@) (or it could be inside some array as well) as found above, this will print a list of files with the line count as first field:

$ printf '%s\n' "$@" | while read -r f; do wc -l "$f"; done
12 filea.js
21 fileb.js

At this point you could just add a new pipe with awk to select the first field:

$ printf '%s\n' "$@" | while read -r f; do wc -l "$f"; done | awk '{print $1}'
12
21

But you might as well print all in one line with a + appended:

$ printf '%s\n' "$@" | 
> while read -r f; do wc -l "$f"; done | 
> awk '{printf( "%s+",$1)}'
12+21+

And, adding a trailing 0, make bc sum it all:

$ printf '%s\n' "$@" | 
> while read -r f; do wc -l "$f"; done |
> awk '{printf("%s+",$1)}END{print 0}' |
> bc
33

But, as already said, you can avoid printing the file name with wc -l <"$f", convert the newlines to +, then add a 0 and make bc do the calculation:

$ printf '%s\n' "$@" |
  while read -r f; do wc -l <"$f"; done |
  { tr '\n' '+'; echo 0; } |
  bc

33

or make awk calculate the sum:

$ printf '%s\n' "$@" | 
  while read -r f; do wc -l <"$f"; done | 
  awk '{sum+=$1} END {print sum}'

33

Adam D. · Answer 5 · 2019-10-04T12:45:19.850

Here is another bash way to do this:

sum=0 && while IFS= read -r -d '' f; do let sum+=$(sed -n "\$=" "${f}"); done < <(find . -name '*.js' -print0 2>/dev/null) && echo "$sum"

find ... -print0 -- print file names terminated by a NUL character
            IFS= -- set IFS to empty so read keeps leading and trailing blanks
   read -r -d '' -- -r: do not treat backslashes as escapes; -d '': use NUL as the delimiter
   let sum+=(..) -- add the output of sed -n '$=' "$f" to sum ($: last line, =: print line number)
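
One thing to watch with this approach: for an empty file, sed -n '$=' prints nothing, which leaves let with an incomplete expression. The following is a sketch of one way to guard against that, assuming GNU or BSD sed; the scratch files and the helper variable n are made up for illustration.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '1\n2\n' > "$tmp/a.js"    # 2 lines
printf 'x'      > "$tmp/b.js"    # 1 line without a trailing newline
: > "$tmp/empty.js"              # 0 lines; sed -n '$=' prints nothing here

sum=0
while IFS= read -r -d '' f; do
    n=$(sed -n '$=' "$f")        # line number of the last line, or empty
    sum=$((sum + ${n:-0}))       # treat the empty result as 0
done < <(find "$tmp" -name '*.js' -print0 2>/dev/null)

echo "$sum"
rm -rf "$tmp"
```

Unlike wc -l, sed also counts a final line that lacks a trailing newline, so the two approaches can differ by one per such file.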