6

I'm trying to figure out how to get the total number of lines from all .txt files. I think the problem is on line 6 -> let $((total = total + count )). Does anybody know the correct form of this?

#!/bin/bash
total=0
find /home -type f -name "*.txt" | while read -r FILE; do
    count=$(grep -c ^ < "$FILE")
    echo "$FILE has $count lines"
    let $((total = total + count ))
done
echo TOTAL LINES COUNTED:  $total

Thank you

αғsнιη
  • 41,407

8 Answers

20

Your line 6 is better written as

total=$(( total + count ))

... but it would be better still to use a tool that is made for counting lines (assuming you want to count newlines, i.e. the number of properly terminated lines)

find . -name '*.txt' -type f -exec cat {} + | wc -l

This finds all regular files in or below the current directory that have filenames ending in .txt. All these files are concatenated into a single stream and piped to wc -l, which outputs the total number of lines, which is what the title and text of the question ask for.

Complete script:

#!/bin/sh

nlines=$( find . -name '*.txt' -type f -exec cat {} + | wc -l )

printf 'Total number of lines: %d\n' "$nlines"

To also get the individual files' line count, consider

find . -name '*.txt' -type f -exec sh -c '
    wc -l "$@" |
    if [ "$#" -gt 1 ]; then
        sed "\$d"
    else
        cat
    fi' sh {} + |
awk '{ tot += $1 } END { printf "Total: %d\n", tot }; 1'

This calls wc -l on batches of files, outputting the line count for each individual file. When wc -l is called with more than one filename, it will output a line at the end with the total count. We delete this line with sed if the in-line sh -c script is called with more than one filename argument.

The long list of line counts and file pathnames is then passed to awk, which simply adds the counts up (and passes the data through) and presents the user with the total count at the end.
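To see the awk step in isolation, here is a minimal sketch that feeds it two hand-written lines in the "count pathname" layout that wc -l produces. The function name sum_counts and the sample pathnames are made up for illustration; they are not part of the answer above.

```shell
#!/bin/sh
# sum_counts is a hypothetical helper name wrapping the awk program shown above.
# It reads "count pathname" lines on stdin, passes each line through unchanged
# (the trailing "1" pattern), and appends a "Total:" line with the sum of field 1.
sum_counts() {
    awk '{ tot += $1 } END { printf "Total: %d\n", tot }; 1'
}

# Sample input standing in for the per-file output of wc -l.
printf '%s\n' '3 ./a.txt' '5 ./b.txt' | sum_counts
```

Because the total is accumulated by awk rather than by wc, it stays correct even when find has to split the files over several wc invocations.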


On GNU systems, the wc tool can read pathnames from a nul-delimited stream. You can use that with find and its -print0 action on these systems like so:

find . -name '*.txt' -type f -print0 |
wc --files0-from=- -l

Here, the found pathnames are passed as a nul-delimited list over the pipe to wc using the non-standard -print0. The wc utility is used with the non-standard --files0-from option to read the list being passed across the pipe.

Kusalananda
  • 333,661
  • 2
    Why all this complexity? Wouldn't shopt -s globstar; wc -l **/*.txt be enough? – user000001 May 24 '21 at 14:00
  • -exec cat {} + the + is subtle but brilliant. had to read the manpage to figure out what it does and learned something new. thanks! – Andreas Grapentin May 24 '21 at 14:54
  • 2
    @user000001 Yes of course, if you can guarantee that **/*.txt expands to a list that is short enough to not trigger an "argument list too long " error. You can't do that in the general case, and the user in the question does not indicate the number of files involved. – Kusalananda May 24 '21 at 18:28
6
let $((total = total + count ))

This works, but is a bit redundant, since both let and $(( .. )) start arithmetic expansion.

Any of let "total = total + count", let "total += count", : $((total = total + count)) or total=$((total + count)) would do it without the duplication. The last two should be compatible with a standard shell, let isn't.
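As a quick check of the two portable forms, the sketch below runs both with made-up starting values (total=10, count=5 are assumptions chosen only for the demonstration).

```shell
#!/bin/sh
# Illustrative starting values, not taken from the question.
total=10
count=5

# Form 1: the assignment happens inside the arithmetic expansion;
# ':' is the no-op command that discards the expanded value.
: $((total = total + count))
echo "$total"    # 15

# Form 2: an ordinary assignment of the expansion's result.
total=$((total + count))
echo "$total"    # 20
```

Both forms avoid let, so they should also run in shells such as dash that do not provide it.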

total=0
find /home -type f -name "*.txt" | while read -r FILE; do
    total=...
done
echo TOTAL LINES COUNTED:  $total

You didn't say what problem you mean, but one problem you have here is that in Bash, the parts of a pipeline run in subshells by default, so any changes made to total inside the while loop are not visible after it. See: Why is my variable local in one 'while read' loop, but not in another seemingly similar loop?

You could use shopt -s lastpipe to have the last part of the pipeline run in the shell; or group the while and echo:

find ... | { while ...
    done; echo "$total"; }
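Filled in, the grouping approach might look like the sketch below. The function name count_grouped and the throwaway directory are assumptions for the demonstration, and the loop still assumes filenames without newlines.

```shell
#!/bin/sh
# count_grouped is a hypothetical name; "$1" is the directory to search.
count_grouped() {
    find "$1" -type f -name '*.txt' | {
        total=0
        while IFS= read -r file; do
            count=$(grep -c ^ < "$file")
            total=$((total + count))
        done
        # Still inside the braces, so this runs in the same (sub)shell
        # as the loop and sees the updated value of total.
        echo "$total"
    }
}

# Demo on a temporary directory with two small files (2 + 1 lines).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"
count_grouped "$dir"
rm -rf "$dir"
```

The variable is still lost once the braces close, so anything that needs the total has to happen inside the group or capture its output, for example with result=$(count_grouped /some/dir).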

Of course, find ... | while read -r FILE; will have problems with filenames that contain newlines, or start/end with whitespace. You could fix that with

find ... -print0 | while IFS= read -r -d '' FILE; do ...

or, if you don't care about the breakdown of per-file line counts and know your files are complete text files, with none missing the final newline, you could simply concatenate all the files together and run wc -l on that.

If your files may be missing the newline at the end of the last line, and you want to count that final incomplete line, then you can't do that, and need to keep using grep -c ^ instead of wc -l. (Counting the final partial line is pretty much the only reason to use grep -c ^ instead of wc -l.)
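The difference is easy to reproduce. The sketch below writes a temporary file whose last line has no trailing newline and counts it both ways; the file contents are made up for the demonstration.

```shell
#!/bin/sh
tmp=$(mktemp)
# Three lines of text, but only two newline characters.
printf 'one\ntwo\nthree' > "$tmp"

wc_count=$(wc -l < "$tmp")          # counts newline characters
grep_count=$(grep -c ^ < "$tmp")    # counts line starts, including the partial line

echo "wc -l:     $wc_count"
echo "grep -c ^: $grep_count"
rm -f "$tmp"
```

Here wc -l reports 2 and grep -c ^ reports 3, so the two tools only agree when every file ends with a newline.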

See: What's the point in adding a new line to the end of a file? and Why should text files end with a newline? on SO.

Also, if you only want the total count, all the files matching the pattern are regular files (so the -type f test can be dropped), and you have Bash and GNU grep, you could also do:

shopt -s globstar
shopt -s dotglob
grep -h -c ^ **/*.txt | awk '{ a += $0 } END { print a }'

**/*.txt is a recursive glob; it needs to be explicitly enabled to work. dotglob makes that glob also match filenames starting with a dot. grep -h suppresses the filenames from the output, and the awk script computes the sum. Since no filenames are printed, this should work even if some of them are problematic.

Or, as suggested by @fra-san, based on another now-deleted answer:

grep -r -c -h --include='*.txt' ^ | awk '{ a += $0 } END { print a }'
ilkkachu
  • 138,973
5

let total+=count will work, there's no need for $(( )) with this form of arithmetic evaluation.

But you'd be much better off doing this with wc -l.

find /home -type f -name '*.txt' -exec wc -l {} +

If you want custom output as in your shell script above, OR if there are likely to be more filenames than will fit in the roughly 2 MB command-line length limit on Linux, you could use awk or perl to do the counting. Anything is better than a shell while-read loop (see Why is using a shell loop to process text considered bad practice?). For example:

find /home -type f -name '*.txt' -exec perl -lne '
  $files{$ARGV}++;

  END {
    foreach (sort keys %files) {
      printf "%s has %s lines\n", $_, $files{$_}; $total += $files{$_};
    }
    printf "TOTAL LINES COUNTED: %s\n", $total }' {} +

Note: the find ... -exec perl command above will ignore empty files, whereas the wc -l version would list them with a line count of 0. It's possible to make perl do the same (see below).

OTOH, it will do a line count and total for any number of files, even if they won't all fit in one shell command line - the wc -l version would print two or more total lines in that case - probably not going to happen, but not what you want if it did.

This should work. It uses wc -l and pipes the output into perl to change it to the desired output format:

find /home -type f -name '*.txt' -exec wc -l {} + |
    perl -lne 'next if m/^\s*\d+\s+total$/;
               s/^\s*(\d+)\s+(.*)/$2 has $1 lines/;
               print;
               $total += $1;
           END { print "TOTAL LINES COUNTED:  $total"}'

cas
  • 78,579
  • 1
    What is the meaning of the + after {}? – Peter - Reinstate Monica May 24 '21 at 14:27
  • 1
    When using find's -exec option, the {} is a placeholder for the found files. Ending the -exec with a semi-colon ; (which has to be escaped as \; in the shell) makes find execute the command once for each filename. Ending it with + makes find try to fit as many filenames on the command as it can (current cmd line length limit on linux is about 2 million characters - a lot of filenames). i.e. the + makes find run wc -l file1 file2 file3 ...... fileN instead of wc -l file1 then wc -l file2 and so on until wc -l fileN. – cas May 24 '21 at 14:34
  • Ah, I see. (I had looked at the find manual page but searched for an unescaped + which does something different in less). – Peter - Reinstate Monica May 24 '21 at 14:41
2

Instead of matching every line with grep, which isn't optimal, use wc. Using the correct arithmetic syntax, total=$((total+count)), would also help.

#!/bin/bash

total=0
path=/home

for f in $(find "$path" -type f -name "*.txt"); do
    count=$(wc -l < "$f")
    echo "$f has $count lines"
    total=$((total + count))
done
echo "TOTAL LINES COUNTED: $total"

This doesn't work with filenames with spaces or new lines. Buyer beware.

tansy
  • 741
1

Try this:

#!/bin/bash
export total=$(find . -name '*.txt' -exec wc -l "{}" ";" | awk 'BEGIN{sum=0} {sum+=$1} END{print sum}')
echo TOTAL LINES COUNTED ${total}
Emmett
  • 62
0

If they are all in one directory then this works:

cat -- *.txt | wc -l

(note: it doesn't count hidden files such as .foo.txt unless you enabled the dotglob or globdots option of your shell)

Duncan
  • 1
0

Using Raku (formerly known as Perl_6)

Adapting the excellent (first) Perl5 answer from @cas here:

~$ find ~/find_dir -type f -name '*.txt' -exec raku -ne '
   BEGIN my %files; state $total;
     %files{$*ARGFILES}++;

   END for (sort keys %files) {
     printf "%s has %s lines\n", $_, %files{$_}; $total+=%files{$_};
     LAST printf "TOTAL LINES COUNTED: %s\n", $total }' {} +

Significant differences between the original Perl5 code and this Raku code include invariant sigils--specifically the hash %files never changes sigils. In Raku, files read off the command line are found in the $*ARGFILES dynamic variable, although for more complicated scripts the @*ARGS array can be used. Raku also has a series of Control Flow commands including BEGIN, END, and LAST which are put to good use here.

Starting anew in Raku, I would probably write something like the following, which takes advantage of Raku's dir(…) routine:

~$ raku -e '
   my  $total;

   for dir("$*CWD/file_dir", test => /\.txt$/ ) -> $name {
       my $lc = $name.lines.elems;
       say $name.absolute => $lc;
       $total += $lc;
   };

   say "TOTAL LINES COUNTED: $total";'

Because Raku's dir(…) routine can test/filter by a literal string (e.g. test => ".txt"), OR a regex matcher (e.g. test => /\.txt$/ ), the programmer does not have to rely on shell-globbing to filter-in only the .txt files-of-interest.

The code above returns stringified IO::Path => $lc pairs giving the number of lines as value. The absolute method call resolves paths. It's simple enough to append these pairs into a hash (if necessary) for further manipulations. As above, the total lines ($total) are also output in the last statement.

https://docs.raku.org/language/control.html
https://docs.raku.org/routine/dir
https://docs.raku.org/language/io-guide#Stringifying_IO::Path
https://raku.org

jubilatious1
  • 3,195
  • 8
  • 17
-1

Based on the code in your post, I'm guessing it might be from this post.

While this isn't the best way to do this, you can use the following instead:

shopt -s lastpipe
total=0
find pathhere -type f -name "*.txt" | while read FILE; do
     count=$(grep -c ^ < "$FILE")
     echo "$FILE has $count lines"
     total=$((total + count))
done
echo TOTAL LINES COUNTED:  $total

or with wc:

shopt -s lastpipe
total=0
find pathhere -type f -name "*.txt" | while read FILE; do
     count=$(wc -l < "$FILE")
     echo "$FILE has $count lines"
     total=$((total + count))
done
echo TOTAL LINES COUNTED:  $total

You might have noticed the shopt -s lastpipe, and that's because the while loop is running in a subshell, and thus doesn't carry over the new value of the variable total at the end of the loop...unless you use this option at the top.
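The effect of the option can be seen in a small sketch like the one below, which runs the same loop with and without lastpipe. The helper name demo_lastpipe and the numbers fed to the loop are made up for the demonstration. It is run through bash -c because lastpipe only takes effect when job control is off, which is the case in scripts but not at a default interactive prompt; it also assumes Bash 4.2 or later.

```shell
#!/bin/sh
# demo_lastpipe is a hypothetical helper; it runs the loop twice in a
# non-interactive bash, first without and then with lastpipe.
demo_lastpipe() {
    bash -c '
        total=0
        printf "%s\n" 1 2 3 | while read -r n; do total=$((total + n)); done
        echo "without lastpipe: $total"

        shopt -s lastpipe
        total=0
        printf "%s\n" 1 2 3 | while read -r n; do total=$((total + n)); done
        echo "with lastpipe: $total"
    '
}

demo_lastpipe
```

Without the option the loop's updates stay in the subshell and the total prints as 0; with it the while loop runs in the main shell and the total prints as 6.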

Or if you want something faster and shorter:

find /path/to/directory/ -type f -name "*.txt" -exec wc -l {} \; | awk '{total += $1} END{print total}'
  • You're welcome :) @OtaŠkvor – Nordine Lotfi May 23 '21 at 16:34
  • 3
    Hmm, don't the first two versions here (with the while loops) share the same problem as that in the question: that with Bash $total prints as 0 after the loop? – ilkkachu May 23 '21 at 18:21
  • Right, didn't notice the while loop was in a subshell, which wasn't keeping the result of the total variable...thanks for mentioning, will edit @ilkkachu – Nordine Lotfi May 23 '21 at 21:26