Script to count files matching a pattern in subdirectories

Question

I wrote the following script to count the pdf and tex files under the current directory, including subdirectories and hidden files. It counts the files correctly down to 2 levels of subdirectories, but beyond that it reports that there are no subdirectories.

#!/bin/bash

touch t.txt

k=`find -type d |wc -l`
k1=`expr $k - 1`

echo $k1

message1="*.pdf *.tex"
count=`ls -al $message1|wc -l`
find -type d > t.txt

i=2

while [ $i -le $k ]
do
    kd=`head -$i t.txt|tail -1`
    echo $kd
    touch $kd/t.txt
    cp t.txt $kd/t.txt
    i=`expr $i + 1`
done

i=2
while [ $i -le $k ]
do
    nd=`head -$i t.txt|tail -1`
    set -x
    echo $nd
    set +x
    cd $nd
    j=`ls -al $message1|wc -l`
    count=`expr $count + $j`
    i=`expr $i + 1`
done
#set +x

echo $count

Chris Down · Answer 1 · 2021-04-14T15:19:09.527

11

You can do this in pure bash:

shopt -s nullglob dotglob globstar
set -- **/*.pdf **/*.tex
echo "$#"

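set -- sets the positional parameters of the current shell to the results of the globs. $# then expands to the number of positional parameters. The three shell options make ** recurse into subdirectories (globstar, bash 4 or later), make the globs match hidden files (dotglob), and make a glob with no matches expand to nothing instead of to itself (nullglob).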
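For readers wondering how set and $# fit together, here is a minimal sketch. The function name demo_positional and the three file names are made up for illustration; they stand in for whatever the globs would expand to.

```bash
#!/bin/bash
# Illustration only: demo_positional and the file names are hypothetical.
demo_positional() {
    # "set --" replaces the positional parameters ($1, $2, ...) with the given words.
    # Inside a function this only affects the function's own parameters.
    set -- one.pdf two.pdf "three four.tex"
    # "$#" expands to how many positional parameters are now set.
    echo "$#"
}

demo_positional   # prints 3
```

A name containing a space still counts once, because each glob result becomes exactly one parameter.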

If your script already uses the positional parameters (the asker's script does not), you can do the same with an array:

shopt -s nullglob dotglob globstar
files=(**/*.pdf **/*.tex)
echo "${#files[@]}"
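One way to package the array version is a small function, sketched here on the assumption that bash 4 or later is available (globstar needs it). The name count_tex_pdf and its optional directory argument are hypothetical, not part of the answer.

```bash
#!/bin/bash
# count_tex_pdf is a hypothetical helper name.
# Usage: count_tex_pdf [directory]   (defaults to the current directory)
count_tex_pdf() (
    # The body is in parentheses, so it runs in a subshell:
    # the cd and the shopt changes do not leak into the calling shell.
    cd -- "${1:-.}" || exit 1
    shopt -s nullglob dotglob globstar
    files=(**/*.pdf **/*.tex)
    echo "${#files[@]}"
)
```

Running the body in a subshell is a design choice: it leaves the caller's working directory and glob settings untouched, at the cost of one extra process.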

edited Apr 14 '21 at 15:19

answered Dec 22 '11 at 08:38

Chris Down


Is it relevant to compare the speed of a recursive traversal of a directory branch and a lookup in the current directory? – manatwork Dec 22 '11 at 09:14
@manatwork Didn't realise it was also subdirs, fixing now. – Chris Down Dec 22 '11 at 09:17
Ok, just one more word: wow! – manatwork Dec 22 '11 at 09:25
@ChrisDown: Could you explain this set and $#? – pbm Mar 23 '12 at 15:46
what is this dark magic?!? – d-_-b Dec 12 '20 at 02:01
Or, in a shell that doesn't have arrays, you can do num=$(set -- **/*.pdf **/*.tex; echo "$#"). – G-Man Says 'Reinstate Monica' Apr 15 '21 at 01:19
@G-ManSays'ReinstateMonica' On any shell so basic that it doesn't have arrays, that's not going to work, since it needs equivalents of the globstar, dotglob, and nullglob features to function, and those are almost certainly nonexistent. – Chris Down Apr 15 '21 at 15:33

kev · Accepted Answer · 2012-03-23T12:51:53.813

8

find works fine for me:

$ find . -name '*.pdf' -o -name '*.tex' | wc -l
75
$ find . -name '*.pdf' | wc -l
16
$ find . -name '*.tex' | wc -l
59
$ echo $((16+59))
75

Edit:
To handle the special case of a newline in a filename (this uses -printf, a GNU find extension):

$ find . \( -name '*.pdf' -o -name '*.tex' \) -printf x | wc -c
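The difference between the two approaches can be seen in a small sketch. It assumes GNU find (for -printf) and bash (for the $'...' quoting); the helper names count_by_lines and count_by_marks are made up for this example.

```bash
#!/bin/bash
# Hypothetical helper names; both take the directory to search as $1.
count_by_lines() {
    # find prints one path per line, so a name containing a newline
    # spans two lines and is counted twice.
    find "$1" \( -name '*.pdf' -o -name '*.tex' \) | wc -l
}

count_by_marks() {
    # -printf x emits a single "x" per match (GNU find only),
    # so the characters in the name no longer matter.
    find "$1" \( -name '*.pdf' -o -name '*.tex' \) -printf x | wc -c
}

# Demo in a throwaway directory: two files, one with a newline in its name.
dir=$(mktemp -d)
touch "$dir/plain.pdf" "$dir/"$'foo\nbar.tex'
count_by_lines "$dir"   # prints 3
count_by_marks "$dir"   # prints 2
rm -rf "$dir"
```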

edited Mar 23 '12 at 12:51

answered Dec 22 '11 at 06:09

kev


This will break for files with newlines in their filename. – Chris Down Dec 22 '11 at 08:36
It does break, as you can quite clearly see executing the following code: > $'foo\nbar.pdf' ; > $'baz\nqux.tex' ; find . -name '*.pdf' -o -name '*.tex' | wc -l -- the reply is 4, which is not correct (there are two files). – Chris Down Dec 22 '11 at 08:47
@ChrisDown. You are right. – kev Dec 22 '11 at 08:54
@ChrisDown: I am always reluctant to make the code more complex only to take into account "newlines in filenames", because I have never seen such a case in everyday situations. Obviously, for code released to the public, it is correct to take every possibility into account. Are you aware of cases where "newlines in filenames" were not created by mistake or deliberately to test software? – enzotib Dec 22 '11 at 09:15
@enzotib I've seen it multiple times, but only by people using graphical file managers. Often it happens when they go to paste something from another source that contains newlines into a filename, and they don't expect the newlines to still be present. – Chris Down Dec 22 '11 at 09:16
@kev: But say I have a hidden file named '.pdf' (it has no extension); 'find . -name '*.pdf' | wc -l' counts that too... the script also takes that into account. Also, 'ls -al' sometimes does not show the hidden files. – user13522 Dec 22 '11 at 14:03
@user13522. Just another special case: -name '*?.pdf' – kev Dec 22 '11 at 14:08
@kev: great... that also solved the problem with the script I had already posted. One question: 'ls -l *.pdf' shows pdf files only, but why does 'ls -al *.pdf' show the same files (it is not showing the hidden pdf files)? And what does the -name option of the find command do? – user13522 Dec 23 '11 at 05:07
Instead of counting the filenames, how about counting characters? You don't need the names anyway: find . \( -name '*.pdf' -o -name '*.tex' \) -printf x | wc -c – l0b0 Jan 03 '12 at 11:38
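kev's -name '*?.pdf' suggestion from the comments above can be tried out with a sketch like the following. It assumes GNU find, where * in a -name pattern also matches a leading dot; the function name count_named_pdf is hypothetical.

```bash
#!/bin/bash
# count_named_pdf is a hypothetical helper; $1 is the directory to search.
count_named_pdf() {
    # '*?.pdf' requires at least one character before ".pdf",
    # so a file called exactly ".pdf" is skipped, while hidden files
    # such as ".notes.pdf" are still counted.
    find "$1" -name '*?.pdf' -printf x | wc -c
}
```

For a directory holding .pdf, a.pdf and .notes.pdf, this counts the last two, whereas -name '*.pdf' counts all three.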

score 0 · Answer 3 · answered Mar 23 '12 at 13:30

I would recommend using locate instead of find, if it is available. locate queries a prebuilt database, so results are nearly instant and put practically no load on the system. The database is only refreshed when updatedb runs, so if you need up-to-the-second results you have to run updatedb first, which does put load on the system. Whether that trade-off is worthwhile depends on how you intend to use the search.

You can filter the results with whatever regex meets your needs.

system1:/unix.stackexchange # locate '*.tex' '*.pdf' | grep 'unix.stack.*'
   /unix.stackexchange/access_me/1/file.pdf
   /unix.stackexchange/access_me/1/file.tex
