
I am trying to count all the files matching a specific glob in the current directory and all its subdirectories. An example of this could be finding all files ending with ".txt".

(I must use a for loop to match all files in the current directory and another for loop to go through all the subdirectories of the current directory.)

#!/bin/bash
myglob="$1"
if [ $# -eq 1 ]; then
        dir=$1
else
        echo -n Please enter an ending file name:
        read -r  myglob
fi
# echo Directory $dir

numDir=0
numFile=0
for file in ./*; do
        if [ -d "$file" ]; then
                echo $file is a FIRST directory
                let numDir=numDir+1
        fi
        if [[ "$file" == *"$myglob" ]]; then
                echo $file is a FIRST file
                let numFile++
        fi
        for file in ./*/*; do
                if [[ "$file" == *"$myglob" ]]; then
                        echo $file is a SECOND file
                        let numFile++
                fi
        done
done
#echo "$dir" contains "$numDir" directories
echo "$dir" contains "$numFile" files

2 Answers


You seem to be mis-reading the assignment's question.

  1. it says "current directory", which is ., not ~ or ~/linux2/q3

  2. it also says "and all subdirectories". Given that this appears to be an introductory shell-scripting course, it's extremely unlikely that they expect you to write your own code, in bash, to recurse subdirectories. That is not a task for beginners.

    It almost certainly means "use find, the standard tool for recursing subdirectories".

  3. It says to use a glob, not to implement your own filename pattern matching. No matter how well you write your own pattern matching code, it's NOT using a glob.

    find has a -name option which uses globs to match files.

    Note that it also doesn't say "matching a file ending" or file extension. It says "matching a specific glob" and gives ".txt" as an example. A glob can match file extensions, but it can also be used to match a lot more than just that.

  4. "write a shell script to do X" (or words like that) does not necessarily mean "write a shell script that doesn't use any external programs, using only built-in commands". In fact, it certainly does not mean that unless it is explicitly stated.

    Calling external programs to do work is what shell scripts do, it's completely normal and expected for shell scripts...especially when using any of the standard unix utilities, like find or wc.

    wc is a standard program which can be used to count the number of characters, lines, and/or words in a file or stdin. In this case, you only want to count the number of lines, so use wc's -l option.
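
    As a quick aside, this throwaway illustration (unrelated to the assignment's files) shows what wc -l does on its own:

    printf '%s\n' one two three | wc -l    # prints 3: one count per line of input

    Putting those pieces together gives a script along the lines of the one below.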

#!/bin/bash

# Count the number of files matching a glob in the current directory
# and all subdirectories.
#
# The glob can be specified on the command line, in which case it
# MUST be quoted or escaped to prevent the shell from expanding it.
# e.g. use '*.txt' or \*.txt, not just *.txt.
#
# if the glob is not specified on the command line, the script prompts
# for a glob until one is provided.

myglob="$1"

while [ -z "$myglob" ] ; do
  read -p 'Enter a glob: ' myglob
done

numfiles=$(find . -type f -name "$myglob" | wc -l)
echo $numfiles
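
For example, if the script were saved as count-matching.sh (that filename is just a placeholder), it could be run like this:

chmod +x count-matching.sh
./count-matching.sh '*.txt'    # count every *.txt file here and in all subdirectories
./count-matching.sh            # no argument: the script prompts for a glob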

If there is any chance that any of the filenames in the current directory have newlines (i.e. LF characters) in them (which is a valid character in unix filenames), then use NUL as the filename separator instead of LF:

numfiles=$(find . -type f -name "$myglob" -print0 |
             awk -v RS='\0' '{count++}; END {print count}')

Instead of using wc -l, this uses an awk script to count the NUL-separated filenames.

Or, as Stéphane Chazelas pointed out in a comment, you can do this with just find and grep:

numfiles=$(find .//. -type f -name "$myglob" | grep -c //)

The .//. starting-directory argument causes find to output filenames prefixed with .//. Since it's impossible for // to appear in a filename from find, you can use grep -c // to count the files. The .// only appears in a filename once, so this works whether there are newlines in the filename or not.

BTW, it is good shell programming practice to always account for the possibility of newlines and other problematic characters (e.g. spaces, tabs, semi-colons, ampersands, etc) in filenames, even when you think it's probably not going to be an issue. It's one of the reasons why you should always double-quote your variables when you use them. And the reason why using NUL as a filename separator is better, more reliable, and safer than just using LF.

If you explain the reasoning behind using NUL as the separator instead of newline, that's probably worth extra marks.
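
As a rough sketch of that reasoning (the directory and filenames here are made up purely for illustration):

mkdir -p /tmp/glob-demo && cd /tmp/glob-demo
touch ordinary.txt
touch 'awkward
name.txt'                                        # one file whose name contains a newline

find . -type f -name '*.txt' | wc -l             # prints 3: the embedded newline adds an extra line
find .//. -type f -name '*.txt' | grep -c //     # prints 2: exactly one // per filename, newline or not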


Update

Even if you are required to use two for loops rather than find, you still shouldn't do your own pattern matching. Your code is not using globs to match files - it's using your own custom pattern matching code. That's not the same thing, not even close.

Here's an example using two for loops that actually uses globs to count matching files. I've added notes under each loop to explain them, but in a script you'd just run one loop after the other.

Loop 1 for current directory:

for f in $myglob; do
  [ -f "$f" ] && let numFile++
done

This for loop is an example of one of the very few instances where you don't want to quote $myglob when you use it because you want the shell to expand the glob.

In almost all other cases, you do not want the shell to expand variables on a command line, so you must enclose them in double-quotes: "$myglob" rather than just $myglob. Also, while not relevant for this script, you should still double-quote array variables like "${array[@]}" even when you want them to be expanded, because you want each individual element of the array to be treated as one "word".
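
A tiny illustration of that array point (the filenames are made up):

files=( "report 1.txt" "report 2.txt" )
printf '<%s>\n' "${files[@]}"    # 2 lines: each element stays one word
printf '<%s>\n' ${files[@]}      # 4 lines: unquoted, the spaces split each element into two words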

Anyway, this uses [ -f "$f" ] to test if "$f" exists and is a regular file, so that it only counts files, not directories (or anything else, like symlinks or named pipes aka fifos). This does the same thing as using find's -type f option.

If you wanted to count directories in ./ instead of (or as well as) files, you would use:

[ -d "$f" ] && let numDir++

Loop 2 for immediate subdirectories:

for f in */$myglob ; do
  [ -f "$f" ] && let numFile++
done

This is almost identical to the first for loop, except it's iterating over */$myglob instead of just $myglob.

All together, that's something like:

#!/bin/bash
# comments deleted, same as version using find above.

myglob="$1"

while [ -z "$myglob" ] ; do
  read -p 'Enter a glob: ' myglob
done

for f in $myglob; do
  [ -f "$f" ] && let numFile++
done

for f in */$myglob ; do
  [ -f "$f" ] && let numFile++
done

echo "$(pwd)/ and $(pwd)/*/ combined contain $numFile files matching '$myglob'"

Unlike the find version, these loops will only count files in the current directory and directories immediately below it. It won't recurse any deeper into sub-subdirectories, etc.

This is probably what you want, as far as I can tell from reading your question.

You can limit the recursion depth in find using the -maxdepth option. e.g. find . -maxdepth 2 -type f -name "$myglob".
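
Combined with the counting approach from earlier (and with the same caveat about newlines in filenames), that would look something like:

numfiles=$(find . -maxdepth 2 -type f -name "$myglob" | wc -l)
echo "$numfiles"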

cas
  • You could do find ... -printf . | wc -c to count the find results. Then you have no issues with newlines and don't need --files0-from=- – pLumo May 10 '21 at 05:33
  • yeah, but why tell a beginner to use an obscure trick when wc -l is obvious, standard, and easily understood. – cas May 10 '21 at 05:36
  • It's a trick, but imo not obscure. For me --files0-from=- is the more obscure trick, and I would have to look it up every time. But sure, it's a matter of choice :-) Your version is also valid. – pLumo May 10 '21 at 05:36
  • also, printf is as non-standard for find as --files0-from is for wc. It's still a GNU-ism. – cas May 10 '21 at 05:38
  • That's true of course :-) – pLumo May 10 '21 at 05:40
  • BTW, I have NFI why they didn't just use -0 or -z for wc, like every other program with an option for NUL-separated input. Mind-boggling, but that's what they chose. – cas May 10 '21 at 05:43
  • @The_Liner extra bonus points: it is possible to count how many command-line arguments were provided. You could use this to print an error message & exit if it's not zero or exactly one. Or you could use it in a loop to handle more than one argument. I'll leave these things for you to do if you want, because they're good learning exercises. – cas May 10 '21 at 06:02
  • @pLumo actually, you were right. --files0-from=- expects a list of file names, which wc then processes the same as if those filenames were on the command line (to avoid using xargs which could mess up the total if there are more filenames than would fit in one command line). It is not the same as -0 or -z. GNU wc didn't bother implementing that....I'll use awk instead. – cas May 10 '21 at 06:19
  • Hi, thanks for trying to help me, but I must use two for loops, one to find matching files in the current directory and another to iterate in the subdirectories as is shown in my code. – The_Liner May 10 '21 at 13:04
  • Even if required to use two for loops, you still shouldn't do your own pattern matching. They're not globs - they're your own custom pattern matching code. e.g. two for loops, one for current directory: for f in $myglob; do [ -f "$f" ] && let numFile++ ; done, and one for immediate subdirectories: for f in */$myglob ; do [ -f "$f" ] && let numFile++ ; done – cas May 10 '21 at 14:19
  • See also find .//. ... | grep -c // to count file paths instead of lines in file paths. – Stéphane Chazelas May 10 '21 at 14:23
  • @StéphaneChazelas nice. i like that better than find ... -printf . | wc -c. – cas May 10 '21 at 14:36
  • I don't understand why I have this error: temp.sh: line 7: syntax error near unexpected token `$'do\r'' temp.sh: line 7: `for f in $myglob; do – The_Liner May 10 '21 at 16:10
  • @The_Liner \r is a carriage-return character (CR). Unix uses line-feeds (LF, aka NL aka newline), \n only as end-of-line in a text file. Windows uses CR followed by LF. My guess is you edited the file on a Windows machine. You can convert using dos2unix or similar programs or with perl -i -p -e 's/\r\n/\n/' filename if you don't have that installed. or edit with vim and run :set fileformat=unix and :x to save the file and exit. – cas May 11 '21 at 05:21

The way to expand the *.txt in the current directory and count the number of names that matched is to do

set -- ./*.txt

This sets the positional parameters ($1, $2, etc.) to the names matching the globbing pattern. If the nullglob shell option is set in the bash shell, the list would be empty when there are no matches; without nullglob, a pattern that matches nothing is left unexpanded, so the list would contain the pattern itself. If the dotglob shell option is set in the bash shell, the list would also include hidden names, if there are any that match the pattern (* does not otherwise match hidden names).

The length of the list of positional parameters is $#.
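
A quick interactive sketch of the difference nullglob makes, assuming the pattern ./*.nosuchsuffix matches nothing in the current directory:

shopt -u nullglob
set -- ./*.nosuchsuffix
echo "$#"    # 1: the pattern itself is kept, unexpanded
shopt -s nullglob
set -- ./*.nosuchsuffix
echo "$#"    # 0: a pattern with no matches expands to nothing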

This means that the following is a short bash script that counts and reports how many (possibly hidden) names match *.txt in the current directory.

#!/bin/bash

shopt -s dotglob nullglob
set -- ./*.txt

printf 'There are %d names matching ./*.txt here\n' "$#"

If we enable the globstar shell option, we gain access to **, which matches down into subdirectories. We could then easily extend our script above to search recursively under the current directory and below:

#!/bin/bash

shopt -s dotglob nullglob globstar
set -- ./**/*.txt

printf 'There are %d names matching ./**/*.txt here\n' "$#"

You could obviously store the matching names in a named array if you so wish:

#!/bin/bash

shopt -s dotglob nullglob globstar
names=( ./**/*.txt )

printf 'There are %d names matching ./**/*.txt here\n' "${#names[@]}"

Would you want to print the matching names in a single column, you may do so with

printf '%s\n' "$@"

or, if you are using a named array in bash,

printf '%s\n' "${names[@]}"

Would you need to count only regular files, then you obviously need to iterate over the names matching the glob:

#!/bin/bash

shopt -s nullglob dotglob globstar

regular_files=()

for pathname in ./**/*.txt; do
    if [ -f "$pathname" ] && [ ! -L "$pathname" ]; then
        regular_files+=( "$pathname" )
    fi
done

printf 'There are %d regular files matching ./**/*.txt\n' "${#regular_files[@]}"

The -L test used above is true if the given pathname is a symbolic link, so the combination of tests used here ensures that we only count actual regular files, and not symbolic links to regular files.
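
A small demonstration of why both tests are needed (scratch files, made up for illustration):

touch real.txt
ln -s real.txt link.txt
[ -f link.txt ] && echo "seen"                        # printed: -f follows the symlink to the regular file
[ -f link.txt ] && [ ! -L link.txt ] && echo "seen"   # not printed: the -L test filters out the symlink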

Kusalananda
  • Beware that even with bash 5.1, if the current directory contains symlinks to directories, it will also count the text files in the directories they point to (not recursively, and not in targets of symlinks at deeper levels...). Would be better to use zsh (where recursive globbing comes from and doesn't have this kind of issue and has glob qualifiers and where you can also disable sorting and which has much more advanced globs). () { echo there are $# files; } **/*.txt(ND.oN) – Stéphane Chazelas May 10 '21 at 07:59