5

I'm working on a Windows 10 computer, but I usually work in Git Bash or in the Windows Subsystem for Linux.

I'm trying to get the number of files in all subdirectories of a specified directory.

This is a similar question to How to report number of files in all subdirectories?, but the difference is that I do not have a constant number of levels in all subdirectories; I have something like:

Dir1/sub1
Dir1/sub1/subsub1
Dir1/sub2
Dir1/sub3/subsub3/subsubsub3

I tried

 shopt -s dotglob; for dir in */; do all=("$dir"/*); echo "$dir: ${#all[@]}"; done

playing around with the number of levels to search (`*/`, `*/*/*` and so on),

but I cannot really get what I'm looking for, which is something like:

Dir1/sub1: Number of files
Dir1/sub2: Number of files
Dir1/sub3: Number of files
Rui F Ribeiro
  • You'd want a report of files in each of the directories sub1, sub1/subsub1, sub2, sub3, sub3/subsub3, and subsubsub3? Or just for sub1, sub2, and sub3? If this second option, should sub1 and sub3 count files in their subdirectories too? – Chris Davies Feb 13 '19 at 22:48

7 Answers

3
#!/bin/bash

shopt -s dotglob nullglob

topdir='./Dir1'

for subdir in "$topdir"/*/; do
    find "$subdir" -type f -exec echo . \; |
    printf '%s: %d\n' "${subdir%/}" "$( wc -l )"
done

This small bash script would output a list of pathnames of subdirectories of $topdir followed by the number of regular files found (anywhere) under each of those subdirectories.

The script loops over all subdirectories of $topdir and for each, it runs the find command

find "$subdir" -type f -exec echo . \;

This outputs a dot on an otherwise empty line for each found regular file under $subdir. We output a dot because these are easy to count (filenames can contain newline characters).

The dots are piped to

printf '%s: %d\n' "${subdir%/}" "$( wc -l )"

Here, printf is used to format the output. It takes the subdirectory path (with the final slash removed) and the count of files.

The count of files is had from wc -l which will count the dots coming over the pipe from find (strictly speaking, it does not count the dots but the newlines). Since printf itself is not reading its standard input stream, this is instead consumed by wc -l.

Setting the nullglob and dotglob shell options at the start allows us to skip the whole loop if there are no subdirectories under $topdir (that's with nullglob) and also to include hidden directory names under $topdir (that's with dotglob).
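The effect of nullglob is easy to see in isolation. A small demonstration (run in a freshly created empty directory):

```shell
# Without nullglob, an unmatched glob is left as the literal
# pattern, so the loop runs once with dir set to '*/'.
cd "$(mktemp -d)"   # an empty directory
shopt -u nullglob
for dir in */; do echo "without nullglob: $dir"; done
# With nullglob, the unmatched glob expands to nothing and the
# loop body never runs.
shopt -s nullglob
for dir in */; do echo "with nullglob: $dir"; done  # prints nothing
```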

By changing

topdir='./Dir1'

into

topdir=$1

you can get the script to take a directory path as its only command line argument.

You may speed the find up radically by changing it into the slightly more complex

find "$subdir" -type f -exec sh -c 'for pathname do echo .; done' sh {} +

(the rest of the loop should be left as it is). This runs a really small in-line shell script for batches of found files, instead of echo for each file. This would be much quicker assuming echo is a built-in command in the sh shell. (You may want to change sh -c to bash -c to be sure of that.) When -exec echo . \; is used, find would execute /bin/echo, which would be slow to do for each file.
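On systems with GNU find, the helper process can be avoided entirely: -printf (a GNU extension, so this variant is a sketch that assumes GNU find) prints one marker line per file without spawning any extra process:

```shell
shopt -s dotglob nullglob

topdir='./Dir1'

# -printf is a GNU find extension: print one dot per regular
# file, then count the lines with wc -l. No echo or sh
# processes are spawned at all.
for subdir in "$topdir"/*/; do
    printf '%s: %d\n' "${subdir%/}" "$(find "$subdir" -type f -printf '.\n' | wc -l)"
done
```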

Kusalananda
  • This is great! Would there be a reasonable way to sort the results? – lowcrawler Apr 03 '22 at 21:57
  • @lowcrawler If you make sure that the data is nul-terminated rather than terminated by newlines and that whatever you're sorting by is at the start of each record, then it should be reasonably easy to sort using any sort implementation that can handle nul-terminated fields (like GNU sort). The only issue is that Unix filenames may consist of tabs, spaces, and newlines, hence the need to use nul-terminated records. – Kusalananda Apr 04 '22 at 05:48
2

With GNU utilities:

find Dir1 -mindepth 2 -type f -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) print i ": " n[i]}'

Counting only regular files for each of the subdirectories of Dir1.

Outputs something like:

sub1: 3
sub2: 30
sub3: 13
sub4: 3
sub5: 3
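Because each subdirectory yields exactly one sub: count line, sorting the report afterwards is straightforward. For example, to order it by count (a sketch, again assuming GNU utilities):

```shell
# Same per-subdirectory report, sorted numerically by the
# file count (second field, with ':' as the separator).
find Dir1 -mindepth 2 -type f -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) print i ": " n[i]}' |
  sort -t: -k2 -n
```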
1

I'm not familiar with Git Bash on Windows, but I'll assume that whatever platform you're running this script on, you have these installed:

  • bash v4.x or higher (macOS users will need to install a more recent version via Homebrew or something)
  • GNU find (really, any standard Unix find will do; just not the MS-DOS/Windows find, which behaves more like grep)

Assuming the above, this script should do the trick:

#!/bin/bash
# USAGE: count_files <dir> ...

declare -A filecount

# Tell bash to execute the last pipeline element in this shell, not a subshell
shopt -s lastpipe

# Run through all the user-supplied directories at one go
for d in "$@"; do
  find "$d" -type f | while IFS= read -r f; do
    [[ $f =~ ^(${d%%/}/[^/]+)/ ]] && (( filecount["${BASH_REMATCH[1]}"]++ ))
  done
done

# REPORT!
for k in "${!filecount[@]}"; do
  echo "$k: ${filecount[$k]}"
done
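The lastpipe option is what makes this work: without it, bash runs the while loop (the last element of the pipeline) in a subshell, and the filecount updates would be lost when the pipeline ends. A minimal illustration:

```shell
#!/bin/bash
# With lastpipe, the last element of a pipeline runs in the
# current shell, so n survives the loop. Without it, the loop
# runs in a subshell and n would still be 0 afterwards.
shopt -s lastpipe
n=0
printf 'a\nb\nc\n' | while IFS= read -r _; do (( n++ )); done
echo "$n"  # 3
```

(lastpipe only takes effect when job control is off, which is the case in non-interactive scripts.)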
Adrian
  • I should really learn how to use Bash properly, this did the job very nicely, how do I cite you for showing me the script? – Faustino Delgado Feb 13 '19 at 17:03
  • @FaustinoDelgado Just point back to this answer. The permalink can be found by clicking on the "share" link at the bottom of the answer. – Adrian Feb 13 '19 at 17:06
  • Does this work on directories given with absolute pathnames? What about pathnames containing newlines? It also seems to count the number of files in the given directories, not in their subdirectories, as asked for in the question, but that may just be me not understanding your code. Care to describe what you're doing? – Kusalananda Feb 13 '19 at 17:23
  • @Kusalananda How can you introduce newlines in path names? I don't have that problem, fortunately... yet. – Faustino Delgado Feb 13 '19 at 17:39
  • @FaustinoDelgado touch $'my\nfile' – Kusalananda Feb 13 '19 at 17:41
  • @Kusalananda My original script didn't handle absolute paths, so I just fixed that. It doesn't handle pathnames with embedded newlines...and I'm OK with that. And you probably misread the original regex that I used to extract the partial path for use as the tally index. Both it and the current version match up to the subdirectory component, not just the main directory. – Adrian Feb 13 '19 at 17:46
  • @Kusalananda And for full disclosure, I just noticed that I'd accidentally edited out a trailing slash from my regex. Time for bed. :) – Adrian Feb 13 '19 at 18:06
  • Note that using (( filecount["${BASH_REMATCH[1]}"]++ )) like that is an arbitrary command injection vulnerability. Try for instance with a subdir called x$(reboot). Write it as filecount[${BASH_REMATCH[1]}]=$((${filecount[${BASH_REMATCH[1]}]} + 1)) to avoid the problem. – Stéphane Chazelas Feb 25 '19 at 14:29
0
find "$DIR" -mindepth 2 -type f -exec bash -c 'echo "${0%${0#$1/*/}}"' {} "$DIR" \; | uniq -c
  1. The -mindepth 2 means we look only at files which are descendants of direct subdirectories of $DIR.
  2. -type f looks only at files.
  3. -exec bash -c "..." {} $DIR executes the string with the arguments {} and $DIR, where {} is substituted with each file name found by find.
  4. The echo part extracts the corresponding direct subdirectory of $DIR from a descendant filename. See https://stackoverflow.com/questions/16623835/remove-a-fixed-prefix-suffix-from-a-string-in-bash for an explanation of what % and # do. The 0 and 1 correspond to the first and second arguments after the string respectively.
  5. find will list all descendants of direct subdirectories of $DIR in succession, so uniq -c will return the total number of descendant files along with the name for each direct subdirectory.
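For example, with the hypothetical tree from the question and $DIR set to ./Dir1, the command produces counts like these (the order of the output lines follows find's traversal order, which is unspecified):

```shell
# Each found file echoes its "$DIR/<subdir>/" prefix; since
# find emits all of a subdirectory's descendants consecutively,
# uniq -c collapses the repeated prefixes into per-subdir counts.
DIR=./Dir1
find "$DIR" -mindepth 2 -type f \
  -exec bash -c 'echo "${0%${0#$1/*/}}"' {} "$DIR" \; | uniq -c
```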
justinpc
0

Assuming that your bash version is at least 4.0, actually you were almost there.

You can allow your code to count files recursively with the globstar shell option. From man bash(1):

If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.

If you want to recursively count all files, including subdirectories, that are in your top-level directories:

shopt -s dotglob globstar
for dir in */; do
    all=( "$dir"/** )
    printf '%s\n' "$dir: ${#all[@]}"
done

As in the code you tried, for each of your top-level directory we are populating an array with the results of pathname expansion and then displaying the number of its elements.
dotglob is used to include files whose names start with . (hidden files).

If you want to recursively count all files except for subdirectory objects, you can just subtract the count of subdirectories from the count of all files:

shopt -s dotglob globstar
for dir in */; do
    all=( "$dir"/** )
    alldir=( "$dir"/**/ )
    printf '%s\n' "$dir: $(( ${#all[@]} - ${#alldir[@]} ))"
done

However, here I'm assuming a broad definition of "file", which, in POSIX, may refer to a regular file, character, block or FIFO special file, symbolic link, socket, directory, or whatever specific implementations may add beyond the standard.
To count a specific type of files only (e.g. regular files), it may be easier to resort to a find-based solution.
Alternatively you can extend the above code, testing for the file type in a loop:

shopt -s dotglob globstar
for dir in */; do
    all=( "$dir"/** )
    count=0
    for file in "${all[@]}"; do
        test -f "$file" && count="$(( "$count" + 1 ))"
    done
    printf '%s\n' "$dir: $count"
done

But this less convenient solution will also be significantly slower than the find-based alternative (e.g. more than two times slower than the faster one in Kusalananda's answer, tested on Linux with bash 5.0 and find 4.6).

Also note that, unlike find in its default behavior, pathname expansion with the globstar option will follow symbolic links that resolve to files, making all the above snippets include them in the counts as well.
(Initially it used to follow symbolic links that resolve to directories too, but this behavior has been changed in bash 4.3).

Finally, to also provide a solution that does not depend on the globstar shell option, you can use a recursive function to count all regular files in the top-level subdirectories of the $1 directory:

#!/bin/bash

# nullglob is needed to avoid the function being
# invoked on 'dir/*' when * matches nothing
shopt -s nullglob dotglob

function count_files () {
    for file in "$1"/*; do
        # Only count regular files
        [ -f "$file" ] && count="$(( "$count" + 1 ))"
        # Only recurse on directories
        [ -d "$file" ] && count_files "$file"
    done
}

for dir in "$1"/*/; do
    count="0"
    count_files "$dir"
    printf '%s: %s\n' "$dir" "$count"
done
fra-san
0

find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n

This lists each first-level directory together with the number of files anywhere beneath it (recursively), sorted by that count.
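One caveat: a file sitting directly in the starting directory has its own name in field 2 of ./file, so it shows up as a bogus one-file "directory". Adding -mindepth 2 (widely supported, though not required by POSIX) restricts the count to files inside subdirectories:

```shell
# Count files per first-level subdirectory, ignoring files
# that sit directly in the starting directory itself.
find . -mindepth 2 -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
```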

0

ls --indicator-style=file-type -R -A Dir1/* | sed '/^$/d' | sed '/.*[/:]$/d' | wc -l

ls has many options, and combined with sed it can do almost anything. Note, though, that this prints a single total count of files under Dir1, not a per-directory breakdown.