
I have a directory containing a lot of files and directories.

I am trying to get the number of files (and directories) contained recursively in every directory.

I tried the following approach:

for dir in $(find -maxdepth 1 -type d); do echo "$dir"; echo find "$dir" | wc -l; done

But this returns "1" as the result for every directory.

I know there are several other questions asking similar things, but I would really like to know what the mistake in my code above is.

Majiy

5 Answers


A GNU (bash, wc and find) solution which works with any path, even paths that contain spaces or newlines or start with a dash:

shopt -s nullglob
for dir in ./*/
do
    printf '%s\n' "$dir"
    find "$dir" -mindepth 1 -printf x | wc --chars
done

Explanation:

  • The nullglob option makes the glob expand to nothing (instead of the literal pattern ./*/) if the current directory contains no subdirectories, so the loop simply doesn't run.
  • The ./ in the directory glob ensures that names starting with a dash ("-") won't be mistaken for options by find (or by echo, if you use it instead of printf).
  • The slash at the end of the glob ensures that only directories are processed.
  • -mindepth 1 avoids counting the directory itself.
  • If you want to include directories which start with a dot on the top level, you should run shopt -s dotglob before the for loop.
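As a quick illustration, here is a hypothetical session (the directory names and counts are invented for the example):

$ mkdir -p a b/sub && touch a/f1 a/f2 b/sub/f3
$ shopt -s nullglob
$ for dir in ./*/; do printf '%s\n' "$dir"; find "$dir" -mindepth 1 -printf x | wc --chars; done
./a/
2
./b/
2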
l0b0
  • Works like a charm. Thank you for your answer and your explanations. – Majiy Apr 03 '13 at 09:58
  • 2
    +1 for the -printf x : otherwise, the default will output the name of the file (which could : contain "newline", and therefore be counted twice. Or contain other things such as "NULL" characters, making lots of commands choke) – Olivier Dulac Apr 03 '13 at 11:35
  • 1
    @OlivierDulac A path can not contain null characters. It's in fact the only character they can't contain. – l0b0 Apr 03 '13 at 11:41
  • @l0b0: good point (I should have known, as I use "-print0" and same with xargs, when available). But still, nice to avoid any "weird" characters out there ^^ (newline, at least) – Olivier Dulac Apr 03 '13 at 11:49
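To see why -printf x matters (as discussed in the comments above), here is a hypothetical session: a file name containing a newline inflates a plain find | wc -l count, while the character-counting version stays correct:

$ mkdir t && touch t/$'bad\nname'
$ find t | wc -l
3
$ find t -mindepth 1 -printf x | wc --chars
1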

Here's another method with bash 4+. Note that it follows symlinks, and that thanks to dotglob the top-level loop also picks up hidden directories, which l0b0's answer skips unless you add dotglob there as well (this may or may not be what you want):

(
    # A subshell, so the shopt changes don't leak into the calling shell.
    shopt -s dotglob globstar nullglob
    for dir in */; do
        # Expand the recursive glob into the positional parameters;
        # "$#" is then the number of files and directories under $dir.
        set -- "$dir"/**/*
        printf '%s: %d\n' "$dir" "$#"
    done
)
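For example, with a hypothetical tree (names and counts invented for illustration):

$ mkdir -p docs src && touch docs/readme src/a.c src/b.c
$ ( shopt -s dotglob globstar nullglob; for dir in */; do set -- "$dir"/**/*; printf '%s: %d\n' "$dir" "$#"; done )
docs/: 1
src/: 2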
Chris Down

$(find -maxdepth 1 -type d) outputs the list of directories in the current directory (as well as . itself). Unless there are directories whose names begin with a ., this is a complex way of writing */. It's also unreliable: it only works if none of the directory names contain whitespace or globbing characters ([?*). That's because the result of the command substitution $(…) is split into separate words wherever there's a whitespace character, and each word is then interpreted as a glob (filename wildcard pattern). You could avoid this behavior by putting the command substitution in double quotes ("$(…)"), but then the list that the loop iterates over would contain a single element: the concatenation of all the directory names, separated by newlines.
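For instance, with a single directory named my dir (a made-up name), the unquoted command substitution splits that one name into two words:

$ mkdir 'my dir'
$ for dir in $(find -maxdepth 1 -type d); do echo "[$dir]"; done
[.]
[./my]
[dir]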

Note this shell programming rule: always put double quotes around variable substitutions and command substitutions ("$foo", "$(foo)") unless you know that you need to leave the double quotes out and you understand how it's safe to leave them out.

The other problem with your script is a simple one: echo find "$dir" doesn't run find, it prints the word find followed by the directory name as a single line, so wc -l always counts 1. You meant find "$dir".

for dir in */; do
  echo "$dir"
  find "$dir" | wc -l
done

Note that this only works if no file name inside the tree contains a newline. If one might, you can make find print something that can be counted reliably. With GNU find (i.e. on non-embedded Linux or Cygwin):

for dir in */; do
  echo "$dir"
  find "$dir" -printf a | wc -c
done

Portably (printf %c prints exactly one character per argument, because the format string is reused for each file name, so wc -c counts the files):

for dir in */; do
  echo "$dir"
  find "$dir" -exec printf %c {} + | wc -c
done
Gilles

A small, slightly faster variant of Gilles' portable solution would be:

for dir in */; do
  echo "$dir"
  #find "$dir" -exec printf %c {} + | wc -c
  # Print each name NUL-terminated, delete every byte except the NULs,
  # then count what remains: one byte per file.
  find "$dir" -print0 | tr -dc '\0' | wc -c
done
chad

Using GNU Parallel, it will look like this:

parallel -0 --tag 'find {} | wc -l' ::: */

It will run one find | wc per CPU in parallel. Depending on your storage system, parallelization may increase or decrease speed; the only way to know is to test it. The number of processes can be adjusted with -j, as shown below.
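For instance, to cap the run at eight concurrent jobs (eight is an arbitrary number here):

parallel -j8 --tag 'find {} | wc -l' ::: */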

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

(illustration: simple scheduling)

GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time:

(illustration: GNU Parallel's scheduling)

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Ole Tange