What is the fastest way to count directories and subdirectories?

Question

I want to get the number of directories and subdirectories. I tried the following commands, which take a very long time: I waited about an hour and they still had not finished.

Commands I used:

$ find . -type d | wc -l

and

$ du -ch | wc -l

Both commands ran for more than an hour without completing. The main folder I'm trying to get information about is roughly 120 GB.

For reference, I just tried that find command on a directory structure containing 211G of data on a local disk and it took 0.952 seconds of CPU time. — John1024, Feb 24 '14 at 05:39
How do you know it's 120GB if du never completed? Can you post the output of df -Ti on all the filesystems mounted at or below .? — Stéphane Chazelas, Feb 24 '14 at 10:50
The total size is pretty much irrelevant here, what matters is the number of files. If there are a lot of files, counting them will take a long time. — Gilles 'SO- stop being evil', Feb 24 '14 at 23:35

score 3 · Answer 1 · answered Feb 24 '14 at 06:40

If you only want a single level, then there is a trick to doing this without having to enumerate the directory. If you want recursion, then what you've got is the best you're going to get.

The single level trick:

stat --printf='%h\n' /path/to/dir

...and subtract 2. The result is the number of directories within that directory (non recursive).

That command shows the number of hard links on the specified file. Whenever you create a directory inside a directory, the subdirectory gets a hard link back to the parent directory: its .. entry. So creating a subdirectory increases the parent's hard link count by one. We subtract 2 because every directory starts off with 2 hard links. One is in the parent directory and points to it: the dir entry inside /path/to. The other is the directory's link to itself: the . entry.
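As a rough illustration of the arithmetic above, here is a minimal sketch that builds a throwaway tree and compares the link count against an explicit enumeration. The temporary directory and the names a, b, c and file.txt are made up for the example, and it assumes GNU stat and find on a filesystem that maintains directory link counts in the traditional way (not all do).

```shell
# Build a throwaway tree: three subdirectories and one regular file.
# (All names here are hypothetical, chosen only for the demonstration.)
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b" "$tmp/c"
touch "$tmp/file.txt"

# Link count of the parent, minus 2 for its own name and its "." entry.
links=$(stat --printf='%h' "$tmp")
by_links=$((links - 2))

# Cross-check by enumerating the direct children with find.
by_find=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d | wc -l)

echo "link count trick: $by_links"
echo "find enumeration: $by_find"

rm -rf "$tmp"
```

If the two numbers disagree, the filesystem most likely does not track directory link counts this way, and only the enumeration can be trusted.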

However with recursion, you have to examine each directory. The problem is that there's no way to say "give me a list of only directories within this directory". You have to get a list of every single entry in the directory, and then stat each one to find out if it's a directory or a file.

Now when you stat the directory, you can use the above hardlink trick to find if that directory contains any sub directories, and thus you can save yourself a little bit of time and not descend into that directory. The find utility actually uses this trick to get a little performance gain in the process.

So basically, using find is going to be the best you can do if you want recursion.

Note that the hardlink trick doesn't work on all filesystems. It doesn't work on btrfs for instance. You can do find . -type d -printf x -links 2 -prune | wc -c to use it here. — Stéphane Chazelas, Feb 24 '14 at 10:59

score 2 · Answer 2 · answered Feb 24 '14 at 06:36

find . -type d | wc -l does not give you the correct value if any directory name contains a newline. Furthermore, it counts the start directory, which probably isn't intended. I do not believe that the pipeline is the bottleneck, but this can easily be optimized:

find . -mindepth 1 -type d -printf . | wc -c
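To see the newline problem concretely, the following sketch creates two directories, one of which has a newline in its name, and counts them both ways. The directory names are invented for the example, and -mindepth and -printf assume GNU find.

```shell
# Two directories; the second has a newline embedded in its name.
# (Names are hypothetical, chosen only for the demonstration.)
tmp=$(mktemp -d)
mkdir "$tmp/plain"
mkdir "$tmp/$(printf 'two\nlines')"

# Counting lines: the embedded newline makes one directory look like two.
by_lines=$(find "$tmp" -mindepth 1 -type d | wc -l)

# Counting one character per directory is immune to odd names.
by_chars=$(find "$tmp" -mindepth 1 -type d -printf . | wc -c)

echo "wc -l count: $by_lines"
echo "wc -c count: $by_chars"

rm -rf "$tmp"
```

The line-based count over-reports by one for every newline that appears in a directory name, while the character-based count stays at the true number of directories.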
