How can I use find to generate a list of the directories which contain the most files? I'd like the list to be ordered from highest to lowest. I'd only like the listing to go 1 level deep, and I'd typically run this command from the top of my filesystem, i.e. /.

-
Different question (actually the same but asked differently), but wouldn't the answer solve your question as well? http://unix.stackexchange.com/questions/117093/find-where-inodes-are-being-used – phemmer Apr 03 '14 at 02:25
-
Also related - http://stackoverflow.com/questions/15216370/how-to-count-number-of-files-in-each-directory. This is what I based my original answer on the inode question off of, although I think my approach offers some improvements over the ones there. – Graeme Apr 03 '14 at 02:36
-
@Patrick - this is a loaded Q just to house Graeme's A. True the bits are buried in the other Q's A's, but this was to bring this bit out so that it could be referenced going forward. – slm Apr 03 '14 at 02:38
-
@slm Then I really don't get why this isn't a duplicate. His answer seems to be just an elaboration of an answer on another question. So now we have 3 questions for the same thing. I think the answer on my link is cleaner too. Launching a shell for every directory found just feels dirty. – phemmer Apr 03 '14 at 02:44
-
@Patrick - just b/c the same answer can be used for 2 Q's doesn't make them dups. The other Q is asking about finding inodes, this one is asking about files/directories in the first level. If you feel your A is better on that Q then feel free to post it on this one as a potential A. – slm Apr 03 '14 at 03:03
-
@Patrick - I'm sure we have more than 3 Q&A's with some of these bits kicking around here. I would expect a user to think of looking for files/dirs. but not necessarily understand inodes, that's why I created this Q as well. – slm Apr 03 '14 at 03:07
-
@Patrick, I have reworked the answer so that the GNU solution doesn't start a new shell for every directory. Though note this is the standard solution to deal with any filename portably. – Graeme Apr 03 '14 at 03:38
-
@slm This doesn't address inodes. These are directory listings - nothing more. You can easily have many more directory listings than you do inodes. – mikeserv Apr 03 '14 at 09:43
-
@Graeme - I fixed my answer so that it does handle inodes now. – mikeserv Apr 03 '14 at 12:11
5 Answers
UPDATE: I did all of that below, which is cool, but I came up with a better way of sorting directories by inode use:
du --inodes -S | sort -rh | sed -n \
'1,50{/^.\{71\}/s/^\(.\{30\}\).*\(.\{37\}\)$/\1...\2/;p}'
And if you want to stay in the same filesystem you do:
du --inodes -xS
Here's some example output:
15K /usr/share/man/man3
4.0K /usr/lib
3.6K /usr/bin
2.4K /usr/share/man/man1
1.9K /usr/share/fonts/75dpi
...
519 /usr/lib/python2.7/site-packages/bzrlib
516 /usr/include/KDE
498 /usr/include/qt/QtCore
487 /usr/lib/modules/3.13.6-2-MANJARO/build/include/config
484 /usr/src/linux-3.12.14-2-MANJARO/include/config
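If you just want the raw counts without the line-shortening, a hedged equivalent is to sort numerically and keep the top 50 with head instead of the sed expression above:
du --inodes -xS | sort -rn | head -n 50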
NOW WITH LS:
Several people mentioned they do not have up-to-date coreutils and the --inodes option is not available to them. So, here's ls:
sudo ls -AiR1U ./ |
sed -rn '/^[./]/{h;n;};G;
s|^ *([0-9][0-9]*)[^0-9][^/]*([~./].*):|\1:\2|p' |
sort -t : -uk1.1,1n |
cut -d: -f2 | sort -V |
uniq -c |sort -rn | head -n10
This is providing me pretty much identical results to the du command:
DU:
15K /usr/share/man/man3
4.0K /usr/lib
3.6K /usr/bin
2.4K /usr/share/man/man1
1.9K /usr/share/fonts/75dpi
1.9K /usr/share/fonts/100dpi
1.9K /usr/share/doc/arch-wiki-markdown
1.6K /usr/share/fonts/TTF
1.6K /usr/share/dolphin-emu/sys/GameSettings
1.6K /usr/share/doc/efl/html
LS:
14686 /usr/share/man/man3:
4322 /usr/lib:
3653 /usr/bin:
2457 /usr/share/man/man1:
1897 /usr/share/fonts/100dpi:
1897 /usr/share/fonts/75dpi:
1890 /usr/share/doc/arch-wiki-markdown:
1613 /usr/include:
1575 /usr/share/doc/efl/html:
1556 /usr/share/dolphin-emu/sys/GameSettings:
I think the include thing just depends on which directory the program looks at first - because they're the same files and hardlinked. Kinda like the thing above. I could be wrong about that though - and I welcome correction...

The underlying method to this is that I replace every one of ls's filenames with its containing directory name in sed.
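To make that concrete, here's a hedged sketch of the intermediate data (the directory names and inode numbers are invented for illustration). ls -AiR1U prints a header line for each directory followed by inode-prefixed entries:
% ls -AiR1U ./somedir
./somedir:
 5000 sub
 5001 .hidden

./somedir/sub:
 5002 file1
 5003 file2
The sed holds each directory header and appends it to every entry line, so after the substitution each remaining line pairs an inode with its parent directory, e.g. 5002:./somedir/sub. sort -t : -uk1.1,1n then keeps each inode only once (so a hardlinked file is only attributed to one directory), cut drops the inode, and uniq -c tallies how many entries each directory ends up with.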
Following on from that... Well, I'm a little fuzzy myself. I'm fairly certain it's accurately counting the files, as you can see here:
% _ls_i ~/test
> 100 /home/mikeserv/test/realdir
> 2 /home/mikeserv/test
> 1 /home/mikeserv/test/linkdir
DU DEMO
% du --version
> du (GNU coreutils) 8.22
Make a test directory:
% mkdir ~/test ; cd ~/test
% du --inodes -S
> 1 .
Some children directories:
% mkdir ./realdir ./linkdir
% du --inodes -S
> 1 ./realdir
> 1 ./linkdir
> 1 .
Make some files:
% printf 'touch ./realdir/file%s\n' `seq 1 100` | . /dev/stdin
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Some hardlinks:
% printf 'n="%s" ; ln ./realdir/file$n ./linkdir/link$n\n' `seq 1 100` |
. /dev/stdin
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Look at the hardlinks:
% cd ./linkdir
% du --inodes -S
> 101
% cd ../realdir
% du --inodes -S
> 101
They're counted alone, but go one directory up...
% cd ..
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Then I ran my script from below and:
> 100 /home/mikeserv/test/realdir
> 100 /home/mikeserv/test/linkdir
> 2 /home/mikeserv/test
And Graeme's:
> 101 ./realdir
> 101 ./linkdir
> 3 ./
So I think this shows that the only way to count inodes is by inode. And because counting files means counting inodes, you cannot doubly count inodes - to count files accurately inodes cannot be counted more than once.
OLD:
I find this faster, and it's portable:
sh <<-\CMD
{ echo 'here='"$PWD"
printf 'cd "${here}/%s" 2>/dev/null && {
set --
for glob in ".[!.]*" "[!.]*" ; do
set -- $glob "$@" &&
[ -e "./$1" ] || shift
done
printf "%%s\\t%%s\\n" $# "$PWD"
}\n' $( find . -depth -type d 2>/dev/null )
} | . /dev/stdin |
sort -rn |
sed -n \
'1,50{/^.\{71\}/s/^\(.\{30\}\).*\(.\{37\}\)$/\1...\2/;p}'
CMD
It doesn't have to -exec for every directory - it only uses the one shell process and one find. I have to get the set -- $glob right still to include .hidden files and all else, but it's very close and very fast. You would just cd into whatever your root directory should be for the check and off you go.

Here's a sample of my output run from /usr:
14684 /usr/share/man/man3
4322 /usr/lib
3650 /usr/bin
2454 /usr/share/man/man1
1897 /usr/share/fonts/75dpi
...
557 /usr/share/gtk-doc/html/gtk3
557 /usr/share/doc/elementary/latex
539 /usr/lib32/wine/fakedlls
534 /usr/lib/python2.7/site-packages/bzrlib
500 /usr/lib/python3.3/test
I also use sed at the bottom there to trim it to the top 50 results. head would be faster, of course, but I also trim each line if necessary:
...
159 /home/mikeserv/.config/hom...hhkdoolnlbekcfllmednbl/4.30_0/plugins
154 /home/mikeserv/.config/hom...odhpcledpamjachpmelml/1.3.11_0/js/ace
...
It's crude, admittedly, but it was a thought. Another crude device I use is redirecting stderr for both find and cd into /dev/null. It's just cleaner than looking at permission errors for directories I can't read without root access - perhaps I should specify that to find. Well, it's a work in progress.
Ok, so I did fix the shell globs like this:
for glob in ".[!.]*" "[!.]*" ; do
set -- $glob "$@" &&
[ -e "./$1" ] || shift
done
I was actually going to ask a question on how it could be done, but as I was typing in the question title the site pointed me to a suggested related question where, lo and behold, Stephane had already weighed in. So that was convenient. Apparently [^.], while well-supported, is not portable and you have to use the ! bang. I found that in Stephane's comment there.

Anyway, just pulling in hidden files wasn't enough though, obviously. So I have to set twice in order to avoid searching positionals for the literal $glob. Still, it doesn't seem to affect performance at all, and it reliably adds every file in the directory.
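Here's a hedged sketch of why the two-glob trick works (the /tmp/globdemo directory and the file names in it are invented for the demonstration): a glob that matches nothing is left as a literal word, and the [ -e "./$1" ] || shift test throws that literal away, so $# ends up as the number of real entries, hidden ones included:
mkdir -p /tmp/globdemo && cd /tmp/globdemo
touch visible .hidden
set --
for glob in ".[!.]*" "[!.]*" ; do
    set -- $glob "$@" &&      # prepend this glob's matches (or the unexpanded pattern)
    [ -e "./$1" ] || shift    # a pattern that matched nothing fails the test and gets dropped
done
printf '%s\t%s\n' "$#" "$PWD"   # -> 2   /tmp/globdemo (assuming a fresh directory)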
-
@Graeme You know, neither of our solutions are actually handling inodes, though. A lot of those files we're listing are likely hard-linked to one another. I think I could do this with ls -i and... I guess... probably grep... maybe - well, you're using -xdev, which is a start... uniq and sort? – mikeserv Apr 03 '14 at 05:05
-
That's a bleeding edge feature :-) I'm running 8.21. Looks like it was added 2013-07-27: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=333dc83d52e014a0b532e316ea8cd93b048f1ac6 – phemmer Apr 03 '14 at 13:10
-
Also, if you don't mind, could you post that on this question. I don't think I'll accept it as it's not very portable, but I will upvote, and it'd be nice to have another solution on the question. – phemmer Apr 03 '14 at 13:13
-
@Patrick - It's not bleeding edge - it's stable GNU Coreutils, I'm not running a beta version. Still, yeah, I could. I can make it work with ls -i as well. – mikeserv Apr 03 '14 at 13:22
-
No --inodes for me either, I guess Debian is behind the game with packaging this one. – Graeme Apr 03 '14 at 13:44
-
@Graeme I can do it with ls - with one invocation of ls even - I just got an idea of how. But Debian's kind of famously behind the game so it's no real surprise. But a little while longer and I'll show you the other way. – mikeserv Apr 03 '14 at 13:47
-
The ls one works well and is much faster than the find approach. I would add that a mount can be used to search the root filesystem. Also, it won't work with relative paths unless you have ./ (or ../) at the start. – Graeme Apr 04 '14 at 14:14
-
@Graeme I thought I fixed that path thing. Maybe I just didn't save the edit - it's probably open somewhere in one of these tabs. The truth is though that this should be done with awk, I think. sed is very difficult here because I can't squeeze the inode numbers in regex - whereas awk (I think) could with ease. Unfortunately - I never learned how to use it. Possibly I could also use grep, but I really think at least two of those sorts would be entirely unnecessary if awk did this. – mikeserv Apr 04 '14 at 14:19
-
@Graeme it does handle any path as written, however, I guess I had two different ones up there. One was right(er?) - and one was wrong. It's rectified now I think. Sorry about that. – mikeserv Apr 04 '14 at 14:51
-
If I use the version here on a directory mnt, I just get a single number printed. It's the /^[./]/ that does it AFAICT. Not having a . or / at the beginning makes it more difficult to identify a path (since you could have a relative path starting with numbers). I would just leave it and note the requirement. Btw my last comment was supposed to say 'bind mount', which probably makes more sense since ls doesn't have a -xdev equivalent. – Graeme Apr 04 '14 at 15:59
-
What if I only want to display the folders that have at least 100 files in them? – Nike Dattani Feb 03 '23 at 05:42
Using GNU tools:
find / -xdev -type d -print0 |
while IFS= read -d '' dir; do
echo "$(find "$dir" -maxdepth 1 -print0 | grep -zc .) $dir"
done |
sort -rn |
head -50
This uses two find commands. The first finds directories and pipes them to a while loop that runs the next find for each directory. The second lists all the child files/directories in the first level while grep counts them. The grep allows -print0 to be used with the second find since wc does not have a -z equivalent. This stops filenames with a newline from being counted twice (although using wc and no -print0 wouldn't make much difference).
The result of the second find is placed in the argument to echo so it and the directory name can easily be placed on the same line (the $(..) construct automatically trims the newline at the end of grep). Lines are then sorted by number and the 50 largest numbers shown with head.
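As a hedged illustration of why grep -zc . is used for the counting (the /tmp/countdemo directory and the file names are invented for the example), a filename containing a newline is counted twice by wc -l but only once as a NUL-delimited record:
mkdir -p /tmp/countdemo && cd /tmp/countdemo
touch normal 'two
lines'                                    # a file whose name contains a newline
find . -maxdepth 1 | wc -l                # 4 - the newline in the name produces an extra line
find . -maxdepth 1 -print0 | grep -zc .   # 3 - ., ./normal and the odd name counted once each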
Note that this will also include the top level directories of mount points. A simple way to get around this is to use a bind mount and then use the directory of the mount. To do this:
sudo mount --bind / /mnt
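For instance (a sketch that just reuses the loop above; /mnt is simply the mount point chosen here), the whole sequence would look like:
sudo mount --bind / /mnt    # a plain bind mount does not carry other mounted filesystems with it
find /mnt -xdev -type d -print0 |
while IFS= read -d '' dir; do
echo "$(find "$dir" -maxdepth 1 -print0 | grep -zc .) $dir"
done |
sort -rn |
head -50
sudo umount /mnt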
A more portable solution uses a different shell instance for each directory (also answered here):
find / -xdev -type d -exec sh -c '
echo "$(find "$0" | grep "^$0/[^/]*$" | wc -l) $0"' {} \; |
sort -rn |
head -50
Sample output:
9225 /var/lib/dpkg/info
6322 /usr/share/qt4/doc/html
4927 /usr/share/man/man3
2301 /usr/share/man/man1
2097 /usr/share/doc
2097 /usr/bin
1863 /usr/lib/x86_64-linux-gnu
1679 /var/cache/apt/archives
1628 /usr/share/qt4/doc/src/images
1614 /usr/share/qt4/doc/html/images
1308 /usr/share/scilab/modules/overloading/macros
1083 /usr/src/linux-headers-3.13-1-common/include/linux
1071 /usr/src/linux-headers-3.13-1-amd64/include/config
847 /usr/include/qt4/QtGui
774 /usr/include/qt4/Qt
709 /usr/share/man/man8
616 /usr/lib
611 /usr/share/icons/oxygen/32x32/actions
608 /usr/share/icons/oxygen/22x22/actions
598 /usr/share/icons/oxygen/16x16/actions
579 /usr/share/bash-completion/completions
574 /usr/share/icons/oxygen/48x48/actions
570 /usr/share/vim/vim74/syntax
546 /usr/share/scilab/modules/m2sci/macros/sci_files
531 /usr/lib/i386-linux-gnu/wine/wine
530 /usr/lib/i386-linux-gnu/wine/wine/fakedlls
496 /etc/ssl/certs
457 /usr/share/mime/application
454 /usr/share/man/man2
450 /usr/include/qt4/QtCore
443 /usr/lib/python2.7
419 /usr/src/linux-headers-3.13-1-common/include/uapi/linux
413 /usr/share/fonts/X11/misc
413 /usr/include/linux
375 /usr/share/man/man5
374 /usr/share/lintian/overrides
372 /usr/share/cmake-2.8/Modules
370 /usr/share/fonts/X11/75dpi
370 /usr/share/fonts/X11/100dpi
356 /usr/share/icons/gnome/24x24/actions
356 /usr/share/icons/gnome/22x22/actions
356 /usr/share/icons/gnome/16x16/actions
353 /usr/share/icons/gnome/48x48/actions
353 /usr/share/icons/gnome/32x32/actions
341 /usr/lib/ghc/ghc-7.6.3
326 /usr/sbin
324 /usr/share/scilab/modules/compatibility_functions/macros
324 /usr/share/scilab/modules/cacsd/macros
320 /usr/share/terminfo/a
319 /usr/share/i18n/locales
-
To use Graeme's find solution on OSX I first needed to install findutils via brew (brew install findutils) ... and then gfind . -xdev -type d -exec sh -c 'echo "$(find "$0" | grep "^$0/[^/]*$" | wc -l) $0"' {} \; | sort -rn | head -50 – robbogdan Aug 10 '22 at 21:47
-
@robbogdan Ok, but none of the commands that you show requires GNU find. The part that requires GNU find from this answer is -print0, which is also supported by find on macOS. – Kusalananda Aug 15 '22 at 13:05
Why not use something like KDirStat? Although it was originally written for KDE, it works fine with GNOME as well. It gives you the best view of the number of files/directories and their respective usage in a GUI.
To find a list of the top directories that contain the biggest number of entries (files and directories), I ended up with a simple command (GNU tools):
find /usr -xdev -type d -print | xargs -n1 du --inodes -sS | sort -rn | head -10
and the output looks as follows:
20418 /usr/share/doc/libreoffice-7.3.6.2/sdk/docs/idl/ref
12155 /usr/share/man/man3
5989 /usr/share/gtk-doc/html/gtk4
3866 /usr/lib64
3862 /usr/share/doc/openssl-1.1.1q/html/man3
3046 /usr/share/gtk-doc/html/gdk4
2478 /usr/bin
2382 /usr/share/fonts/noto
2376 /usr/share/man/man1
2371 /usr/src/linux-5.16.20-gentoo/arch/arm/boot/dts
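If some directory names contain spaces or other whitespace, a hedged variant of the same pipeline using NUL-delimited names keeps xargs from splitting them:
find /usr -xdev -type d -print0 | xargs -0 -n1 du --inodes -sS | sort -rn | head -10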

That calls for zsh glob qualifiers:
print -rC1 -- **/*(ND/nOe['(){REPLY=$#;} $REPLY/*(NDoN)'][1,50])
- print -rC1 --: prints its arguments raw on 1 Column
- **/*: recursive globbing: any file in any number of subdirectories
- (N...): glob qualifiers to further qualify the glob expansion
- N: Nullglob. Does not complain if there's no match.
- D: Dotglob. Also consider hidden files.
- nOe[code]: reverse Order numerically based on the evaluation of the code.
  - the code here sets $REPLY to the number of files in the directory by passing the expansion of $REPLY/*(NDoN) to an anonymous function that stores its number of arguments in $REPLY.
- [1,50]: return only the first 50.
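If you also want the counts printed next to the directory names, a small hedged variation (re-expanding the glob inside a loop; n is just a scratch array name) would be:
for d in **/*(ND/nOe['(){REPLY=$#;} $REPLY/*(NDoN)'][1,50]); do
  n=($d/*(NDoN))        # entries (hidden ones included) directly inside $d
  print -r -- $#n $d    # entry count followed by the directory name
done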
