
Since I'm having performance problems with rsnapshot, I'd like to recursively identify directories containing a large number of files. I suspect the problem is not the size of the files but the file count in particular subdirectories, because the generations (daily.0, daily.1, ...) are not volatile and have only a few changes compared to the total number of files.

The Unix command du would be exactly what I want, if it returned the file count instead of the sum of file sizes.
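As the comments below point out, newer versions of GNU du (coreutils 8.22 and later; my 8.21 doesn't have it yet, and on macOS it's gdu after brew install coreutils) can get close to this with the --inodes option. It counts inodes, i.e. files plus directories, rather than plain files, but something like this is roughly what I'm after:

du --inodes | sort -n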

I already have a bash script which outputs the file count of all direct subdirectories (recursing into subdirectories), but it's cumbersome to use, because I have to dig deeper and deeper and always have to wait.

I also found a script that digs deep, but it doesn't sum up the file counts of subdirectories; it only shows the number of files in each directory itself, not those of its children.

Doesn't have to be a shell script - I'm open to other scripting languages like Ruby, Python, Perl, JavaScript, ...

Example:

dir1/
   file1
   subdir1/
       file2, file3, file4, file5
   subdir2/
       file6, file7, file8
       subdir3/
           file9
dir2/
    fileA, fileB

Desired output (listing subdirectories and summing up to the top):

4   dir1/subdir1
1   dir1/subdir2/subdir3
4   dir1/subdir2
9   dir1/
2   dir2/

What I don't want (only listing totals):

9   dir1/
2   dir2/

and not this (only listing the file count of each directory itself):

4   dir1/subdir1
1   dir1/subdir2/subdir3
3   dir1/subdir2
1   dir1/
2   dir2/
hgoebl
  • Thanks @Gilles for your answer. First: I've got coreutils 8.21, so du --inodes doesn't work. The script with awk didn't work (probably I'm missing some bash knowledge). Next: many users won't find your answer with a similar question or problem like mine. Of course I've heard of inodes, but who exactly knows the subtle differences between inodes and files? => IMO my question is not a duplicate from many users' point of view. – hgoebl Aug 02 '15 at 19:45
  • “Didn't work” is not a usable bug report. As for the duplicate, that's the whole point of duplicates: people looking for your question but not using the word “inode” will find your question and follow the link to the answered duplicate. – Gilles 'SO- stop being evil' Aug 02 '15 at 19:50
  • You're right. I hate "didn't work" also. In my case it was awk: cmd. line:8: ^ syntax error. Maybe I should have added line continuation characters \ at the end of each line? But you may have a look at the answers. They are also brilliant. Unbelievable what can be expressed in one line... – hgoebl Aug 02 '15 at 19:54
  • Ah, there was a stray quote in a comment. Fixed. – Gilles 'SO- stop being evil' Aug 02 '15 at 19:58
  • Hi. On macOS, you can use gdu --inodes instead of du: gdu --inodes | sort -n seems to work usefully. Requires brew install coreutils IIRC. – KarolDepka Jan 04 '22 at 20:40

3 Answers


Try something like this:

find . -type f | perl -aF/ -lne 'for (my $i=0; $i < @F-1; ++$i) { print join("/",@F[0...$i]); }' | sort | uniq -c

find . -type f prints files:

./dir1/subdir2/file8
./dir1/subdir2/file7
./dir1/subdir2/subdir3/file9
./dir1/subdir2/file6
./dir1/file1
...

perl -aF/ -lne 'for (my $i=0; $i < @F-1; ++$i) { print join("/",@F[0...$i]); }' translates each filename ./a/b/c into the chain of its parent directories: ., ./a, ./a/b
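For example, a single path expands like this:

printf '%s\n' ./dir1/subdir2/subdir3/file9 | perl -aF/ -lne 'for (my $i=0; $i < @F-1; ++$i) { print join("/",@F[0...$i]); }'

.
./dir1
./dir1/subdir2
./dir1/subdir2/subdir3

Each of these lines is one "vote" for that directory; sort | uniq -c then tallies the votes.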

Note:

this doesn't work with newlines in filenames. You can use -print0 in find and -0 in perl, and keep a counter for each directory in a hash.
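A sketch of that NUL-safe variant (untested, and it lists directories in lexicographic order rather than children-first):

find . -type f -print0 |
perl -0 -ne '
  chomp;                          # strip the trailing NUL
  my @parts = split m!/!;         # "./a/b/file" -> (".", "a", "b", "file")
  for my $i (0 .. $#parts - 1) {
    ++$count{ join "/", @parts[0 .. $i] };   # one tick per ancestor directory
  }
  END { printf "%d\t%s\n", $count{$_}, $_ for sort keys %count }
'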

Edit:

Inspired by @Gilles's answer:

find . -depth -print0 |
perl -l -0 -ne '
# -l before -0: chomp the trailing NUL off $_, append "\n" on print
my $depth = tr!/!/!;              # depth = number of "/" in the path
# entering a new branch: restart the counters at these levels
for (my $i = $prev_depth; $i <= $depth; ++$i) { $totals[$i] = 0; }
if ( -f $_ ) {
  # a file counts toward every ancestor level up to its own depth
  for (my $i = 0; $i <= $depth; ++$i) { ++$totals[$i]; }
} else {
  # -depth prints a directory after its contents, so its total is complete
  print "$totals[$depth]\t$_";
}
$prev_depth = $depth;
'

Works fine with newlines in filenames, works fine with empty directories, and doesn't require an additional sort | uniq -c.
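On the question's example tree it prints something like this (sibling order may vary, but thanks to -depth each directory always appears after its contents):

4   ./dir1/subdir1
1   ./dir1/subdir2/subdir3
4   ./dir1/subdir2
9   ./dir1
2   ./dir2
11  .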

Evgeny

If you have find (which can be used to iterate through all files in a directory, including all files in subdirectories of the directory) and wc (which counts the lines, words and bytes of its input) then how about the one-liner

find <directory> | wc

where <directory> is the directory you want to count all the files in. This prints out three numbers; the first is the number of lines that find returned. I guess find by default finds files and directories, so this will give the total number of files and directories in <directory> (including <directory> itself).

find is an extremely flexible command. If you are genuinely only interested in files and don't want to count directories, then

find <directory> -type f | wc

will do the trick. For example, to count all files contained (however deeply) in the current directory, you can do

find . -type f | wc
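On the question's example tree, taking just the line count with wc -l:

find dir1 | wc -l           # 13: nine files plus four directories (dir1 itself included)
find dir1 -type f | wc -l   # 9: files only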

Caveats: By default find will not follow symlinks etc.; if your files are spread over various different filesystems or what have you, then you should look at the manual page for find, because it can be set up to deal with pretty much anything. Note also that wc is counting lines, so if you happen to have files whose names contain newlines (technically possible, but not, as far as I know, a good idea in general), you'll get funny answers.
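With GNU find you can sidestep the newline problem by printing a single byte per file and counting bytes instead of lines (the same trick the last answer below uses):

find . -type f -printf . | wc -c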

eric
  • Thanks, but this is not what I wanted. I'd like to have information like du, which recurses into subdirectories and sums up the file sizes to the top. The difference: I'd like to have the file count, not the size. – hgoebl Aug 02 '15 at 15:15
  • I was just about to comment on file names with newlines... find has a -printf option, IIRC, so you can output a simple '.' per match. (Still doesn't directly do what the OP wants, though.) I guess you could find all directories first, and then -exec a second find to count files in each dir, but that seems wasteful as it touches the subdirs multiple times. – Ulrich Schwarz Aug 02 '15 at 15:36

Based on my comment, a variation on this might do what you want:

find . -depth -type d -exec /bin/sh -c 'printf "%5d %s\n" "$(find "$1" -type f -printf . | wc -c)" "$1"' sh {} \;

(The doing-it-properly brigade will surely, and rightfully, shoot me for this: it computes the result for deeper subdirectories several times, hopes the filesystem cache holds the entire metadata of the tree at some point, and spawns a new shell for every directory. But it's a start.)

On your example structure, I get:

    4 ./dir1/subdir1
    1 ./dir1/subdir2/subdir3
    4 ./dir1/subdir2
    9 ./dir1
    2 ./dir2
   11 .

(to exclude the current working dir, either change the outer find . to find * or use find . -mindepth 1)

  • Your solution works like a charm. Thank you! Unfortunately (for you), the solution by @EvgenyVereshchagin performs better. But anyway - you've found a very elegant solution! – hgoebl Aug 02 '15 at 19:48