5

I'd like a simple command (perhaps using find) which finds all files > some size in bytes, kilobytes, megabytes, or gigabytes, and which prints their size as they are found.

This command, for instance, finds all files > 10 MB, but does not show their size, unfortunately:

find . -size +10M

See also:

  1. Files greater than 1 GB and older than 6 months - doesn't show size
  2. Linux show files in directory larger than 1 GB and show size - accepted answer looks more complicated than expected, and I don't need sorting.
  3. Find files greater than X value, sort by size, show in ls format - I neither need nor want ls format, and the main answer may not work with spaces in file names
  4. https://linuxconfig.org/how-to-use-find-command-to-search-for-files-based-on-file-size

3 Answers

4

How about just:

find -size +10M -exec du -h {} \;
gaussian
  • 194
  • Can you explain how exec works please, and why we need {} and \ and ; ? – Gabriel Staples Aug 29 '21 at 15:08
  • 1
    As far as I know, "find" alone does not have an option to show file size. "-exec" runs a command for each file found (directly, without going through a shell), {} is a placeholder for where the file name goes in that command, and \; marks the end of the command. The difference between this and the more complex answers you originally linked is that this is shorter and easier to understand (presumably). Because the file name is handed to du as a single argument, it should also cope with spaces and special characters in file names (e.g. a file named doc*.txt). – gaussian Aug 29 '21 at 15:33
  • 2
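    Correction to the above: the terminator has to reach find as a literal ; so in the shell it must be written \; (or ';'), not a bare ; – gaussian Aug 29 '21 at 15:34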
  • works also on toybox – alecxs Sep 05 '21 at 06:25
  • Note that -size compares size, while du reports disk usage (and for files of type directory, that's the disk usage of the directory itself plus all files and directories that can be found in its traversal, recursively). Note also that standard find requires at least one file to be passed as argument. – Stéphane Chazelas May 30 '23 at 04:59
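To make the comments above concrete, here is a minimal sketch of the two ways to terminate -exec. It assumes GNU find and du; the scratch directory and the file names (one with a space, one with a *) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big file.bin" bs=1048576 count=11 2>/dev/null
dd if=/dev/zero of="$demo/doc*.txt" bs=1048576 count=11 2>/dev/null
dd if=/dev/zero of="$demo/small.bin" bs=1024 count=1 2>/dev/null

# {} is replaced by one file path and \; ends the command, so du runs
# once per matching file. The backslash keeps the shell from treating
# ; as its own command separator.
per_file=$(find "$demo" -type f -size +10M -exec du -h {} \;)

# With + instead of \; find appends as many paths as fit to a single
# du invocation, so du is started far fewer times.
batched=$(find "$demo" -type f -size +10M -exec du -h {} +)

printf '%s\n' "$per_file"
rm -rf "$demo"
```

Both forms pass each path to du as one argument, so the names with a space and a * need no extra quoting. The -type f test is an addition here, so that du is never handed a directory.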
4

Note that the M in find . -size +10M is a GNU extension. The GNU implementation of find has another extension, -printf, which you can use to print the size of those files:

find . -size +10M -printf '%s %p\n'

That reports the size in bytes and the path of the files that are more than 10 MiB (mebibytes, not megabytes¹) large.
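If byte counts are too raw, one possible approach (a sketch, assuming GNU find and the numfmt utility from GNU coreutils; the directory and file name are hypothetical) is to separate the fields with a tab and let numfmt scale the first one:

```shell
#!/bin/sh
# Assumes GNU find (-printf) and GNU coreutils (numfmt).
# The scratch directory and file name are hypothetical.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big file.bin" bs=1048576 count=11 2>/dev/null
tab=$(printf '\t')

# %s is the size in bytes, %p the path; a tab between them keeps
# paths containing spaces unambiguous.
bytes=$(find "$demo" -type f -size +10M -printf '%s\t%p\n')

# numfmt rewrites only field 1, scaling the byte count to IEC units
# (K, M, G) and leaving the path untouched.
human=$(printf '%s\n' "$bytes" | numfmt -d "$tab" --field=1 --to=iec)

printf '%s\n' "$bytes" "$human"
rm -rf "$demo"
```

This keeps -printf's exact size semantics while giving output closer to what du -h prints. File names containing newlines would still confuse the line-based pipeline.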

Here, you could also use zsh and its stat builtin and glob qualifiers:

zmodload zsh/stat
stat -nL +size -- **/*(LM+10)

That prints the size after the file path. Also note that contrary to find, that excludes hidden files by default (you can add them back with the D qualifier).

Or, to store the result in an array and print it raw in 2 columns (filled across):

stat -nLA report +size -- **/*(LM+10)
print -raC2 -- $report

Note that -size, %s and +size all consider the file's size, which is not the same as the file's disk usage² as reported by du (or by %b/%k in GNU find's -printf, or +block in zsh's stat). For files of type directory, du additionally includes the disk usage of the unique files found within. find has no predicate to filter files based on their disk usage.
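The gap between size and disk usage is easy to see with a sparse file. The following is a sketch only, assuming GNU find, du and truncate; the file name is hypothetical, and the disk usage actually reported depends on the file system.

```shell
#!/bin/sh
# Assumes GNU find, du and truncate; the file name is hypothetical.
demo=$(mktemp -d)

# A sparse file: 20 MiB of size, but on most file systems almost no
# blocks are actually allocated for it.
truncate -s 20M "$demo/sparse.img"

# As seen by find: %s = size in bytes, %k = disk usage in KiB.
# -size matches on the former, so the file is found regardless of %k.
report=$(find "$demo" -type f -size +10M -printf '%s bytes, %k KiB on disk: %f\n')

# As seen by du: allocated space by default, size with --apparent-size.
usage=$(du -k "$demo/sparse.img" | cut -f1)
apparent=$(du -k --apparent-size "$demo/sparse.img" | cut -f1)

printf '%s\n' "$report" "du: $usage KiB used, $apparent KiB apparent"
rm -rf "$demo"
```

Piping find -size into plain du can therefore list files whose reported usage is far below the threshold that selected them.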


¹ For files larger than 10 MB (megabytes), you'd need -size +10000000c (standard) in find or (L+10000000) in zsh.

² Bigger files generally take up more disk space, but not necessarily, and in any case the relation between the two is not linear.
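Footnote ¹ can be checked with a file whose size falls between the two thresholds. This is a sketch under the assumption of GNU find and truncate, with a hypothetical file name:

```shell
#!/bin/sh
# Assumes GNU find and truncate; the file name is hypothetical.
demo=$(mktemp -d)

# 10,200,000 bytes: more than 10 MB (10,000,000 bytes) but less than
# 10 MiB (10,485,760 bytes).
truncate -s 10200000 "$demo/between.bin"

# GNU-only M suffix: counts in MiB, so this file is not "more than 10".
mib_matches=$(find "$demo" -type f -size +10M | wc -l)

# Standard c suffix: counts in bytes, so the file does match.
mb_matches=$(find "$demo" -type f -size +10000000c | wc -l)

echo "+10M matched $mib_matches file(s), +10000000c matched $mb_matches file(s)"
rm -rf "$demo"
```

GNU find rounds sizes up to whole units before comparing, so the byte form is also the safer choice when the exact boundary matters.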

  • Upvoted. It looks like there's also an option for %k for "The amount of disk space used for this file in 1 KB blocks." (per man find), but I don't see an option for MB or GB output, oddly enough. I slightly favor the find -size +10M -exec du -h {} \; answer, therefore, because it prints in "human-readable" format with an auto-chosen unit (K, M, or G), based on the file's size. I find that a little more usable for most cases, except when I want more precision at the byte-level for scripts and auto-parsing. – Gabriel Staples May 30 '23 at 04:46
  • 1
    @GabrielStaples I do mention %b/%k and du already in passing in the last paragraph, but that's about disk usage (cumulative for files of type directory), not size (though the GNU implementation of du has some option to report size instead of disk usage). – Stéphane Chazelas May 30 '23 at 04:48
  • Understood. By the way, since we're talking about disk usage vs size, I did some studies and found some interesting results, which I posted on my website here: exFAT filesystem speed and disk usage based on cluster size. – Gabriel Staples May 30 '23 at 04:53
0

If you use zsh and an ls that supports -h for human-readable sizes, you can do something like:

ls -lhd -- **/*(.Lm+10)

which finds all non-hidden regular files greater than 10 MiB in size and runs ls -lhd on them. You might get an "argument list too long" error. If you do, you could use zargs, but then you get into as much complication as using find.

Also, not to falsely accuse zsh of problems, but I've seen odd behavior when doing **/* on large directories: for example, ls -ld /**/*(.), even with zargs, may never finish. But for smaller jobs, having zsh do the work is often more convenient once you have the zsh syntax at your fingertips.
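For comparison, a find-based near-equivalent might look like the sketch below. It assumes a find that accepts the GNU M suffix and an ls that supports -h; the directory layout and file names are hypothetical.

```shell
#!/bin/sh
# Assumes a find with the GNU M suffix and an ls that supports -h.
# The scratch directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/sub dir"
dd if=/dev/zero of="$demo/sub dir/big one.bin" bs=1048576 count=11 2>/dev/null
dd if=/dev/zero of="$demo/.hidden-big.bin" bs=1048576 count=11 2>/dev/null

# -name '.*' -prune skips dot files and does not descend into dot
# directories, similar to the zsh glob's default. -exec ... {} + hands
# paths to ls in batches that stay under the argument-size limit,
# starting ls again if one batch is not enough.
listing=$(find "$demo" -name '.*' -prune -o -type f -size +10M -exec ls -lhd {} +)

printf '%s\n' "$listing"
rm -rf "$demo"
```

When several ls processes are needed, each batch is formatted on its own, so column widths may differ between batches.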

pedz
  • 173
  • Note that I already mention zsh's **/*(LM+10) in my answer, using zsh's stat builtin, which avoids the arg list too long issue. – Stéphane Chazelas May 30 '23 at 04:56
  • I wouldn't describe * as excluding "hidden" files. They are not hidden at all. They are just not interesting, and so the convention is to not list them. Windows and the Mac GUI actually hide things, and the user must do gymnastics to see them. Mac GUI / Windows is trying and failing to "simplify" or "protect" the user when in fact it only gets in the way. All in all, I much prefer my original answer to your edited one. – pedz May 31 '23 at 13:46