254

Assume there's an image storage directory, say ./photos/john_doe, containing multiple subdirectories in which many files of a certain type reside (say, *.jpg). How can I calculate the total size of those files below the john_doe branch?

I tried du -hs ./photos/john_doe/*/*.jpg, but this shows individual files only. Also, it covers only the first nesting level of the john_doe directory, like john_doe/june/, but skips john_doe/june/outrageous/.

So, how could I traverse the entire branch, summing up the sizes of the matching files?

mbaitoff
  • 5,101

14 Answers

312
find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$

If more than one invocation of du is required because the file list is very long, multiple totals will be reported and need to be summed.
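
If you'd rather avoid that, one sketch is to drop -c, print per-file byte counts, and let awk do the summing (assumes GNU du for -b):

find ./photos/john_doe -type f -name '*.jpg' -exec du -b {} + |
  awk '{sum += $1} END {print sum}'   # prints the grand total in bytes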

Hauke Laging
  • 90,279
SHW
  • 14,786
  • 16
    find -iname 'file*' -exec du -cb {} + | grep total$ | cut -f1 | paste -sd+ - | bc

    summed byte size

    – Michal Čizmazia Jul 15 '15 at 13:55
  • 4
    If your system runs under another language, you need to change total$ to the corresponding word, like razem$ in Polish. – Zbyszek Jul 26 '15 at 12:49
  • 2
    You can add LC_ALL=POSIX as prefix to always grep for total like this: LC_ALL=POSIX find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$ – Sven Jun 27 '16 at 05:48
  • 2
    If you're not using -name, then change the grep to grep -P "\ttotal$" or else it will capture all files ending with "total" as well. – thdoan Mar 30 '17 at 07:43
  • 4
    @MichalČizmazia some shells (e.g., Git Bash for Windows) don't come with bc, so here is a more portable solution: find -name '*.jpg' -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }' – thdoan Mar 30 '17 at 07:55
  • 1
    why not invert grep so it works for all languages? ...|grep -v .jpg$ – iRaS Oct 13 '20 at 06:53
  • 3
    What does the + do at the end of the find command? I couldn't find any mention of it in man find. – localhost Jan 01 '21 at 14:31
  • When totals come with units at the end, bc does not operate well; it is better to use: find -iname 'file*' -exec du -cb {} + |grep -e "total$" |cut -f1 |paste -sd+ - |bc |numfmt --to=iec --suffix=B --round=towards-zero – Jester May 17 '21 at 06:59
  • This is the most portable, flexible, Unix-like answer that gets close. Every other answer either doesn't answer the question or works only with bash or Linux or GNU find. – Cliff Aug 31 '22 at 04:23
  • @localhost: https://stackoverflow.com/a/6085237/785194 – EML Nov 20 '23 at 16:21
107
du -ch public_html/images/*.jpg | grep total
20M total

gives me the total usage of my .jpg files in this directory.

To deal with multiple directories you'd probably have to combine this with find somehow.
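
For example, a sketch along the lines of the accepted answer (assuming GNU du):

find public_html/images -type f -name '*.jpg' -exec du -ch {} + | grep total$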

You might find du command examples useful (it also includes find)

Levon
  • 11,384
  • 5
    This doesn't traverse the underlying directories? – mbaitoff Jun 26 '12 at 05:48
  • 3
    This is easier to type than the accepted solution, but is only half-right, it won't include images in subdirectories. Good to know if all the files are in one directory. – gbmhunter Aug 29 '19 at 19:56
  • @gbmhunter I think if you add the -R parameter to -ch you will also get the subdirectories as it recursively traverses the directory tree. I'm not currently at a computer to try it out though to confirm. – Levon Aug 29 '19 at 23:04
  • 1
    I don't see an -R option at http://man7.org/linux/man-pages/man1/du.1.html. And I don't think a recursive option would help in this case because the shell is doing the glob expansion before passing the arguments to du. – gbmhunter Aug 30 '19 at 21:56
  • 3
    To get images in subdirectories, couldn't you use **/*.jpg? – Kyle Barron Nov 26 '19 at 17:30
  • Even for du -hc *.gz it's giving me -bash: /usr/bin/du: Argument list too long error – Matěj Račinský Dec 05 '19 at 12:22
  • the same here: invert grep to work in all languages: ...|grep -v .jpg$ – iRaS Oct 13 '20 at 06:55
  • Note that this throws things off if you have total in a file name. Might make more sense to replace | grep total with | tail -n1 – aggregate1166877 Jun 03 '23 at 06:15
51

Primarily, you need two things: recursive globbing (in bash, enable it with shopt -s globstar; zsh supports ** out of the box) and du's -c grand total:

shopt -s globstar   # bash only; zsh matches ** by default
du -ch -- **/*.jpg | tail -n 1
  • 2
    very good reply. Simpler than using find (as long * or ** matches the directory structure) – Andre de Miranda Apr 21 '16 at 05:13
  • It can also handle very long lists of files whereas using find can return erroneous results. – Eric Fournie Oct 19 '16 at 08:50
  • 1
    bash brace expansion allows for measuring multiple sets of wildcards too. du -ch -- ./{dir1,dir2}/*.jpg or du -ch -- ./{prefix1*,prefix2*}.jpg – J.Money Jul 23 '19 at 22:24
  • 3
    @EricFournie However I got Argument list too long error when processing about 300k text files. – xtluo Aug 01 '19 at 07:43
  • The maximum number of arguments for a command (in this case, the file names returned by the wildcard expansion) can be checked with getconf ARG_MAX. If you have more, you will need to process the files one by one or batchwise with a for loop. – Eric Fournie Aug 01 '19 at 08:09
  • I was getting "No such file or directory" errors when using the **/* glob, but then spotted the link you had to globstar in your answer. Works like a charm! – Rob Oct 01 '19 at 08:26
  • +1 for -c. I can't imagine how I forgot that. Fortunately here in the future we can use --total now. which I probably won't remember either... – Jim Feb 27 '23 at 14:33
45

The ultimate answer is:

{ find <DIR> -type f -name "*.<EXT>" -printf "%s+"; echo 0; } | bc

And an even faster version, not limited by RAM, but one that requires GNU AWK with bignum support:

find <DIR> -type f -name "*.<EXT>" -printf "%s\n" | gawk -M '{t+=$1}END{print t}'

This version has the following features:

  • all capabilities of find to specify the files you're looking for
  • supports millions of files
    • other answers here are limited by the maximum length of the argument list
  • spawns only 3 simple processes with a minimal pipe throughput
    • many answers here spawn C+N processes, where C is some constant and N is the number of files
  • doesn't bother with string manipulation
    • this version doesn't do any grepping, or regexing
    • well, find does a simple wildcard matching of filenames
  • optionally formats the sum into a human-readable form (e.g. 5.5K, 176.7M, ...)
    • to do that, append | numfmt --to=si (full pipeline shown below)
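
Putting it all together for the question's directory (a sketch; numfmt is part of GNU coreutils):

{ find ./photos/john_doe -type f -name '*.jpg' -printf "%s+"; echo 0; } | bc | numfmt --to=si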
rindeal
  • 778
  • 1
    I like the simplicity of this answer, although it only worked for me when I introduced spaces after the opening brace and before the closing brace. I do wonder if it will really support an 'infinite' number of files though :) – andyb Feb 07 '17 at 00:29
  • 1
    @andyb thanks for the feedback, the spaces around braces are indeed required in BASH, I'm using ZSH so I didn't notice that. And the number of files is limited by the available RAM on your system as bc's memory usage grows slowly as the numbers flow in. – rindeal Feb 07 '17 at 17:31
17

The answers given until now do not take into account that the file list passed from find to du may be so long that find automatically splits it into chunks, resulting in multiple occurrences of total.

You can either grep total (locale!) and sum up manually, or use a different command. AFAIK there are only two ways to get a grand total (in kilobytes) of all files found by find:
find . -type f -iname '*.jpg' -print0 | xargs -r0 du -a| awk '{sum+=$1} END {print sum}'

Explanation
find . -type f -iname '*.jpg' -print0: Find all files with the extension jpg regardless of case (i.e. *.jpg, *.JPG, *.Jpg...) and output them (null-terminated).
xargs -r0 du -a: Without -r, xargs would call the command even with no arguments passed; -r prevents that. -0 means the input items are null-terminated (not newline-terminated).
awk '{sum+=$1} END {print sum}': Sum up the file sizes output by the previous command.

And for reference, the other way would be
find . -type f -iname '*.jpg' -print0 | du -c --files0-from=-
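
This prints one line per file plus a grand total at the end, so a locale-independent way to keep just the total is to take the last line (a sketch, assuming GNU du):

find . -type f -iname '*.jpg' -print0 | du -c --files0-from=- | tail -n 1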

Jan
  • 7,772
4

If the list of files is too big to be passed to a single invocation of du -c, on a GNU system you can do:

find . -iname '*.jpg' -type f -printf '%b\t%D:%i\n' |
  sort -u | cut -f1 | paste -sd+ - | bc

(size expressed in number of 512-byte blocks). Like du, it tries to count hard links only once. If you don't care about hard links, you can simplify it to:

(printf 0; find . -iname '*.jpg' -type f -printf +%b) | bc

If you want the size instead of disk usage, replace %b with %s. The size will then be expressed in bytes.
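
For example, the byte-size variant made human-readable (a sketch; numfmt assumed available from GNU coreutils):

(printf 0; find . -iname '*.jpg' -type f -printf +%s) | bc | numfmt --to=iec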

3

The solutions mentioned so far are inefficient (exec is expensive) and require additional manual summing if the file list is long, or they don't work on Mac OS X. The following solution is very fast, should work on any system, and yields the total in GB (remove a /1024 if you want to see the total in MB):

find . -iname "*.jpg" -ls | perl -lane '$t += $F[6]; print $t/1024/1024/1024 . " GB"'

  • Neither -iname nor -ls are standard/portable, so it won't work on any system either. It will also not work properly if there are filenames or symlink targets that contain newline characters. – Stéphane Chazelas Jun 22 '16 at 08:28
  • Also note that it gives the sum of the file sizes, not their disk usage. For symlinks, it gives the size of the symlinks, not the files they point to. – Stéphane Chazelas Jun 22 '16 at 08:31
2

Improving SHW's great answer to make it work with any locale, as Zbyszek already pointed out in his comment:

LC_ALL=C find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$
lbo
  • 121
2

du naturally traverses the directory hierarchy, and awk can perform the filtering, so something like this may be sufficient:

du -ak | awk 'BEGIN {sum=0} /\.jpg$/ {sum+=$1} END {print sum}'

This works without GNU utilities.

GeoffP
  • 21
2

This is what worked for me.

find -type f -iname '*.jpg' -print0 | du -ch --files0-from=- | grep total$
2

Using the modern fd (AKA fd-find or fdfind on Ubuntu)

fdfind -e jpg -X du -ch | tail -1

I found fd easier to work with than find, and there's no need to enable globstar.

The trick is to use the uppercase -X (--exec-batch), which executes the command just once, not the lowercase -x, which does a normal exec and runs the command once per file.
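
For example, scoped to the question's tree (a sketch; fd takes [pattern] [path] before --exec-batch):

fdfind -e jpg . ./photos/john_doe -X du -ch | tail -1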

To install on ubuntu:

sudo apt install fd-find

See more

Janghou
  • 403
0

Another would be

ls -al <directory> | awk '{t+=$5} END {print t}'

This assumes you're looking in a single directory. If you want to look at the current directory and everything beneath it:

ls -Ral <directory> | awk '{t+=$5} END {print t}'
Paulo Tomé
  • 3,782
  • (1) Biggest problem: This looks at everything, but the question is specifically about restricting the search to a subset of files; e.g., *.jpg.  (And the question explicitly says that the OP wants to do a recursive directory search.)  (2) This will count, not only files with non-matching names (e.g., *.gif, *.png, etc.), but also non-files; e.g., directories and symbolic links.  (3) This can produce incorrect results if any filename(s) contain newline(s).  (4) Like some of the (poorer) answers, this counts hard links multiple times.  … (Cont’d) – Scott - Слава Україні Mar 09 '20 at 17:37
  • (Cont’d) …  Hint: When a question is almost 8 years old and has 9 answers, it's quite possible that all the good answers have already been given, and you should think long and hard about whether you really have something new and better to contribute. – Scott - Слава Україні Mar 09 '20 at 17:37
0

Another alternative, using stat rather than du:

stat -L -c %s ** | awk '{s+=$1} END {printf "%.0f\n", s}'

See Gilles' answer about using **.
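
A minimal sketch restricted to the question's *.jpg files, assuming bash with globstar enabled and GNU stat (on BSD/macOS the flag would be stat -f %z):

shopt -s globstar
stat -L -c %s **/*.jpg | awk '{s+=$1} END {printf "%.0f\n", s}'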

0

This is a mashup of several answers and comments that does what I need; a line-broken version follows the explanation below.

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }'| numfmt --to=iec

  • find will get all the files recursively
  • -iname makes the match case-insensitive
  • -o and parentheses allow looking for multiple patterns
  • du -bc will get the files' size, sometimes in more than one call if there are many files
  • grep total will get only the total line as given by du
  • cut -f1 will take only the actual integer values
  • awk will sum them all
  • numfmt will convert it to a human-readable format
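
The same pipeline, split across lines for readability (behavior unchanged):

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + \
  | grep total$ \
  | cut -f1 \
  | awk '{ total += $1 }; END { print total }' \
  | numfmt --to=iec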
Gabriel
  • 101