254

Assume there's an image storage directory, say ./photos/john_doe, containing multiple subdirectories in which many files of a certain type reside (say, *.jpg). How can I calculate the total size of those files below the john_doe branch?

I tried du -hs ./photos/john_doe/*/*.jpg, but this shows individual files only. Also, it covers only the first nesting level of the john_doe directory, like john_doe/june/, but skips john_doe/june/outrageous/.

So, how could I traverse the entire branch, summing up the sizes of the matching files?

mbaitoff
  • 5,101

14 Answers

312
find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$

If more than one invocation of du is required because the file list is very long, multiple totals will be reported and need to be summed.
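
If you'd rather avoid that, one sketch is to drop -c, print per-file byte counts, and let awk do the summing (assumes GNU du for -b):

find ./photos/john_doe -type f -name '*.jpg' -exec du -b {} + |
  awk '{sum += $1} END {print sum}'   # prints the grand total in bytes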

Hauke Laging
  • 90,279
SHW
  • 14,786
  • 16
    find -iname 'file*' -exec du -cb {} + | grep total$ | cut -f1 | paste -sd+ - | bc

    summed byte size

    – Michal Čizmazia Jul 15 '15 at 13:55
  • 4
    If your system runs under another language, you need to change total$ to the corresponding word, like razem$ in Polish. – Zbyszek Jul 26 '15 at 12:49
  • 2
    You can add LC_ALL=POSIX as prefix to always grep for total like this: LC_ALL=POSIX find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$ – Sven Jun 27 '16 at 05:48
  • 2
    If you're not using -name, then change the grep to grep -P "\ttotal$" or else it will capture all files ending with "total" as well. – thdoan Mar 30 '17 at 07:43
  • 4
    @MichalČizmazia some shells (e.g., Git Bash for Windows) don't come with bc, so here is a more portable solution: find -name '*.jpg' -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }' – thdoan Mar 30 '17 at 07:55
  • 1
    why not invert grep so it works for all languages? ...|grep -v .jpg$ – iRaS Oct 13 '20 at 06:53
  • 3
    What does the + do at the end of the find command? I couldn't find any mention of it in man find. – localhost Jan 01 '21 at 14:31
  • When totals come with units at the end, bc does not operate well; it is better to use: find -iname 'file*' -exec du -cb {} + |grep -e "total$" |cut -f1 |paste -sd+ - |bc |numfmt --to=iec --suffix=B --round=towards-zero – Jester May 17 '21 at 06:59
  • This is the most portable, flexible, Unix-like answer that gets close. Every other answer either doesn't answer the question or works only with bash or Linux or GNU find. – Cliff Aug 31 '22 at 04:23
  • @localhost: https://stackoverflow.com/a/6085237/785194 – EML Nov 20 '23 at 16:21
107
du -ch public_html/images/*.jpg | grep total
20M total

gives me the total usage of my .jpg files in this directory.

To deal with multiple directories you'd probably have to combine this with find somehow.
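
For example, a sketch along the lines of the accepted answer (assuming GNU du):

find public_html/images -type f -name '*.jpg' -exec du -ch {} + | grep total$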

You might find du command examples useful (it also includes find)

Levon
  • 11,384
  • 5
    This doesn't traverse the underlying directories? – mbaitoff Jun 26 '12 at 05:48
  • 3
    This is easier to type than the accepted solution, but is only half-right, it won't include images in subdirectories. Good to know if all the files are in one directory. – gbmhunter Aug 29 '19 at 19:56
  • @gbmhunter I think if you add the -R parameter to -ch you will also get the subdirectories as it recursively traverses the directory tree. I'm not currently at a computer to try it out though to confirm. – Levon Aug 29 '19 at 23:04
  • 1
    I don't see an -R option at http://man7.org/linux/man-pages/man1/du.1.html. And I don't think a recursive option would help in this case because the shell is doing the glob expansion before passing the arguments to du. – gbmhunter Aug 30 '19 at 21:56
  • 3
    To get images in subdirectories, couldn't you use **/*.jpg? – Kyle Barron Nov 26 '19 at 17:30
  • Even for du -hc *.gz it's giving me -bash: /usr/bin/du: Argument list too long error – Matěj Račinský Dec 05 '19 at 12:22
  • the same here: invert grep to work in all languages: ...|grep -v .jpg$ – iRaS Oct 13 '20 at 06:55
  • Note that this throws things off if you have total in a file name. Might make more sense to replace | grep total with | tail -n1 – aggregate1166877 Jun 03 '23 at 06:15
51

Primarily, you need two things: recursive globbing (in bash, enable it with shopt -s globstar; zsh supports ** out of the box) and du's -c grand total:

shopt -s globstar   # bash only; zsh matches ** by default
du -ch -- **/*.jpg | tail -n 1
  • 2
    very good reply. Simpler than using find (as long * or ** matches the directory structure) – Andre de Miranda Apr 21 '16 at 05:13
  • It can also handle very long lists of files whereas using find can return erroneous results. – Eric Fournie Oct 19 '16 at 08:50
  • 1
    bash brace expansion allows for measuring multiple sets of wildcards too. du -ch -- ./{dir1,dir2}/*.jpg or du -ch -- ./{prefix1*,prefix2*}.jpg – J.Money Jul 23 '19 at 22:24
  • 3
    @EricFournie However I got Argument list too long error when processing about 300k text files. – xtluo Aug 01 '19 at 07:43
  • The maximum number of arguments for a command (in this case, the file names returned by the wildcard expansion) can be checked with getconf ARG_MAX. If you have more, you will need to process the files one by one or batchwise with a for loop. – Eric Fournie Aug 01 '19 at 08:09
  • I was getting "No such file or directory" errors when using the **/* glob, but then spotted the link you had to globstar in your answer. Works like a charm! – Rob Oct 01 '19 at 08:26
  • +1 for -c. I can't imagine how I forgot that. Fortunately here in the future we can use --total now. which I probably won't remember either... – Jim Feb 27 '23 at 14:33
45

The ultimate answer is:

{ find <DIR> -type f -name "*.<EXT>" -printf "%s+"; echo 0; } | bc

And an even faster version, not limited by RAM, but one that requires GNU AWK with bignum support:

find <DIR> -type f -name "*.<EXT>" -printf "%s\n" | gawk -M '{t+=$1}END{print t}'

This version has the following features:

  • all capabilities of find to specify the files you're looking for
  • supports millions of files
    • other answers here are limited by the maximum length of the argument list
  • spawns only 3 simple processes with a minimal pipe throughput
    • many answers here spawn C+N processes, where C is some constant and N is the number of files
  • doesn't bother with string manipulation
    • this version doesn't do any grepping, or regexing
    • well, find does a simple wildcard matching of filenames
  • optionally formats the sum into a human-readable form (e.g. 5.5K, 176.7M, ...)
    • to do that, append | numfmt --to=si (full pipeline shown below)
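
Putting it all together for the question's directory (a sketch; numfmt is part of GNU coreutils):

{ find ./photos/john_doe -type f -name '*.jpg' -printf "%s+"; echo 0; } | bc | numfmt --to=si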
rindeal
  • 778
  • 1
    I like the simplicity of this answer, although it only worked for me when I introduced spaces after the opening brace and before the closing brace. I do wonder if it will really support an 'infinite' number of files though :) – andyb Feb 07 '17 at 00:29
  • 1
    @andyb thanks for the feedback, the spaces around braces are indeed required in BASH, I'm using ZSH so I didn't notice that. And the number of files is limited by the available RAM on your system as bc's memory usage grows slowly as the numbers flow in. – rindeal Feb 07 '17 at 17:31
17

The answers given until now do not take into account that the file list passed from find to du may be so long that find automatically splits it into chunks, resulting in multiple occurrences of total.

You can either grep total (locale!) and sum up manually, or use a different command. AFAIK there are only two ways to get a grand total (in kilobytes) of all files found by find:
find . -type f -iname '*.jpg' -print0 | xargs -r0 du -a| awk '{sum+=$1} END {print sum}'

Explanation
find . -type f -iname '*.jpg' -print0: Find all files with the extension jpg regardless of case (i.e. *.jpg, *.JPG, *.Jpg...) and output them (null-terminated).
xargs -r0 du -a: Without -r, xargs would call the command even with no arguments passed; -r prevents that. -0 means the input items are null-terminated (not newline-terminated).
awk '{sum+=$1} END {print sum}': Sum up the file sizes output by the previous command.

And for reference, the other way would be
find . -type f -iname '*.jpg' -print0 | du -c --files0-from=-
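
This prints one line per file plus a grand total at the end, so a locale-independent way to keep just the total is to take the last line (a sketch, assuming GNU du):

find . -type f -iname '*.jpg' -print0 | du -c --files0-from=- | tail -n 1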

Jan
  • 7,772
4

If the list of files is too big to be passed to a single invocation of du -c, on a GNU system you can do:

find . -iname '*.jpg' -type f -printf '%b\t%D:%i\n' |
  sort -u | cut -f1 | paste -sd+ - | bc

(size expressed in number of 512-byte blocks). Like du, it tries to count hard links only once. If you don't care about hard links, you can simplify it to:

(printf 0; find . -iname '*.jpg' -type f -printf +%b) | bc

If you want the size instead of disk usage, replace %b with %s. The size will then be expressed in bytes.
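
For example, the byte-size variant made human-readable (a sketch; numfmt assumed available from GNU coreutils):

(printf 0; find . -iname '*.jpg' -type f -printf +%s) | bc | numfmt --to=iec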

3

The solutions mentioned so far are inefficient (exec is expensive) and require additional manual summing if the file list is long, or they don't work on Mac OS X. The following solution is very fast, should work on any system, and yields the total in GB (remove a /1024 if you want to see the total in MB):

find . -iname "*.jpg" -ls | perl -lane '$t += $F[6]; print $t/1024/1024/1024 . " GB"'

  • Neither -iname nor -ls are standard/portable, so it won't work on any system either. It will also not work properly if there are filenames or symlink targets that contain newline characters. – Stéphane Chazelas Jun 22 '16 at 08:28
  • Also note that it gives the sum of the file sizes, not their disk usage. For symlinks, it gives the size of the symlinks, not the files they point to. – Stéphane Chazelas Jun 22 '16 at 08:31
2

Improving SHW's great answer to make it work with any locale, as Zbyszek already pointed out in his comment:

LC_ALL=C find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$
lbo
  • 121
2

du naturally traverses the directory hierarchy, and awk can perform the filtering, so something like this may be sufficient:

du -ak | awk 'BEGIN {sum=0} /\.jpg$/ {sum+=$1} END {print sum}'

This works without GNU utilities.

GeoffP
  • 21
2

This is what worked for me.

find -type f -iname '*.jpg' -print0 | du -ch --files0-from=- | grep total$
2

Using the modern fd (AKA fd-find or fdfind on Ubuntu)

fdfind -e jpg -X du -ch | tail -1

I found fd easier to work with than find, and there's no need to enable globstar.

The trick is to use the uppercase -X (--exec-batch), which executes the command just once, not the lowercase -x, which does a normal exec and runs the command once per file.
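
For example, scoped to the question's tree (a sketch; fd takes [pattern] [path] before --exec-batch):

fdfind -e jpg . ./photos/john_doe -X du -ch | tail -1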

To install on ubuntu:

sudo apt install fd-find

See more

Janghou
  • 403
0

Another would be

ls -al <directory> | awk '{t+=$5} END {print t}'

This assumes you're looking in a single directory. If you want to look at the current directory and everything beneath it:

ls -Ral <directory> | awk '{t+=$5} END {print t}'
Paulo Tomé
  • 3,782
  • (1) Biggest problem: This looks at everything, but the question is specifically about restricting the search to a subset of files; e.g., *.jpg.  (And the question explicitly says that the OP wants to do a recursive directory search.)  (2) This will count, not only files with non-matching names (e.g., *.gif, *.png, etc.), but also non-files; e.g., directories and symbolic links.  (3) This can produce incorrect results if any filename(s) contain newline(s).  (4) Like some of the (poorer) answers, this counts hard links multiple times.  … (Cont’d) – Scott - Слава Україні Mar 09 '20 at 17:37
  • (Cont’d) …  Hint: When a question is almost 8 years old and has 9 answers, it's quite possible that all the good answers have already been given, and you should think long and hard about whether you really have something new and better to contribute. – Scott - Слава Україні Mar 09 '20 at 17:37
0

Another alternative, using stat rather than du:

stat -L -c %s ** | awk '{s+=$1} END {printf "%.0f\n", s}'

See Gilles' answer about using **.
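
A minimal sketch restricted to the question's *.jpg files, assuming bash with globstar enabled and GNU stat (on BSD/macOS the flag would be stat -f %z):

shopt -s globstar
stat -L -c %s **/*.jpg | awk '{s+=$1} END {printf "%.0f\n", s}'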

0

This is a mashup of several answers and comments that does what I need; a line-broken version follows the explanation below.

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }'| numfmt --to=iec

  • find will get all the files recursively
  • -iname makes the match case-insensitive
  • -o and parentheses allow looking for multiple patterns
  • du -bc will get the files' size, sometimes in more than one call if there are many files
  • grep total will get only the total line as given by du
  • cut -f1 will take only the actual integer values
  • awk will sum them all
  • numfmt will convert it to a human-readable format
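
The same pipeline, split across lines for readability (behavior unchanged):

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + \
  | grep total$ \
  | cut -f1 \
  | awk '{ total += $1 }; END { print total }' \
  | numfmt --to=iec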
Gabriel
  • 101