
I am trying to get the total size of files satisfying a find, e.g.:

ls  $(find -maxdepth 2 -type f)

However, this invocation of ls lists the files but does not produce their total size.

Marcus Junius Brutus
  • Many of the solutions below assume that the file names can "fit" on a single du -ch command line, which may not be the case; they would then print several partial results instead of the global sum. See my solution for an alternative which should work in "all" cases (and is portable, as it doesn't depend on GNU find, GNU xargs, or even on recent find options such as -printf). – Olivier Dulac Dec 05 '13 at 14:48
  • Those other solutions are of course neat, though, and will work in many cases (the command line can be huge, especially on recent systems!). But "ymmv", and huge directories occur too... – Olivier Dulac Dec 05 '13 at 14:51

6 Answers


Believe it or not, you can do this with find and du. I used a similar technique that I wrote up on my blog a while ago, in an article titled: [one-liner]: Calculating Disk Space Usage for a List of Files Using du under Linux.

The gist of that post is a command such as this:

$ find -maxdepth 2 -type f | tr '\n' '\0' | du -ch --files0-from=-

Example

This will list the size of all the files along with a summary total.

$ find -maxdepth 2 -type f | tr '\n' '\0' | du -ch --files0-from=- | tail -10
0   ./92086/2.txt
0   ./92086/5.txt
0   ./92086/14.txt
0   ./92086/19.txt
0   ./92086/18.txt
0   ./92086/17.txt
4.0K    ./load.bash
4.0K    ./100855/plain.txt
4.0K    ./100855/tst_ccmds.bash
21M total

NOTE: This solution requires that du support the --files0-from= switch, which is a GNU extension, to my knowledge.

excerpt from du man page

--files0-from=F
          summarize disk usage of the NUL-terminated file names specified in 
          file F; If F is - then read names from standard input
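If your du lacks --files0-from (e.g. a non-GNU du), one sketch of a workaround is to let find batch the file names onto du's command line itself:

```shell
# Fallback sketch for systems whose du lacks --files0-from:
# find places the file names on du's command line itself ({} +),
# and du -c appends a "total" line; tail keeps just that line.
find . -maxdepth 2 -type f -exec du -ch {} + | tail -n1
```

Caveat: with a very large number of files, find may run du more than once, in which case you will see only the last partial total rather than a global sum.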

Also, this method cannot deal with file names that contain newlines: tr converts every newline to a NUL, including one embedded in a file name, so such a name gets split in two.

Examples

du: cannot access `./101415/fileD': No such file or directory
du: cannot access `E': No such file or directory

These could be dealt with by introducing more tr .. .. commands to substitute the offending characters. However, there is a better way, if you have access to GNU find.

Improvements

If your version of find offers the -print0 switch, then you can use this incantation, which deals with file names that contain spaces and/or special characters that aren't printable.

$ find -maxdepth 2 -type f -print0 | du -ch --files0-from=- | tail -10
0   ./92086/2.txt
0   ./92086/5.txt
0   ./92086/14.txt
0   ./92086/19.txt
0   ./92086/18.txt
0   ./92086/17.txt
4.0K    ./load.bash
4.0K    ./100855/plain.txt
4.0K    ./100855/tst_ccmds.bash
21M total
slm

du (disk usage) counts the space files take up. Pass your found files to it and direct it to summarize (-c) and print in a human-readable format (-h) instead of byte counts. You will then get the sizes of all the files, concluded with a grand total. If you are only interested in this last line, you can tail for it.

To also handle spaces in filenames, the delimiting symbol that find prints and xargs expects is set to the null symbol instead of the usual space.

find -maxdepth 2 -type f -print0 | xargs -0 du -ch | tail -n1

If you expect to find so many files that they exceed the maximum argument list length, xargs will split them across multiple du invocations. You can work around this by replacing tail with a grep that shows only the summary lines.

find -maxdepth 2 -type f -print0 | xargs -0 du -ch | grep -P '\ttotal$'
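When xargs really does split the list, each du invocation prints its own total line, so the grep above yields several partial totals. A sketch that sums them into one grand total, assuming GNU du's -b switch (apparent size in bytes, so the partial totals can be added):

```shell
# Each xargs batch produces one "total" line; awk adds them up.
# du separates the size from the name with a tab, so keying on
# a second tab-delimited field equal to "total" matches only summary lines.
find . -maxdepth 2 -type f -print0 \
    | xargs -0 du -cb \
    | awk -F'\t' '$2 == "total" { sum += $1 } END { print sum, "bytes" }'
```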
XZS
  • +1 It will fail for files with a space in their name, though. – Joseph R. Dec 05 '13 at 14:31
  • It will fail too if the list of files is extremely long. You're literally expanding the entire list of files from the $(find ..) command, and passing it as args to du -ch. But in a pinch this is completely usable! Also suffers from spaces and non-printables. – slm Dec 05 '13 at 14:40
  • It now cares for spaces and there is also a hint for an overwhelming argument list. – XZS Dec 05 '13 at 15:04

Another approach: we just need the file sizes, and don't care about the file names, so we can get rid of any "weird" file names, such as names with CR in them, names with spaces, etc.:

 find /some/path -maxdepth 2 -type f -ls -exec printf '\000' \; \
     | tr -cd ' -~\000' \
     | tr '\000' '\n'   \
     | awk  '{ sum+=$7 } END { print "total size: ",sum }'

The trick is:

1) we print each file's "-ls" output, FOLLOWED by a '\000' character (it ends up at the start of the next line, but that's not a problem, see step 2)
2) we get rid of everything non-ascii-printable, including '\t' and '\n' (but we keep the '\000' in addition to the "regular" printable ascii, as we need it to know where the line of each file ends!). That way, file names no longer have any quirks in them (no '\n', no '\t', etc). We also keep the spaces, as we need those to find the 7th field of "-ls", i.e. the file size
3) we translate the added '\000' into a '\n' (step 2 got rid of real newlines too, in case some file names contained them as well!)
4) then we sum the 7th column to get the final size in bytes.

This is very portable (it doesn't need -print0, etc.)
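To see the steps in action, here is a small demo (hypothetical layout, built in a temp directory) with a newline deliberately embedded in one file name:

```shell
# Demo: two files of 5 and 3 bytes, one with a newline in its name.
dir=$(mktemp -d)
printf '12345' > "$dir/plain.txt"
printf '678'   > "$dir/with
newline.txt"
# Same pipeline as above: -ls output terminated by NUL markers,
# non-printables stripped, NULs turned back into record separators,
# then the 7th field (the size from -ls) summed.
find "$dir" -maxdepth 2 -type f -ls -exec printf '\000' \; \
    | tr -cd ' -~\000' \
    | tr '\000' '\n'   \
    | awk '{ sum += $7 } END { print "total size:", sum }'
```

The embedded newline is deleted by the `tr -cd` stage, so the record for that file stays on one line and the size column keeps its position.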

  • I do it this way to 1) avoid limitation on the number of filenames 2) avoid any problem with any kind of filenames (they can not contain a '\000' by design) 3) portability: on many systems you don't have GNU find but a legacy one [mine doesn't even have -printf .... otherwise I could simply just output the filesize only...] – Olivier Dulac Dec 05 '13 at 14:19
  • the steps 1), 2), 3) and 4) also correspond to the different lines in the command – Olivier Dulac Dec 05 '13 at 14:50
  • The -maxdepth option isn't required by POSIX. – James Youngman Dec 09 '13 at 23:08
  • @JamesYoungman: I was taking the OP's options to better reflect his/her needs. But I don't have it on some of my systems, indeed. apart from that option, the rest should work on "any" unix system. – Olivier Dulac Dec 10 '13 at 09:14

If you're only going to compute the size of maximum two directory levels, why not call du directly?

du -ch dir/* dir/*/* | tail -1

This makes the shell expand the two levels of directories to a list of names and passes them as arguments to du which computes the sum.
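A minimal, hypothetical demo (built in a temp directory) of that expansion:

```shell
# The shell expands both glob patterns into a single argument list
# before du ever runs; du -c then appends a grand total as the last line.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'abcd' > "$dir/top.txt"
printf 'ef'   > "$dir/sub/deep.txt"
du -ch "$dir"/* "$dir"/*/* | tail -n1
```

Note that plain globs skip dot files, so hidden entries at either level would be missed.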


find -maxdepth 2 -type f -print0 | xargs -0 du -ch

  • if the list of files is longer than the commandline allows, xargs will be called multiple times and you will have a grand total for each invocation. – Anthon Dec 05 '13 at 13:04
  • Which is still more scalable than my approach, which will simply fail when there are more files than possible arguments. – XZS Dec 05 '13 at 13:14
  • This will not work if there are any spaces or newlines in the filenames. You should use ... -print0 | xargs -0 du ... – Zelda Dec 05 '13 at 13:47

This is a simple way that handles whatever odd file names that can be found:

find . -maxdepth 2 -type f -exec du -ch {} + | grep -w "total"

If there is a really large number of files under the current directory, you might see more than one total line. There might also be unwanted total lines if some file names contain an isolated "total", e.g. a file named "Grand total file.txt".
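One way to tighten the match against such file names is to key on du's summary line itself, where (at least with GNU du) a tab separates the size from the literal word total; a sketch:

```shell
# Match only du's own summary line: split on tabs and require the second
# field to be exactly "total". File names come through as "./name", so a
# file called "Grand total file.txt" cannot match.
find . -maxdepth 2 -type f -exec du -ch {} + | awk -F'\t' '$2 == "total"'
```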

jlliagre