12

Can someone explain the difference between:

du -s dir 
3705012 dir

du -s --apparent-size dir
3614558 dir

These directories are inside a block device created with cryptsetup. Or, more to the point: why do I need to add --apparent-size only for files inside an encrypted block device?

Pol Hallen
    This answer from Stack Overflow should help you: http://stackoverflow.com/a/5694854/2231796 –  Dec 12 '14 at 15:41

2 Answers

17

The "apparent size" of a file is how much valid data is actually in the file: the number of bytes that can be read from it. Block-oriented devices store data in whole blocks, not individual bytes, so the disk usage is always rounded up to the next whole block. A "block" in this case does not necessarily equate to a physical block on the storage device, either; it depends on how the file system allocates space.
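To see both numbers for a single file, here is a minimal sketch. It assumes GNU coreutils `stat`; the scratch file name `demo.txt` and the variable names are made up for illustration. The allocated figure depends on the file system, so no particular value is promised for it.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's logical length with the space allocated for it.
# Assumes GNU coreutils stat; demo.txt is a hypothetical scratch file.
tmpdir=$(mktemp -d)
f="${tmpdir}/demo.txt"
printf 'hello' > "${f}"           # write 5 bytes of data

apparent=$(stat -c %s "${f}")     # logical size in bytes (what can be read)
blocks=$(stat -c %b "${f}")       # number of allocated blocks
unit=$(stat -c %B "${f}")         # size in bytes of each block counted by %b
allocated=$(( blocks * unit ))    # space actually reserved on disk

echo "apparent  ${apparent}"
echo "allocated ${allocated}"
rm -r "${tmpdir}"
```

The first number is what `du --apparent-size` sums up; the second is the kind of figure plain `du` reports.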

In the case of your encrypted device, the file system may expand the amount of space used to include overhead to support the encryption/decryption information. It probably also encrypts or randomizes the unused space between the end of file and the end of the block containing it, which may make it appear larger to du.

None of this takes into account sparse file handling, which may not be supported in an encrypted filesystem.

9

Minimal block granularity example

Let's play a bit to see what is going on.

mount tells me I'm on an ext4 partition mounted at /.

I find its block size with:

stat -fc %s .

which gives:

4096

Now let's create some files with sizes of 1, 4095, 4096 and 4097 bytes:

#!/usr/bin/env bash
for size in 1 4095 4096 4097; do
  dd if=/dev/zero of=f bs=1 count="${size}" status=none
  echo "size     ${size}"
  echo "real     $(du --block-size=1 f)"
  echo "apparent $(du --block-size=1 --apparent-size f)"
  echo
done

and the results are:

size     1
real     4096   f
apparent 1      f

size     4095
real     4096   f
apparent 4095   f

size     4096
real     4096   f
apparent 4096   f

size     4097
real     8192   f
apparent 4097   f

So we see that any file of 1 to 4096 bytes in fact takes up 4096 bytes on disk.

Then, as soon as the size reaches 4097, usage goes up to 8192, which is 2 * 4096.

It is clear then that the file system always allocates space in whole blocks of 4096 bytes.
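The rounding seen above can be modelled with a bit of integer arithmetic. This is only a sketch of the pattern in the measurements: `round_up_to_block` is a hypothetical helper, the 4096 default is taken from the `stat -fc %s` output above, and a real file system may deviate from it (metadata, inline storage of tiny files, and so on).

```shell
#!/usr/bin/env bash
# Hypothetical helper: round a byte count up to a whole number of blocks.
# The second argument is the block size, defaulting to 4096 as measured above.
round_up_to_block() {
  local size=$1 block=${2:-4096}
  echo $(( (size + block - 1) / block * block ))
}

for size in 1 4095 4096 4097; do
  echo "${size} -> $(round_up_to_block "${size}")"
done
```

For the four sizes tested this gives 4096, 4096, 4096 and 8192, matching the "real" column.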

What happens to sparse files?

I haven't investigated what the exact on-disk representation is, but it is clear that --apparent-size does count the holes as part of the file's size.

This can lead to apparent sizes being larger than actual disk usage.

For example:

dd seek=1G if=/dev/zero of=f bs=1 count=1 status=none
du --block-size=1 f
du --block-size=1 --apparent f

gives:

8192    f
1073741825      f

Related: https://stackoverflow.com/questions/38718864/how-to-test-if-sparse-file-is-supported

What to do if I want to store a bunch of small files?

Some possibilities are:

Bibliography:

Tested in Ubuntu 16.04.

Ciro Santilli OurBigBook.com