By default, du measures disk usage in 'blocks'. A small file (smaller than a block) occupies only as much of its block as it needs, and the rest stays empty. That remainder cannot be used by another file, since a block can belong to just one file, so some bytes are 'wasted'.
tar, on the other hand, concatenates all the files, so much less space is 'wasted'.
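To put rough numbers on that, here is a small arithmetic sketch. The 4096-byte block size and the 100-byte file are assumptions for illustration; the 512-byte header and 512-byte padding are how the tar format stores each member. The helper name round_up is made up for this example.

```shell
#!/bin/sh
# round_up SIZE UNIT: smallest multiple of UNIT that is >= SIZE
# (hypothetical helper, not part of du or tar)
round_up() {
    echo $(( ( $1 + $2 - 1 ) / $2 * $2 ))
}

file_size=100      # bytes of real data (illustrative)
block_size=4096    # assumed filesystem block size

# On disk the file occupies a whole number of blocks.
on_disk=$(round_up "$file_size" "$block_size")

# In a tar archive it costs a 512-byte header plus data padded to 512 bytes.
in_tar=$(( 512 + $(round_up "$file_size" 512) ))

echo "on disk: $on_disk bytes, in tar: $in_tar bytes"
# prints: on disk: 4096 bytes, in tar: 1024 bytes
```

So even with the per-file header, the archive needs a quarter of the space in this case.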
You can use the -b option of du (GNU du; it reports the apparent size in bytes instead of allocated blocks) if you want a better prediction of the tar size.
That is, if you run
$ du -shb /etc
$ du -shb etc.tar
you will get numbers that are much closer to each other.
The remaining difference comes from the metadata: the size of the directory itself in the first case, and the size of the tar headers in the second.
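If you would rather not experiment on /etc, the comparison can be sketched on a throwaway directory. This assumes GNU du (for -b and --block-size) and a tar that accepts -cf and -C; the directory name small, the file names and the count of 50 are arbitrary choices for the example.

```shell
#!/bin/sh
# Build a scratch directory with 50 one-byte files, tar it, compare sizes.
work=$(mktemp -d)
mkdir "$work/small"
i=1
while [ "$i" -le 50 ]; do
    printf 'x' > "$work/small/f$i"
    i=$((i + 1))
done
tar -cf "$work/small.tar" -C "$work" small

# Space allocated on disk, in bytes (whole blocks).
allocated=$(du -s --block-size=1 "$work/small" | cut -f1)
# Apparent size: what -b reports.
apparent=$(du -sb "$work/small" | cut -f1)
# Size of the archive.
tar_bytes=$(du -sb "$work/small.tar" | cut -f1)

echo "allocated: $allocated  apparent: $apparent  tar: $tar_bytes"
rm -rf "$work"
```

The exact numbers depend on the filesystem and on the tar implementation, so none are shown here.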
To investigate further, you can start with:
$ df /some_test_dir
That will tell you which device the directory is located on (the Filesystem column).
$ sudo /sbin/dumpe2fs /dev/?? |grep 'Block size'
Put your device in place of /dev/?? and you will get the block size on that device (dumpe2fs works on ext2/ext3/ext4 filesystems).
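As an alternative that needs neither root nor an ext filesystem, the block size can also be read with stat. This is a sketch assuming GNU coreutils stat; the current directory "." stands in for /some_test_dir.

```shell
#!/bin/sh
# Assumes GNU coreutils stat; "." is a placeholder for your test directory.
dir=.

# -f queries the filesystem, %S is its fundamental block size in bytes.
fs_block=$(stat -f -c %S "$dir")

# Second line of df -P output, first column: the device (Filesystem column).
device=$(df -P "$dir" | awk 'NR==2 {print $1}')

echo "block size on $device: $fs_block bytes"
```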
If you run du /some_test_dir and that directory is empty, du will report the size of one block.
If you now put one or many zero-length files in it, du on the directory will still report one block: the files take no data space at all, and the information about them is stored inside the directory's block.
For the next test, create N files in this directory, each smaller than a block. The actual size does not matter; it just has to be more than zero and less than a block. Now du on the directory will report (N+1)*block: each file takes a block, and the directory itself takes a block.
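The experiment can be scripted roughly like this. It is a sketch assuming GNU du and stat, with N=5 and the file names chosen arbitrarily. The result depends on the filesystem: some can store very small files inline, and temporary directories may live on tmpfs, so the reported figure will not always match the prediction.

```shell
#!/bin/sh
# Create N small files in a scratch directory and compare du with (N+1)*block.
dir=$(mktemp -d)
block=$(stat -f -c %S "$dir")     # block size of the filesystem holding $dir
n=5

i=1
while [ "$i" -le "$n" ]; do
    printf 'x' > "$dir/file$i"    # 1 byte: more than zero, less than a block
    i=$((i + 1))
done

expected=$(( (n + 1) * block ))   # N file blocks plus one directory block
actual=$(du -s --block-size=1 "$dir" | cut -f1)

echo "expected (N+1)*block = $expected, du reports $actual"
rm -rf "$dir"
```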
If you have many files (how many depends on the file system), the directory itself can grow in order to store the information about the files in it. But the directory size will always be a multiple of the block size.