This is because files take up space in whole-block increments. If your block size is 512 bytes and you have a 100-byte file, the space it actually occupies on disk is rounded up to the nearest block - in this case 512 bytes. When you tar the files, that per-file waste is mostly eliminated, because the only rounding that happens is on the single resulting .tar file.
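If you want to see the rounding on a single file first (a quick sketch using GNU coreutils du; small_file is just a throwaway name, and the exact numbers depend on your filesystem's block size), compare the apparent size with the allocated size:
head -c 100 /dev/zero > small_file
du -h --apparent-size small_file
#shows the 100 bytes that were actually written
du -h small_file
#shows the space allocated on disk, rounded up to a whole block (e.g. 4.0K on a 4096-byte-block filesystem)
rm -f small_file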
You can really see this in action by creating 100 small files and comparing their sizes. The following commands create a directory containing 100 single-byte files, then compare the disk usage of the files individually, of all of them concatenated into one file, and of a tarball created from them.
mkdir tmp_small_file_test
for ((i=0; i<100; i++)); do head -c 1 /dev/zero > tmp_small_file_test/file$i; done
du -sh tmp_small_file_test
#on a filesystem with a 4096-byte block size this output 404K
cat tmp_small_file_test/file* >> tmp_small_file_test/all_files_combined
du -sh tmp_small_file_test/all_files_combined
#this output 4.0K
rm -f tmp_small_file_test/all_files_combined
tar -cf tmp_small_file_test.tar tmp_small_file_test
du -sh tmp_small_file_test.tar
#this output 116K
NOTE: since tar stores some per-file overhead in the archive, the tarball of the above directory isn't as small as all the files simply concatenated together, but it's still a lot smaller than the files stored individually (at least on a filesystem with a 4096-byte block size).
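If you're curious where that overhead comes from: in the tar (ustar) format each member gets at least a 512-byte header and its data is padded to a multiple of 512 bytes, and GNU tar typically also pads the archive out to its record size. You can check the archive's exact byte count (assuming GNU stat) and compare it against the mere 100 bytes of actual file data:
stat -c %s tmp_small_file_test.tar
#exact size of the archive in bytes; nearly all of it is per-file headers, padding, and end-of-archive blocks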
If you're using an ext3/ext4 filesystem, you can see the block size with something like tune2fs -l /dev/sda1 | grep -i 'block size' (replace /dev/sda1 with the device your filesystem is on). The block size should work out to roughly the first du output above divided by 100.
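If you don't have root access (tune2fs usually needs it) or you're on a different filesystem, GNU stat can report the block size of whatever filesystem a path lives on, which makes a reasonable cross-check:
stat -f -c '%S' tmp_small_file_test
#prints the fundamental block size of the filesystem holding that directory (e.g. 4096)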