-1

When I create a tar archive of a directory and all its contents, the archive's size is larger than the combined size of all the original files.

Why is it so?

I'm checking file sizes by using ls -l.

I'm creating the archive using tar -cvf archive directory.

Kusalananda

2 Answers

4

tar records the name and other metadata of each stored file inside the archive. This header alone takes at least 512 bytes of storage per file.

There can also be a fair amount of empty space inside a tar file, presumably due to blocking. In a couple of narrow tests, I stored a single zero-byte file and got a tar file of 2560 bytes; 1000 zero-byte files generated an archive of ~1.5MB.

As you are using tar without a compression flag (such as z), the archive will always be larger than the combined size of the files it contains.
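The empty-file experiment above can be reproduced without touching the disk. This is a minimal sketch that assumes Python's standard tarfile module writing the plain ustar format into memory; the member names are hypothetical. The exact figures depend on the tar implementation and archive format, so they need not match the numbers quoted above.

```python
import io
import tarfile


def archive_size(n_files: int, file_size: int = 0) -> int:
    """Size in bytes of an uncompressed ustar archive built in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"file{i}")  # hypothetical member name
            info.size = file_size
            # Content is just NUL bytes; only the length matters here.
            tar.addfile(info, io.BytesIO(b"\0" * file_size))
    # Closing the archive appends the end-of-archive marker and padding.
    return len(buf.getvalue())


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, "empty file(s) ->", archive_size(n), "bytes")
```

Even with no file content at all, the archive is never empty: each member carries a header block, and the archive is padded at the end.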

bxm
  • A couple of other factors: (a) Tar reblocks disk files (block size maybe 4096) into tape-sized blocks (size 512). So an 1800-byte file occupies 4096 bytes of actual disk space, is reported as 1800 bytes by ls, and takes 2048 bytes of data blocks in the tar archive. (b) Some tars do buffered writes to tar files and write in 10240-byte units, which were optimal for some actual tape drives. That padding does not happen for every file, just at the end of the archive. – Paul_Pedant Dec 16 '19 at 15:54
2

The historic TAR from 1977 records 512 bytes of metadata for each file.

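If you archive a file with a size of 500 bytes, this more than doubles the space needed in the archive compared to the plain file content: the content is padded to a full 512-byte block and follows the 512-byte header, for 1024 bytes in total.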
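The arithmetic behind this can be written down in a few lines. This is a rough sketch, assuming the classic layout of one 512-byte header per file, content padded to whole 512-byte blocks, two empty blocks as the end-of-archive marker, and a final record size of 10240 bytes (a common default, but not universal). The function names are made up for illustration.

```python
BLOCK = 512      # tar block size in bytes
RECORD = 10240   # assumed record size (20 blocks); varies between tar implementations


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


def member_size(file_size: int) -> int:
    """Bytes one file occupies in the archive: header plus padded content."""
    return BLOCK + round_up(file_size, BLOCK)


def estimated_archive_size(file_sizes, record: int = RECORD) -> int:
    """Estimated total archive size, including end marker and record padding."""
    total = sum(member_size(s) for s in file_sizes) + 2 * BLOCK
    return round_up(total, record)


if __name__ == "__main__":
    for size in (0, 500, 1800):
        print(size, "bytes of content ->", member_size(size), "bytes in the archive")
```

Extended headers, long names, and directory entries add further blocks, so treat the result as a lower bound rather than an exact prediction.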

If you compare this to the per-file overhead of a filesystem, it is typically still less than the total space the filesystem needs to store the file.

BTW: In 1997, Solaris introduced a new enhanced TAR archive format. This format was standardized in POSIX.1-2001. It is called pax, or tar with extended headers.

This tar with extended headers supports archiving timestamps with arbitrary resolution and filenames of arbitrary length. A TAR archive with extended headers needs an overhead of at least 1536 bytes per file. This is still not more than the overhead of a typical filesystem, as filesystems need inode information, the directory entry, ACLs and other enhanced metadata, and typically round the file size up to a multiple of 1 to 8 kBytes when storing the file content in the blocks of the filesystem's backing storage.
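The 512 versus 1536 byte figures can be observed with Python's standard tarfile module, used here only as one convenient implementation of both formats. In this sketch the file name and timestamps are made up, and the sub-second timestamp is what forces an extended header; other tar implementations may decide differently when to emit one.

```python
import tarfile


def header_bytes(mtime, fmt) -> int:
    """Length of the header data written for one member, before any content."""
    info = tarfile.TarInfo(name="example.txt")  # hypothetical file name
    info.mtime = mtime
    return len(info.tobuf(format=fmt))


if __name__ == "__main__":
    whole = 1576511640          # timestamp with one-second resolution
    precise = 1576511640.123456  # sub-second timestamp, needs an extended header
    print("ustar:", header_bytes(whole, tarfile.USTAR_FORMAT))
    print("pax, whole seconds:", header_bytes(whole, tarfile.PAX_FORMAT))
    print("pax, sub-second:", header_bytes(precise, tarfile.PAX_FORMAT))
```

The extra 1024 bytes are one header block announcing the extended header plus one block holding its records, followed by the ordinary 512-byte header.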

schily