2

The ext4 file system usually uses 4 KiB blocks. So when you write a small file whose size is less than 4 KiB, you can see the difference in any file manager. There are usually two values: the size of the file and the size on disk. The first one is the exact size, and the other is a multiple of 4 KiB.
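The two values can also be read outside a file manager. The following is a minimal Python sketch, assuming Linux, where `st_blocks` is counted in 512-byte units; the helper names `round_up` and `size_and_usage` are made up for illustration.

```python
import os
import tempfile


def round_up(size, block_size=4096):
    """Round size up to a whole number of blocks (4 KiB assumed by default)."""
    return -(-size // block_size) * block_size


def size_and_usage(path):
    """Return (apparent size, bytes allocated on disk) for path."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units, whatever the
    # filesystem's own block size is.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 100)
        f.flush()
        os.fsync(f.fileno())
        size, usage = size_and_usage(f.name)
        # The reported usage depends on the filesystem holding the temp file.
        print("size:", size, "on disk:", usage, "rounded up:", round_up(size))
```

The "size on disk" that a file manager shows should normally match the second value returned by `size_and_usage`.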

For larger files, I've always thought that the two sizes can't differ by more than 4 KiB (the last, not fully written block). But for some files on my disk, the difference is more than 4 KiB, for instance 9425 bytes. So the question is simple: why do the sizes differ by more than 4 KiB? Is it because of fragmentation or something else? Isn't it weird that some blocks in the middle of the file aren't fully written?

2 Answers

4

The list of blocks that make up the file has to be stored somewhere. Typically there's a little space in the inode, but if there are too many blocks to fit in the inode, the filesystem allocates indirect blocks to store the addresses of the data blocks, in addition to the blocks that contain the file data. At least for ext2/ext3/ext4 on Linux, and I think for most Unix-like filesystems on most Unix-like operating systems, the indirect blocks are counted in the file's disk usage.
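As a rough worked example of the arithmetic, the difference from the question can be split into whole extra blocks and the unused tail of the last data block. This sketch assumes 4 KiB blocks and a file without holes; `split_difference` is a hypothetical helper name.

```python
def split_difference(difference, block_size=4096):
    """Split (size on disk - file size) into (extra whole blocks, tail slack).

    Disk usage is a whole number of blocks, and the unused tail of the last
    data block is always smaller than one block, so the quotient is the
    number of blocks that hold no file data and the remainder is the slack.
    """
    extra_blocks, tail_slack = divmod(difference, block_size)
    return extra_blocks, tail_slack


print(split_difference(9425))  # (2, 1233)
```

So a difference of 9425 bytes would correspond to 1233 unused bytes in the last data block plus two blocks that hold no file data at all, which is consistent with two blocks of block-list metadata.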

Ext4 uses extent trees to store block lists. A run of consecutive blocks in order takes up a single entry in the tree, which specifies the first block and the number of blocks. Thus a file with little or no fragmentation doesn't need any indirect blocks: its few entries fit in the inode, and its disk usage is the file size rounded up to a whole number of filesystem blocks. A fragmented file has more entries than fit in the inode and needs indirect blocks to store them; a maximally fragmented file needs one tree entry per block.
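To illustrate how consecutive blocks collapse into tree entries, here is a small Python sketch that counts extents in a list of physical block numbers. The two constants reflect my understanding of the ext4 on-disk format (up to four extents stored directly in the inode, at most 32768 blocks per extent) and should be treated as assumptions; the function names are made up for illustration.

```python
EXTENTS_IN_INODE = 4      # assumed: entries that fit in the inode itself
MAX_EXTENT_LEN = 32768    # assumed: maximum blocks covered by one extent


def count_extents(physical_blocks, max_extent_len=MAX_EXTENT_LEN):
    """Count the extents needed to describe blocks listed in file order."""
    extents = 0
    run_len = 0
    prev = None
    for block in physical_blocks:
        if prev is not None and block == prev + 1 and run_len < max_extent_len:
            run_len += 1      # the block continues the current extent
        else:
            extents += 1      # a gap (or a full extent) starts a new entry
            run_len = 1
        prev = block
    return extents


def needs_extra_blocks(physical_blocks):
    """True if the extent list would no longer fit in the inode."""
    return count_extents(physical_blocks) > EXTENTS_IN_INODE


print(count_extents([100, 101, 102, 103]))       # 1
print(count_extents([100, 101, 200, 201, 300]))  # 3
print(needs_extra_blocks([1, 3, 5, 7, 9]))       # True
```

Under these assumptions, a contiguous file stays within the inode until it grows past four maximum-length extents, while a scattered file needs extra blocks after only five fragments.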

Ext2 and ext3 have a simpler scheme where the block list is not compressed, so the number of entries scales slightly more than linearly with the size of the file. Indirect blocks are required as soon as the file uses more than 12 blocks (that's how many blocks can be recorded directly in the inode).
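The ext2/ext3 overhead can be estimated with a short calculation. This is a sketch that assumes the classic layout (12 direct pointers, then single, double and triple indirect blocks, with 4-byte block pointers) and a file without holes; `indirect_blocks` is a hypothetical helper name.

```python
def ceil_div(a, b):
    return -(-a // b)


def indirect_blocks(data_blocks, block_size=4096):
    """Estimate the indirect blocks needed for a hole-free ext2/ext3 file."""
    ptrs = block_size // 4            # 4-byte pointers per indirect block
    remaining = max(0, data_blocks - 12)
    if remaining == 0:
        return 0                      # everything fits in the inode

    total = 1                         # the single indirect block
    remaining -= min(remaining, ptrs)
    if remaining == 0:
        return total

    chunk = min(remaining, ptrs * ptrs)
    total += 1 + ceil_div(chunk, ptrs)    # double indirect + its children
    remaining -= chunk
    if remaining == 0:
        return total

    chunk = min(remaining, ptrs ** 3)
    # triple indirect + second-level blocks + first-level blocks
    total += 1 + ceil_div(chunk, ptrs * ptrs) + ceil_div(chunk, ptrs)
    return total


print(indirect_blocks(12))    # 0
print(indirect_blocks(256))   # 1  (a 1 MiB file with 4 KiB blocks)
print(indirect_blocks(1037))  # 3
```

With 4 KiB blocks, the first indirect block alone already adds 4 KiB to the disk usage of any file larger than 48 KiB.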

You can explore an ext2/ext3/ext4 filesystem with the debugfs command. In debugfs, blocks /path/to/file lists the blocks used by a file; this shows how fragmented the file is. From the shell, the command filefrag /path/to/file gives the number of fragments; for ext4 this correlates with the number of indirect blocks and hence with the difference between file size and file disk usage.
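If you want to collect the fragment count for many files, filefrag can be driven from a script. The sketch below assumes that filefrag is installed and that its summary line has the form "N extents found" (or "1 extent found"); the function names are made up for illustration.

```python
import re
import subprocess

# Assumed summary format: "<path>: <N> extents found"
_SUMMARY = re.compile(r": (\d+) extents? found")


def parse_extent_count(filefrag_output):
    """Extract the extent count from filefrag's output, or None if absent."""
    match = _SUMMARY.search(filefrag_output)
    return int(match.group(1)) if match else None


def extent_count(path):
    """Run filefrag on path and return the number of extents it reports."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)
```

Calling `extent_count` on each file and comparing the result with the gap between file size and disk usage is one way to check the correlation described above.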

0

I think it might happen in this situation (quoting the fallocate(2) man page):

If the FALLOC_FL_KEEP_SIZE flag is specified in mode, the behavior of the call is similar, but the file size will not be changed even if offset+len is greater than the file size. Preallocating zeroed blocks beyond the end of the file in this manner is useful for optimizing append workloads.
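A minimal sketch of that situation in Python, calling fallocate(2) through ctypes because `os.posix_fallocate` has no mode argument. It assumes 64-bit Linux (so that `off_t` is 64 bits wide) and a filesystem that supports preallocation; `preallocate_keep_size` and `demo` are hypothetical names.

```python
import ctypes
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01  # value from <linux/falloc.h>

# Assumption: libc exports fallocate(int fd, int mode, off_t off, off_t len)
# with a 64-bit off_t, as on 64-bit Linux.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def preallocate_keep_size(fd, offset, length):
    """Allocate blocks for [offset, offset+length) without changing st_size."""
    if _libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def demo():
    """Return (st_size, allocated bytes) after preallocating 1 MiB, or None."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 100)
        f.flush()
        try:
            preallocate_keep_size(f.fileno(), 0, 1024 * 1024)
        except OSError:
            return None  # the filesystem does not support this call
        st = os.fstat(f.fileno())
        return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    print(demo())
```

Where the call succeeds, the file size should stay at 100 bytes while the allocated size grows to roughly the preallocated length, which is a gap far larger than one block.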

sourcejedi