In the context of e.g. ext4 fs, I often stumble upon both 'fragment' and 'extent' words.
Both appear to be related to fragmentation; extents, for example, are pretty extensively (duh) explained here.
What's the difference between them?
Just to add confusion, there are two kinds of fragments and three kinds of fragmentation.
A fragmented file is one stored in multiple chunks (fragments, each of which might be an extent), so that when the file is read sequentially the OS has to fetch the pieces from different places on the disk, which can slow the read down. ext4fs has algorithms that try to prevent this by allocating a file's blocks as contiguously as possible. A file with multiple extents is said to be fragmented. However, a file could also be stored as a single fragment (i.e., not fragmented) that is a list of contiguous blocks and is not an extent. That is how fragments and extents are related.
A fragment can also be a file smaller than a block, or the last piece of a file that is smaller than a block, stored in a full block together with other fragments. When block sizes are large and you have a large number of small files, storing multiple file fragments in a single block saves a lot of space and possibly increases performance, especially if you are trying to read all of the files that share a block.
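To make the relationship concrete, here is a minimal sketch that collapses a file's physical block numbers into contiguous runs. It is only an illustration: the function names and the block lists are made up, and a real ext4 extent carries more information than a (start, length) pair.

```python
from typing import List, Tuple


def blocks_to_extents(blocks: List[int]) -> List[Tuple[int, int]]:
    """Collapse physical block numbers (given in logical file order)
    into (start_block, length) runs of contiguous blocks."""
    extents: List[Tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the previous run: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Gap (or first block): start a new run.
            extents.append((block, 1))
    return extents


def is_fragmented(blocks: List[int]) -> bool:
    """A file is fragmented in this sense if it needs more than one run."""
    return len(blocks_to_extents(blocks)) > 1


# Hypothetical block lists, for illustration only.
contiguous = [100, 101, 102, 103]
scattered = [100, 101, 500, 501, 502, 90]

print(blocks_to_extents(contiguous))  # [(100, 4)]
print(blocks_to_extents(scattered))   # [(100, 2), (500, 3), (90, 1)]
print(is_fragmented(contiguous), is_fragmented(scattered))  # False True
```

The first file is one run of four blocks and is not fragmented; the second needs three runs, so a sequential read has to jump twice.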
From the filesystem perspective, you can have internal fragmentation and external fragmentation.
Internal fragmentation is within a single file, same as in the first type of fragmentation above.
External fragmentation occurs when you have related files all in one directory that are scattered all over the disk. If you are trying to read every file in the directory, this can cause as many performance issues as internal fragmentation does. The ext4fs algorithms also attempt to minimize external fragmentation by trying to allocate blocks for files in the same cylinder group as other files in the same directory.
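The directory-locality idea can be sketched with a toy allocator. This is an assumption-laden model, not the actual FFS or ext4 allocator: the class name, the "emptiest group" heuristic for new directories, and the absence of freeing are all simplifications made for illustration.

```python
from typing import Dict, List


class ToyAllocator:
    """Toy model: the disk is split into equal-sized groups of contiguous blocks."""

    def __init__(self, num_groups: int, blocks_per_group: int) -> None:
        self.blocks_per_group = blocks_per_group
        # Next free offset inside each group (blocks are never freed here).
        self.next_free: List[int] = [0] * num_groups
        # Group assigned to each directory, chosen on first use.
        self.dir_group: Dict[str, int] = {}

    def _group_for(self, directory: str) -> int:
        if directory not in self.dir_group:
            # Assumed heuristic: put a new directory in the emptiest group.
            groups = range(len(self.next_free))
            self.dir_group[directory] = min(groups, key=lambda g: self.next_free[g])
        return self.dir_group[directory]

    def allocate(self, directory: str, nblocks: int) -> List[int]:
        """Allocate nblocks contiguous blocks for a file in `directory`,
        trying the directory's own group first, then the others."""
        preferred = self._group_for(directory)
        others = [g for g in range(len(self.next_free)) if g != preferred]
        for g in [preferred] + others:
            if self.next_free[g] + nblocks <= self.blocks_per_group:
                start = g * self.blocks_per_group + self.next_free[g]
                self.next_free[g] += nblocks
                return list(range(start, start + nblocks))
        raise RuntimeError("no group has enough contiguous free space")


alloc = ToyAllocator(num_groups=4, blocks_per_group=100)
a1 = alloc.allocate("/home/a", 10)  # blocks 0..9 (group 0)
b1 = alloc.allocate("/home/b", 5)   # blocks 100..104 (group 1)
a2 = alloc.allocate("/home/a", 3)   # blocks 10..12 (group 0 again)
print(a1[0], b1[0], a2[0])  # 0 100 10
```

Files in /home/a end up next to each other even though a file in /home/b was created in between, which is the effect the group-based placement is after.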
Historical note:
Linux ext4fs is an evolution of ext3fs and ext2fs and extfs. extfs was patterned on (but not based on) UFS. UFS was based on the BSD Fast File System (FFS). The algorithms above were key in FFS and largely exist in ext4, but many others in FFS (especially those dealing with rotational latency) are obsolete and were probably thrown away for UFS. Some of the terminology above is (at least) two generations removed from Linux and the terms themselves may have drifted, but the algorithms are still there.
Cylinder groups from FFS were originally correlated with tracks and sectors (which were used for rotational latency optimization), but were really just contiguous cylinders. Blocks haven't cleanly mapped to tracks/sectors for at least 30 years, and probably not since floppies were common. But cylinder groups are still contiguous groups of blocks and still help optimize seek times. Seek times may be irrelevant for solid state disks, but contiguous block allocation still helps them.
Optimization from cylinder groups is almost a side effect (but an intentional one), as inodes belong to a cylinder group, and inodes within a directory have a tendency to be allocated sequentially, and there is an attempt to allocate blocks for an inode in the same cylinder group.
Note that cylinder groups are not the only algorithm used to reduce performance issues from fragmentation. Even FFS tried to delay block allocation to try to get larger contiguous block allocations within a file.
The term 'fragmentation' has always been used sloppily in Unix, with little or no indication of which type is meant unless you read the context carefully. After searching multiple historical documents for both BSD and Linux, I was unable to find any formal definition of internal or external fragmentation; their usage seems never to have been fixed, and 'fragmentation' itself is sometimes used to refer to both types at once.
You already cited a source that explains what an extent is.
As for a fragment, it's a subdivision of a block. This concept was introduced in FFS and still exists in UFS, but it doesn't exist in EXT4.
Sources :
As files are created or expanded, they are allocated disk space in either full logical blocks or portions of logical blocks called fragments. When disk space is needed for a file, full blocks are allocated first, and then one or more fragments of a block are allocated for the remainder. For small files, allocation begins with fragments.
The ability to allocate fragments of blocks to files, rather than just whole blocks, saves space by reducing fragmentation of disk space that results from unused holes in blocks.
You define the fragment size when you create a UFS file system. The default fragment size is 1 KB. Each block can be divided into 1, 2, 4, or 8 fragments, which results in fragment sizes from 8192 bytes to 512 bytes (for 4-KB file systems only). The lower bound is actually tied to the disk sector size, typically 512 bytes.
For multiterabyte file systems, the fragment size must be equal to the file system block size.
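The allocation rule quoted above (full blocks first, then fragments for the remainder) can be worked through in a few lines. This is a sketch only: the function name is made up, the 8192-byte block size is an assumption chosen to match the 8192-to-1024 range in the quote, and the real UFS allocator does much more than this arithmetic.

```python
from typing import Tuple


def ufs_style_allocation(file_size: int,
                         block_size: int = 8192,
                         frag_size: int = 1024) -> Tuple[int, int, int]:
    """Return (full_blocks, fragments, wasted_bytes) for a file.

    Full blocks are used first; the remainder goes into fragments.
    A block may be split into 1, 2, 4 or 8 fragments."""
    if block_size % frag_size != 0 or block_size // frag_size not in (1, 2, 4, 8):
        raise ValueError("a block must divide into 1, 2, 4 or 8 fragments")
    frags_per_block = block_size // frag_size
    full_blocks, remainder = divmod(file_size, block_size)
    fragments = -(-remainder // frag_size)  # ceiling division
    if fragments == frags_per_block:
        # A tail that needs every fragment of a block is just a full block.
        full_blocks, fragments = full_blocks + 1, 0
    allocated = full_blocks * block_size + fragments * frag_size
    return full_blocks, fragments, allocated - file_size


# 10000-byte file, 8 KB blocks with 1 KB fragments:
print(ufs_style_allocation(10000))              # (1, 2, 240)
# Same file when fragment size equals block size (no sub-allocation):
print(ufs_style_allocation(10000, 8192, 8192))  # (2, 0, 6384)
# One-byte file on a 4096/4096 layout, as on ext3/ext4:
print(ufs_style_allocation(1, 4096, 4096))      # (1, 0, 4095)
```

With fragments the 10000-byte file wastes 240 bytes; without them it wastes 6384, which is the "unused holes in blocks" effect the quote refers to.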
https://recoverhdd.com/blog/ufs1-and-ufs2-file-systems.html
The main purpose of FFS was to consolidate all the contents of a directory (data and metadata) into one cylinder group. It would greatly reduce the fragmentation level that occurred due to the severe spread of data across the disk’s surface. However, due to the rapid increase in the disk size and the size of the files stored on them, this solution was no longer effective since the block size was increased to keep the performance at the proper level. Accordingly, storing a large number of small files took up a lot of space.
It again forced the developers to develop the file system and based on FFS was created the revised file system “UFS1“, and later its revised version – “UFS2“, creation of which allowed providing reliability and speed thanks to the division of blocks into fragments, which are used to store the final bytes of the file (previously for this was allocated a whole block) and some new technologies.
Again, the concept doesn't exist in EXT3/4 :
What is a fragment size in an ext3 filesystem?
ext3fs doesn't support block fragmentation so a one byte file will use a whole 4096 block.
On the opposite, for example UFS supports four fragments in a block so small files won't fill a file system as fast as they will do on ext3fs.
Optimize ext4 partition for millions of 1KB files
AFAIK, ext4 is simply not a good choice for what you're doing, since it doesn't support block sub-allocation. You should really consider using UFS2 or BtrFS.
Note that dumpe2fs on an EXT3/4 FS will show you a fragment size equal to the block size:
# dumpe2fs /dev/sde1
(...)
Block size: 4096
Fragment size: 4096
(...)
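If you want to check this programmatically, a small parser over saved dumpe2fs output is enough. The sketch below is an assumption-based illustration: the function name is made up, and the sample text simply mirrors the two lines shown above rather than the full output of a real run.

```python
import re
from typing import Dict


def parse_sizes(dumpe2fs_output: str) -> Dict[str, int]:
    """Extract the 'Block size' and 'Fragment size' values (in bytes)
    from text in the format printed by dumpe2fs."""
    sizes: Dict[str, int] = {}
    for line in dumpe2fs_output.splitlines():
        match = re.match(r"^(Block size|Fragment size):\s*(\d+)\s*$", line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


# Sample text copied from the excerpt above, not from a live filesystem.
sample = """\
Block size:               4096
Fragment size:            4096
"""

sizes = parse_sizes(sample)
print(sizes)  # {'Block size': 4096, 'Fragment size': 4096}
print(sizes["Block size"] == sizes["Fragment size"])  # True
```

Equal values mean a block is never subdivided, which matches the point that EXT3/4 has no block fragments.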
[...] directory fragmentation). – ChennyStar Jan 03 '24 at 05:12
[...] directory fragmentation would conflict with reducing file fragmentation (I'm using the term directory fragmentation instead of external fragmentation, which has a different meaning, according to kernel.org https://ext4.wiki.kernel.org/index.php/Design_for_Large_Allocation_Blocks). – ChennyStar Jan 03 '24 at 05:18
[...] external fragmentation for file fragmentation, vs internal fragmentation for the case where file size < block size. The way kernel.org uses those terms seems to confirm that, without defining them explicitly (https://ext4.wiki.kernel.org/index.php/Design_for_Large_Allocation_Blocks). – ChennyStar Jan 03 '24 at 08:14