
I'm using EXT4, but I think my question concerns all Unix/Linux file systems.

This 2022 answer to "Difference between fragment and extent in ext4" states that:

    External fragmentation occurs when you have related files all in one directory that are scattered all over the disk. If you are trying to read every file in the directory, this can case as much performance issues as internal fragmentation does. The ext4fs algorithms also attempt to minimize external fragmentation by attempting to allocate blocks for files in the same cylinder group as other files in the same directory.

I wonder if that's true, because:

  1. I couldn't find any other source to corroborate that. Web searches for "external fragmentation" mostly return results about RAM fragmentation, while searching for "ext4 external fragmentation" brings up a few answers, most of them old (before 2010).

  2. An Arch Linux forum post from 2009 gives a radically different definition of "external fragmentation" in a FS context:

    There are two types of fragmentation, i.e. internal and external. Internal fragmentation refers to the fact that a file system uses specific sizes for a block, say 4KB, so if you have a file which is only 1KB in size, it will be stored in one 4KB block, therefore wasting 3KB of the block. This can't really be avoided.

    External fragmentation is when the files are not layed out continuously, i.e. spread over different blocks which can be far apart from each others. Thus it takes the disk head more time to collect all pieces together and reconstruct the file.
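The internal-fragmentation arithmetic in the first paragraph of that quote (a 1KB file stored in one 4KB block wastes 3KB) can be checked with a minimal Python sketch. The function name `slack_bytes` and the 4096-byte default block size are assumptions made for illustration; the sketch only models whole-block allocation and ignores features such as inline data or sparse files.

```python
def slack_bytes(file_size: int, block_size: int = 4096) -> int:
    """Return the bytes allocated but unused when a file is stored in whole blocks.

    Hypothetical helper for illustration: it assumes every file occupies
    an integer number of fixed-size blocks and nothing else.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Ceiling division: number of blocks needed to hold file_size bytes.
    blocks = -(-file_size // block_size)
    return blocks * block_size - file_size


if __name__ == "__main__":
    # The example from the quote: a 1KB file in a 4KB block.
    print(slack_bytes(1024))  # 3072 bytes, i.e. 3KB wasted
    # A file that fills its block exactly wastes nothing.
    print(slack_bytes(4096))  # 0
```

With these assumptions, the waste per file is always smaller than one block, which is why internal fragmentation in this sense costs space rather than seek time.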

My opinion so far is that:

  • The previously quoted StackExchange answer from 2022 is completely wrong

  • The definition of the second quote is the right one:

    External fragmentation is when the files are not layed out continuously, i.e. spread over different blocks which can be far apart from each others.

  • And there is no such thing as "attempting to allocate blocks for files in the same cylinder group as other files in the same directory" (excerpt from the first quote). If a FS (or an OS) tried to group the files of a directory together on disk, that would conflict with the fact that a FS (at least EXT4) usually tries to leave plenty of free space around each file, to prevent file fragmentation if the file grows later.

Could someone please confirm that my conclusions are correct (and thus that the quoted Stack Exchange answer is wrong)?

[EDIT]

After some more research, I came to the conclusion that the terms "external" and "internal" fragmentation have never been formally defined in the context of file systems. A few sources use them in the sense of this Arch Linux post from 2009 or this kernel.org wiki entry, while even fewer sources use them in the sense of this StackExchange post from 2022.
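To make the two competing meanings concrete, here is a minimal Python sketch that measures each one on a made-up block layout. Everything in it is an assumption for illustration: the `Extent` tuple, the function names `fragment_count` and `directory_span`, and the sample layout are hypothetical, and this is not how ext4 itself tracks or reports anything. On a real system, per-file extent lists of this kind can be inspected with `filefrag -v` from e2fsprogs.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (first physical block, length in blocks),
# listed in the logical order of the file's contents.
Extent = Tuple[int, int]


def fragment_count(extents: List[Extent]) -> int:
    """Count physically contiguous runs in one file (the 2009 Arch forum sense).

    Two consecutive extents belong to the same run when the second starts
    exactly where the first ends.
    """
    if not extents:
        return 0
    count = 1
    for (prev_start, prev_len), (start, _) in zip(extents, extents[1:]):
        if start != prev_start + prev_len:
            count += 1
    return count


def directory_span(files: Dict[str, List[Extent]]) -> int:
    """Distance in blocks between the lowest and highest block used by any
    file of one directory (a rough proxy for the 2022 answer's sense)."""
    starts = [start for exts in files.values() for start, _ in exts]
    ends = [start + length for exts in files.values() for start, length in exts]
    if not starts:
        return 0
    return max(ends) - min(starts)


if __name__ == "__main__":
    # Made-up layout of one directory holding two files.
    layout = {
        "a.txt": [(100, 8), (108, 4)],      # two adjacent extents, one run
        "b.txt": [(5000, 2), (90000, 2)],   # two runs far apart
    }
    for name, extents in layout.items():
        print(name, fragment_count(extents))
    print("span:", directory_span(layout))
```

In this made-up layout, `a.txt` is a single run and `b.txt` is split into two, while the directory as a whole spans 89902 blocks. The two numbers are independent: every file could be perfectly contiguous and the directory span could still be large, which is the distinction the two definitions disagree about.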

ChennyStar
  • The part about "every file in the directory" makes no sense. (1) Directory entries point to inodes, not data blocks. (2) Consider how hard links relate to this scenario. Also, "surround a file with free space" seems suspect: it probably just tries to reserve free blocks in the same cylinder after the last block written at the end of a file that is still open, provided that does not conflict with the First Law (Asimov, but s/human/data/g). Write-behind from cache will also have beneficial optimisation effects. – Paul_Pedant Jan 02 '24 at 11:23
  • internal/external fragmentation is mentioned here: https://ext4.wiki.kernel.org/index.php/Design_for_Large_Allocation_Blocks - I'm not sure if these terms are used in different contexts. There is some mention in kernel documentation but it refers to memory; external fragmentation of directories is mentioned in fs/jfs/jfs_dtree.c - for the ext4 allocator specifically, it's thousands of lines of code, not sure what it does (if anything) for any kind of directory fragmentation... – frostschutz Jan 02 '24 at 11:32
  • To sum it up: external fragmentation (as defined by both https://bbs.archlinux.org/viewtopic.php?id=85044 and https://ext4.wiki.kernel.org/index.php/Design_for_Large_Allocation_Blocks) is what we usually refer to simply as fragmentation, i.e. a file physically scattered all over a drive. Internal fragmentation doesn't imply a performance penalty, but rather wasted space. And external fragmentation has nothing to do with the scattering of files of the same directory (so https://unix.stackexchange.com/questions/703876/difference-between-fragment-and-extent-in-ext4#answer-703879 is wrong). – ChennyStar Jan 03 '24 at 04:25
  • @Paul_Pedant some of your points are irrelevant. Yes, directory entries point to inodes...which point to data blocks. What about hard links? Ok, so sometimes the heuristic can't be used and moved files will break it; but the vast majority of inodes have a single hard link and don't get moved after creation, so that's not a big deal. Yes, write cache helps a lot. This algorithm is implemented in the block allocation code, which tries to group new blocks in the same cylinder group as neighboring files and certainly in the same group as other blocks in the same file. – user10489 Jan 03 '24 at 05:14
