tl;dr: For virtual, volatile, or inode-agnostic filesystems, inode numbers are usually generated from a monotonically incrementing 32-bit counter when the inode is created. The rest of the inode (e.g. permissions) is built from the equivalent data in the underlying filesystem, or is replaced with values set at mount time (e.g. {uid,gid}=) if no such concept exists.
To answer the question in the title (i.e. abstractly, how Linux allocates inode numbers for a filesystem that has no inode concept): it depends on the filesystem. For some virtual or inodeless filesystems, the inode number is drawn at instantiation time from the get_next_ino pool. This has a number of problems, though:
- get_next_ino() uses 32-bit inode numbers even on a 64-bit kernel, due to legacy handling for 32-bit userland without _FILE_OFFSET_BITS=64;
- get_next_ino() is just a globally incrementing counter used by multiple filesystems, so the risk of overflow is increased even further.
Problems like this are one of the reasons why I moved tmpfs away from get_next_ino-backed inodes last year.
For this reason, tmpfs in particular is an exception among volatile or "inodeless" filesystems. Sockets, pipes, ramfs, and the like still use the get_next_ino pool as of 5.11.
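To make the overflow concern concrete, here is a minimal userspace sketch of a shared 32-bit counter wrapping around. The names model_next_ino and next_ino_counter are hypothetical, and this is not the kernel's get_next_ino() (which, among other things, batches allocations per CPU); it only models the arithmetic.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit counter shared by many filesystems. */
static uint32_t next_ino_counter;

/* Hand out the next number, skipping 0 when the counter wraps. */
static uint32_t model_next_ino(void)
{
    uint32_t ino = ++next_ino_counter;   /* unsigned wraparound is well defined */

    if (ino == 0)                        /* treat 0 as "not a valid inode number" */
        ino = ++next_ino_counter;
    return ino;
}
```

Once the counter has gone all the way around, the next number handed out is one that was already issued earlier, and nothing in this scheme checks whether the earlier holder still exists.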
As for your specific question about FAT filesystems: fs/fat/inode.c is where inode numbers are allocated for FAT filesystems. If we look in there, we see fat_build_inode (source):
struct inode *fat_build_inode(struct super_block *sb,
                              struct msdos_dir_entry *de, loff_t i_pos)
{
    struct inode *inode;
    int err;

    fat_lock_build_inode(MSDOS_SB(sb));
    inode = fat_iget(sb, i_pos);
    if (inode)
        goto out;
    inode = new_inode(sb);
    if (!inode) {
        inode = ERR_PTR(-ENOMEM);
        goto out;
    }
    inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
    inode_set_iversion(inode, 1);
    err = fat_fill_inode(inode, de);
    if (err) {
        iput(inode);
        inode = ERR_PTR(err);
        goto out;
    }
    fat_attach(inode, i_pos);
    insert_inode_hash(inode);
out:
    fat_unlock_build_inode(MSDOS_SB(sb));
    return inode;
}
What this basically says is this:
- Take the FAT inode creation lock for this superblock.
- Check if the inode already exists at this position in the superblock. If so, unlock and return that inode.
- Otherwise, create a new inode.
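- Get the inode number from iunique(sb, MSDOS_ROOT_INO) (more about that in a second).
- Fill the rest of the inode from the equivalent FAT data structures.
- Register the new inode in the FAT position hash (fat_attach) and the VFS inode hash (insert_inode_hash), then unlock and return it.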
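The steps above boil down to "look up by on-disk position, otherwise allocate a fresh number". A rough userspace model of that flow is sketched below; every fatmodel_* name, the tiny fixed-size cache, and the starting number are assumptions made for illustration, not kernel code. It also assumes that a cached inode can be dropped while the filesystem stays mounted, which is what fatmodel_evict stands in for.

```c
#include <stddef.h>

#define FATMODEL_SLOTS 4              /* hypothetical, tiny in-memory cache */

struct fatmodel_inode {
    long long i_pos;                  /* on-disk position of the directory entry */
    unsigned int i_ino;               /* number handed out when the inode was built */
    int live;                         /* non-zero while the slot is in use */
};

static struct fatmodel_inode fatmodel_cache[FATMODEL_SLOTS];
static unsigned int fatmodel_next = 2;    /* assumption: 1 is reserved for the root */

/* Find a cached inode by position; NULL on a miss. */
static struct fatmodel_inode *fatmodel_iget(long long i_pos)
{
    for (size_t i = 0; i < FATMODEL_SLOTS; i++)
        if (fatmodel_cache[i].live && fatmodel_cache[i].i_pos == i_pos)
            return &fatmodel_cache[i];
    return NULL;
}

/* Return the cached inode for i_pos, or build one with a fresh number.
 * Returns NULL when the cache is full (standing in for -ENOMEM). */
static struct fatmodel_inode *fatmodel_build_inode(long long i_pos)
{
    struct fatmodel_inode *inode = fatmodel_iget(i_pos);

    if (inode)
        return inode;
    for (size_t i = 0; i < FATMODEL_SLOTS; i++) {
        if (!fatmodel_cache[i].live) {
            fatmodel_cache[i].i_pos = i_pos;
            fatmodel_cache[i].i_ino = fatmodel_next++;
            fatmodel_cache[i].live = 1;
            return &fatmodel_cache[i];
        }
    }
    return NULL;
}

/* Drop a cached inode; the number it held is simply forgotten. */
static void fatmodel_evict(long long i_pos)
{
    struct fatmodel_inode *inode = fatmodel_iget(i_pos);

    if (inode)
        inode->live = 0;
}
```

In this model, two lookups of the same position return the same number as long as the entry stays cached, but a lookup after an eviction gets a new one, because nothing about the number is stored on disk.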
The line inode->i_ino = iunique(sb, MSDOS_ROOT_INO); is where the inode number is set. iunique (source) is a filesystem-agnostic function that provides unique inode numbers for a given superblock. It does this by pairing a monotonically increasing counter with a lookup in the inode hash table, which is keyed on superblock and inode number:
ino_t iunique(struct super_block *sb, ino_t max_reserved)
{
    static DEFINE_SPINLOCK(iunique_lock);
    static unsigned int counter;
    ino_t res;

    rcu_read_lock();
    spin_lock(&iunique_lock);
    do {
        if (counter <= max_reserved)
            counter = max_reserved + 1;
        res = counter++;
    } while (!test_inode_iunique(sb, res)); /* nb: this checks the hash table */
    spin_unlock(&iunique_lock);
    rcu_read_unlock();

    return res;
}
In that respect, it's pretty similar to the previously mentioned get_next_ino: the difference is that uniqueness is checked per superblock instead of relying on one global sequence (as for pipes, sockets, or the like), which gives some rudimentary hash-table-based protection against collisions. Note that the counter itself is still a single static variable shared by every caller. It even inherits get_next_ino's behaviour of using 32-bit inode numbers to try to avoid EOVERFLOW in legacy applications, so there are likely going to be more filesystems which need 64-bit inode fixes (like my aforementioned inode64 implementation for tmpfs) in the future.
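The counter-plus-collision-check loop can be tried out in userspace with a small model like the following. All uniq_* names are hypothetical, and the linear scan over a fixed array merely stands in for the kernel's inode hash lookup; locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define UNIQ_MAX_LIVE 8               /* hypothetical cap, for illustration only */

/* Stand-in for a superblock: just the set of numbers currently in use. */
struct uniq_sb {
    unsigned int live[UNIQ_MAX_LIVE];
    size_t count;
};

/* One counter shared by every uniq_sb, mirroring the static in iunique(). */
static unsigned int uniq_counter;

/* Record a number as in use on this superblock (ignored if the table is full). */
static void uniq_mark_in_use(struct uniq_sb *sb, unsigned int ino)
{
    if (sb->count < UNIQ_MAX_LIVE)
        sb->live[sb->count++] = ino;
}

/* True if no live inode on this superblock already has the number. */
static bool uniq_is_free(const struct uniq_sb *sb, unsigned int ino)
{
    for (size_t i = 0; i < sb->count; i++)
        if (sb->live[i] == ino)
            return false;
    return true;
}

/* Advance the counter past the reserved range until a free number is found. */
static unsigned int uniq_iunique(const struct uniq_sb *sb, unsigned int max_reserved)
{
    unsigned int res;

    do {
        if (uniq_counter <= max_reserved)
            uniq_counter = max_reserved + 1;
        res = uniq_counter++;
    } while (!uniq_is_free(sb, res));
    return res;
}
```

With numbers 2 and 3 already in use on one model superblock and 1 reserved, the first call skips over both. A later call for a different, empty superblock continues from where the shared counter left off instead of starting again at 2.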
So to summarise:
- Most virtual or inodeless filesystems use a monotonically incrementing counter for the inode number.
- The resulting inode numbers aren't stable even for on-disk inodeless filesystems*. They may change on remount without any other change to the filesystem.
- Most filesystems in this state (except for tmpfs with inode64) are still using 32-bit counters, so with heavy use it's entirely possible the counter may overflow and you may end up with duplicate inode numbers.
* ...although, to be fair, by contract this is true even for filesystems which do have an inode concept when i_generation changes -- it's just less likely to happen in practice, since the inode number is often related to its physical position, or similar.
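If you want to check how stable the numbers are on a particular mount, a small helper around stat(2) is enough to record st_ino and compare it later (for example before and after a remount). The helper name path_inode_number is hypothetical; stat() and st_ino are standard POSIX.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Store the inode number of `path` in *ino_out.
 * Returns 0 on success, -1 if stat() fails (errno is left set by stat). */
static int path_inode_number(const char *path, uint64_t *ino_out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *ino_out = (uint64_t)st.st_ino;
    return 0;
}
```

On a 32-bit build without _FILE_OFFSET_BITS=64, the stat() call itself can fail with EOVERFLOW when the inode number doesn't fit, which is the legacy case the 32-bit counters are trying to avoid.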
i_generation (if ioctl(FS_IOC_GETVERSION) is supported for the filesystem). That's why we've generally just used incrementing counters. :-) – Chris Down Mar 27 '21 at 14:07
vfat can show different inode numbers for the same file, even without unmounting in the meantime, and that seems like something some program could trip on. (Also, it doesn't seem to support FS_IOC_GETVERSION, but I'm on an old kernel.) – ilkkachu Mar 27 '21 at 19:57
ls is called). Or does new_inode or fat_fill_inode save the inode? – goulashsoup Jun 13 '22 at 14:25