How do inode numbers from ls -i relate to inodes on disk

Question

I'm trying to understand how inode numbers (as displayed by ls -i) work with ext4 partitions.

I'm trying to understand whether they are a construct of the Linux kernel that is mapped to inodes on disk, or whether they actually are the same numbers stored on disk.

Questions:

1. Do inode numbers change when a computer is rebooted?
2. When two partitions are mounted, can ls -i produce the same inode number for two different files as long as they are on different partitions?
3. Can inode numbers be recycled without rebooting or re-mounting partitions?

Why I'm asking...

I want to create a secondary index on a USB hard drive with 1.5TB of data and around 20 million files (filenames). Files range from 10s of bytes to 100s of GB. Many of them are hard linked multiple times, so a single file (blob on disk) might have anything up to 200 file names.

My task is to save space on disk by detecting duplicates and replacing the duplication with even more hard links.

Now as a single exercise, I think I can create a database of every file on disk, its shasum, permissions etc... Once built, detecting duplication should be trivial. But I need to be certain I am using the right unique key. Filenames are inappropriate due to the large number of existing hard links. My hope is that I can use inode numbers.

What I would like to understand is whether or not the inode number is going to change when I next reboot my machine, or whether inode numbers are even more volatile (will they change while I'm building my database?).

All the documentation I read fudges the distinction between inode numbers as presented by the kernel and inodes on disk. Whether or not these are the same thing is unclear based on the articles I've already read.

Sergiy Kolodyazhnyy · Accepted Answer · 2019-02-04T01:41:01.560

I'm trying to understand how inode numbers (as displayed by ls -i) work with ext4 partitions.

Essentially, an inode is a reference within a filesystem(!), a bridge between the actual data on disk (the bits and bytes) and the name associated with that data (/etc/passwd, for instance). Filenames are organized into directories, where each directory entry is a filename paired with its corresponding inode number.

The inode then contains the actual information: permissions, which blocks are occupied on disk, owner, group, etc. In How are directory structures stored in UNIX filesystem, there is a very nice diagram that explains the relation between files and inodes a bit better.

And when you have a file in another directory pointing to the same inode number, you have what is known as a hard link.
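To see the hard-link relationship concretely, here is a minimal Python sketch. It is not part of the original answer, and the file names and the link_demo helper are made up for illustration. It creates a file, gives it a second name with os.link, and checks that both names report the same st_ino and a link count of 2.

```python
import os
import tempfile


def link_demo():
    """Create a file plus one hard link; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")  # hypothetical name
        alias = os.path.join(tmp, "alias.txt")        # hypothetical name
        with open(original, "w") as f:
            f.write("hello\n")
        # Add a second directory entry that points at the same inode.
        os.link(original, alias)
        a, b = os.stat(original), os.stat(alias)
        return a.st_ino == b.st_ino, a.st_nlink


if __name__ == "__main__":
    same_inode, link_count = link_demo()
    print("same inode:", same_inode, "link count:", link_count)
```

The st_nlink field is the same counter that ls -l shows in its second column, so a value above 1 is a quick hint that a file has other names somewhere on the same filesystem.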

Now, notice I've emphasized that an inode is a reference specific to a filesystem, and here's the reason to be mindful of that:

The inode number of any given file is unique to the filesystem, but not necessarily unique to all filesystems mounted on a given host. When you have multiple filesystems, you will see duplicate inode numbers between filesystems, this is normal.

This is in contrast to devices. You may have multiple filesystems on the same device, such as the /var filesystem and /, and yet they're on the same drive.

Now, can an inode number change? Sort of. The filesystem is responsible for managing inodes, so unless there are underlying issues with the filesystem, a file's inode number shouldn't change. There are tricky cases, though. The vim text editor, for example,

renames the old file, then writes a new file with the original name, if it thinks it can re-create the original file's attributes. If you want to reuse the existing inode (and so risk losing data, or waste more time making a backup copy), add set backupcopy=yes to your .vimrc.

The key point to remember is that although the data might look the same to the user, under the hood it is actually written to a new file in a new location on disk, hence the change in inode number.
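The difference between the two save strategies can be sketched in Python. This only imitates the behaviour described in the quote and is not vim's actual code; save_in_place, save_by_rename and inode_changes are hypothetical names.

```python
import os
import tempfile


def save_in_place(path, text):
    """Truncate and rewrite the existing file, so the inode is reused."""
    with open(path, "w") as f:
        f.write(text)


def save_by_rename(path, text):
    """Rename the old file to a backup, then write a new file under the old name."""
    os.rename(path, path + "~")
    with open(path, "w") as f:
        f.write(text)


def inode_changes():
    """Return (kept_after_in_place, kept_after_rename) for a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        with open(path, "w") as f:
            f.write("v1\n")
        before = os.stat(path).st_ino

        save_in_place(path, "v2\n")
        kept_after_in_place = os.stat(path).st_ino == before

        # The backup keeps the old inode, so the new file must get another one.
        save_by_rename(path, "v3\n")
        kept_after_rename = os.stat(path).st_ino == before
        return kept_after_in_place, kept_after_rename


if __name__ == "__main__":
    print(inode_changes())
```

For an index keyed on inode numbers, this means a file that is edited while the index is being built may reappear under a different inode even though its name is unchanged.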

So, to make things short:

Do inode numbers change when a computer is rebooted?

Not unless there's something wrong with the filesystem after the reboot.

When two partitions are mounted, can ls -i produce the same inode number for two different files as long as they are on different partitions?

Yes, since two different partitions will have different filesystems. I don't know a lot about LVM, but under that type of storage management two physical volumes could be combined into a single logical volume, which, in my theoretical guess, would be a case where ls -i shows a unique inode number per file across both volumes.

Can inode numbers be recycled without rebooting or re-mounting partitions?

The filesystem does that when a file is removed (that is, when all links to the file are removed and nothing points to that inode any more).

My task is to save space on disk by detecting duplicates and replacing the duplication with even more hard links.

Well, detecting duplication can be done via md5sum or another checksum command. In that case you're examining the actual data, which may or may not live under different inodes on disk. One example is from heemayl's answer:

find . ! -empty -type f -exec md5sum {} + | sort | uniq -w32 -dD
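For the use case in the question, a rough Python sketch along these lines might be a starting point. It assumes SHA-256 is an acceptable checksum and that the filesystem reports stable inode numbers (as ext4 does); all function names are invented for illustration. Names are first grouped by (st_dev, st_ino), so a blob with 200 hard links is hashed only once, and then inodes whose contents hash identically are reported together.

```python
import hashlib
import os
import stat
from collections import defaultdict


def index_by_inode(root):
    """Map (st_dev, st_ino) to the list of paths that name that inode."""
    names = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                names[(st.st_dev, st.st_ino)].append(path)
    return names


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so very large files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(root):
    """Return groups of distinct inodes whose contents hash identically.

    Each group is a list of path lists, one path list per inode.
    """
    by_digest = defaultdict(list)
    for paths in index_by_inode(root).values():
        # Every path in `paths` names the same inode: hash it once.
        by_digest[sha256_of(paths[0])].append(sorted(paths))
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


if __name__ == "__main__":
    import sys
    for group in duplicate_groups(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(group)
```

The sketch only reports candidates; it does not replace anything with hard links. With 1.5TB of data it would probably be worth comparing st_size first and hashing only inodes that share a size.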

This reference: https://manpages.debian.org/hardlink/hardlink.1 could be useful. — A.B, Feb 04 '19 at 06:56

Stephen Kitt · Answer 2 · 2019-02-03T22:12:18.650

1. No, inode numbers do not change when a computer is rebooted, at least not with POSIX file systems (such as ext4) where the inode is stored on disk.
2. Yes, two different files on different partitions can have the same inode number. See Can two files on two separate filesystems share the same inode number? and Why do the directories /home, /usr, /var, etc. all have the same inode number (2)? for details. (What is unique, within a given system, is the device number–inode pair.)
3. Yes, if you delete a file, its inode can be re-used without rebooting or re-mounting.
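The device number–inode pair mentioned above is what os.stat exposes as st_dev and st_ino. A minimal sketch of a lookup key built from it follows; file_key is a hypothetical helper name, not a standard function.

```python
import os


def file_key(path):
    """Return the (device, inode) pair that identifies a file on this host."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, file_key(p))
```

On POSIX systems os.path.samefile compares these same two fields. One caveat for a persistent index: st_dev is assigned by the running kernel and is not guaranteed to stay the same after a reboot or after re-plugging a USB drive, so it may be safer to record something like the filesystem UUID alongside the inode number.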
