I'm trying to understand how inode numbers (as displayed by ls -i) work with ext4 partitions.
Specifically, I want to know whether they are a construct of the Linux kernel that is mapped to inodes on disk, or whether they are the same numbers that are stored on disk.
Questions:
- Do inode numbers change when a computer is rebooted?
- When two partitions are mounted, can ls -i produce the same inode number for two different files as long as they are on different partitions?
- Can inode numbers be recycled without rebooting or re-mounting partitions?
Why I'm asking...
I want to create a secondary index on a USB hard drive with 1.5TB of data and around 20 million files (filenames). Files range from 10s of bytes to 100s of GB. Many of them are hard linked multiple times, so a single file (blob on disk) might have anything up to 200 file names.
My task is to save space on disk by detecting duplicates and replacing the duplication with even more hard links.
As a one-off exercise, I think I can create a database of every file on disk with its shasum, permissions, etc. Once it is built, detecting duplication should be trivial. But I need to be certain I am using the right unique key. Filenames are inappropriate due to the large number of existing hard links. My hope is that I can use inode numbers.
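To make the idea concrete, here is a rough sketch of the index I have in mind. It is written in Python purely for illustration, and the names build_index and sha256_of are hypothetical. It assumes the key should be the pair (st_dev, st_ino) rather than the inode number alone, since POSIX only promises that the pair identifies a file on a running system. Whether that key stays stable across reboots is exactly what I am asking. SHA-256 is an arbitrary choice of digest here.

```python
import hashlib
import os
import stat
from collections import defaultdict


def build_index(root):
    """Map (st_dev, st_ino) -> list of paths under root that name that inode."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, FIFOs, device nodes, ...
            # Assumption: (device, inode) identifies one blob on disk.
            index[(st.st_dev, st.st_ino)].append(path)
    return index


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so very large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With an index like this, each blob would only need to be hashed once (via any one of its names), and two keys with the same digest would be candidates for merging into hard links.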
What I would like to understand is whether the inode number is going to change when I next reboot my machine, or whether inode numbers are even more volatile than that (will they change while I'm building my database?).
All the documentation I have read fudges the distinction between inode numbers as presented by the kernel and inodes on disk. Whether or not these are the same thing is unclear from the articles I've read so far.