How inode numbers are assigned

Question

Two known facts:

In Linux, moving a file from one location to another on the same file system doesn't change the inode (the file remains at "the same place"; only the directories involved are changed).
Copying, however, generates a truly new file, with a new inode.

Armed with this information, I observe the following phenomenon:

$ ls -li /tmp/*.db
1452722 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:33 /tmp/vf4.db
$
$ cp /tmp/vf4.db /tmp/vf4.2.db
$ ls -li /tmp/*.db # New inode introduced
1452719 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:38 /tmp/vf4.2.db
1452722 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:33 /tmp/vf4.db
$
$ mv /tmp/vf4.2.db /tmp/vf4.db
$ ls -li /tmp/*.db
1452719 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:38 /tmp/vf4.db
$
$ cp /tmp/vf4.db /tmp/vf4.2.db
$ ls -li /tmp/*.db # Original inode appears again! (1452722)
1452722 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:41 /tmp/vf4.2.db
1452719 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:41 /tmp/vf4.db
$
$ mv /tmp/vf4.2.db /tmp/vf4.db
$ ls -li /tmp/*.db
1452722 -rw-r--r-- 1 omerda omerda 245760 Jul  7 12:41 /tmp/vf4.db

This "round trip" always results in the original inode being attached to the original file again. I would have expected a fresh new inode being used in each copy.

How does it make this reuse of same inode?

EDIT

In the comments, some asked for context. The context is that this bad practice is used by some sqlite wrappers to replace db files without sqlite3 reporting errors about the replacement. However, this is not a question about sqlite; please stick to the topic and the question.

this is not answering your question but: why do you need to know? i think that inode management (including inode reuse) should typically stay opaque to the user. if two files have the same inode then they are the same. if they have different inodes then they are distinct. anything else is an implementation detail. (but you are of course entitled to be curious about these details) — umläute, Jul 07 '21 at 09:58
@umläute I need to know because some applications rely on this behavior, and I wonder if this is a good assumption to make. For example, some sqlite3 wrappers (which examine the inode of the opened db) do the trick in the question to "fool" sqlite3. I can elaborate in a separate comment if this is interesting enough. — Omer Dagan, Jul 07 '21 at 11:18
"I need to know because some applications rely on this behavior, and I wonder if this is a good assumption to make." No, it's not a good assumption. What happens if something else happens on the system and that inode number gets used somewhere else? The "logic" behind these types of assumptions is faulty. "I did X and observed Y, therefore X means Y will happen" is bad logic. "My alarm went off at 5:30 AM and the sun rose at 6 AM. If I set my alarm at 3 AM the sun will rise at 3:30 AM" is the exact same "reasoning". — Andrew Henle, Jul 07 '21 at 12:06
"For example, in some sqlite3 wrappers (which examines inode of opened db)" The key word is "opened". An opened inode will not have its inode number reused, no matter whether it's removed (i.e. unlinked from any directory). — , Jul 07 '21 at 12:10
@UncleBilly You're wrong on this. An opened db file inode can be reused, and unfortunately, as I said, sometimes this is being relied on. There is a reason why I'm asking this question (it doesn't mean I support this bad practice though, it is just the environment I work in) — Omer Dagan, Jul 07 '21 at 12:22
Bother to show an example? Your question doesn't contain such an example. By "opened" I mean an opened file handle, of course. There are programs which don't keep an opened handle to any kind of file when they "open" a document or database -- e.g. djvu viewers will open the file each time they render a page and close it immediately afterwards. — , Jul 07 '21 at 12:24
"some do the trick in the question to 'fool' sqlite3" Fool it into doing what? — Steve Summit, Jul 07 '21 at 18:47
Assuming that a new file gets the most-recently-freed inode number sounds like a reasonable assumption for some filesystems, and a completely unreasonable assumption for others. (And even for filesystems that do maintain a LIFO inode freelist, you've got an obvious race condition if you assume that a file you create will necessarily get the inode of the file you deleted a second ago.) — Steve Summit, Jul 07 '21 at 18:48
In case you are curious, assigning inodes inside a file system is quite complicated. For example, in ext4, picking an inode decides which block group to use, and this might be influenced by the goal of keeping data together. Ext4 used the "Orlov allocator" algorithm for that. That was changed when disks got larger and locality of data and block groups became less of a recovery requirement; see for example here: https://www.linuxsecrets.com/kdocs/ols/2008/ols2008v1-pages-263-274.pdf — eckes, Jul 07 '21 at 21:53
@UncleBilly Try running sqlite3 db, then perform those tricks so that eventually there is a different file at the same path as the original, with the same inode (due to the "trick"), and you will see it works. I don't give an example because it is off topic. I had a question about inodes. I never said it is a good practice, so I'm not sure what you're trying to prove here. It is a bad practice which is being used by some, and raises a question. — Omer Dagan, Jul 08 '21 at 09:08
@SteveSummit This is used to replace db files while working on an opened db. As I said to others, it is a bad practice, it is not used by me. Please just answer the question (about inodes, ignore the sqlite thing (it was edited into the question, not by me)) — Omer Dagan, Jul 08 '21 at 09:10
@Omer put an example somewhere where it's on-topic, and then link to it from here. It's not at all clear what you're talking about. Being able to replace an inode (the inode, not its content!) while it's in use by another process would be something very interesting to know about, but you haven't demonstrated anything like it yet. — , Jul 08 '21 at 09:38
@OmerDagan You've already accepted an answer, so I'm not sure what information you're still looking for, but I have posted an additional answer containing mine. — Steve Summit, Jul 08 '21 at 12:21

score 5 · Answer 1 · answered Jul 07 '21 at 10:26

The system reuses the same inode because the filesystem layer chooses to do so. As was mentioned in a comment, it's an implementation detail. In my case this is ext4, but there's no reason why a different filesystem type shouldn't use (or reuse) inodes differently. You could potentially find a filesystem that has no inodes as such and synthesises inode numbers dynamically on request. The tmpfs filesystem, for example, does not reuse inode numbers in the same way.

# Create two files on ext4
touch file
cp file copy
ls -li file copy
133235 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:13 copy
129071 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:13 file
# Remove one, copy the other back
rm file
cp copy file
ls -li file copy
133235 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:13 copy
129071 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:13 file
# Remove one, create an unexpected intervention, copy the other back
rm file
touch thing
cp copy file
ls -li file copy thing
133235 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:13 copy
133237 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:14 file
129071 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:14 thing

Now let's repeat on a tmpfs filesystem, for example /dev/shm

# Create two files on tmpfs
touch file
cp file copy
ls -li file copy
369355 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:27 copy
369354 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:27 file
# Remove one, copy the other back
rm file
cp copy file
ls -li file copy
369355 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:27 copy
368123 -rw-r--r-- 1 roaima roaima 0 Jul  7 11:28 file
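
The experiment above can be wrapped in a small script so it is easy to repeat on any directory. This is only a sketch: `inode_of` and `inode_roundtrip` are hypothetical helper names, and it assumes `mktemp` and `awk` are available. The result it prints depends on the filesystem and on whatever else is creating files at the time.

```shell
#!/bin/sh
# Print the inode number of a path (ls -i is POSIX, so GNU stat is not needed).
inode_of() {
    ls -di -- "$1" | awk '{ print $1 }'
}

# Hypothetical helper: repeat the "remove one, copy the other back" step in a
# scratch directory under $1 and report whether the old inode number came back.
inode_roundtrip() {
    dir=$(mktemp -d "${1:-.}/inode-test.XXXXXX") || return 1
    touch "$dir/file"
    cp "$dir/file" "$dir/copy"
    before=$(inode_of "$dir/file")
    rm "$dir/file"
    cp "$dir/copy" "$dir/file"
    after=$(inode_of "$dir/file")
    rm -r "$dir"
    if [ "$before" = "$after" ]; then
        echo "reused ($before)"
    else
        echo "fresh ($before -> $after)"
    fi
}

inode_roundtrip "${TMPDIR:-/tmp}"
```

Running `inode_roundtrip /dev/shm` would repeat the test on tmpfs, assuming that path exists and is writable on your system.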


tmpfs doesn't even have inodes, it fakes them for the historical APIs that require them. — Jörg W Mittag, Jul 07 '21 at 18:55

score 3 · Accepted Answer · answered Jul 07 '21 at 13:19

How does it make this reuse of same inode?

In ext4, the inode numbers are just indexes into a table that contains the actual inode data. Lore has it that this is what the "i" stands for: "index". The table is not actually stored as a single consecutive array, but that doesn't matter here.

The particular inode number you get is one that happens to be free at the time. It makes sense for the filesystem code to make that choice deterministically: if it chose 1452722 before 1452719 the first time, it makes sense for it to pick 1452722 again once that number is free, provided nothing else changed in between apart from making the copy. It can't reserve the inode number permanently for a particular incarnation of a file, since that would quickly lead to a filesystem full of unusable inode entries reserved for deleted files.
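
To make the "deterministic choice" idea concrete, here is a toy model in bash. It is not ext4's real allocator: the eight-slot table, the `alloc` and `release` functions, and the "lowest free slot wins" rule are all assumptions made for illustration. It only shows why a deterministic policy can hand the same number back after a copy-and-move round trip.

```shell
#!/usr/bin/env bash
# Toy model, NOT ext4's real code: slots numbered 1..8,
# where 1 means "in use" and 0 means "free".
used=(0 1 1 0 1 1 0 0 0)   # index 0 is padding; slots 1, 2, 4 and 5 are taken

# Hand out the lowest-numbered free slot; the result is left in NEW.
alloc() {
    local i
    for ((i = 1; i < ${#used[@]}; i++)); do
        if [[ ${used[i]} -eq 0 ]]; then
            used[i]=1
            NEW=$i
            return 0
        fi
    done
    return 1    # table full
}

# Mark a slot as free again, as removing the last link to a file would.
release() { used[$1]=0; }

alloc; orig=$NEW        # the original file
alloc; copy=$NEW        # cp: the copy gets the next free slot
release "$orig"         # mv copy over original: the original's slot is freed
alloc; copy2=$NEW       # cp again: the freed slot is the lowest free one
echo "orig=$orig copy=$copy copy2=$copy2"   # prints: orig=3 copy=6 copy2=3
```

If anything else calls `alloc` between the `release` and the last `alloc`, `copy2` ends up with a different number, which is the race the rest of this answer describes.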

You can't count on getting a particular inode number, mostly because some other process on the system might be creating files at the same time, taking the inode number you expected to get before you recreate the file. Or the other process could remove files, causing the filesystem code to hand out one of those numbers next. The way the filesystem is organized into block groups with their associated inodes might also mean that simply growing a file could change where the filesystem goes to look for a free inode. Or it might not. All the inode number does is identify the file as it currently exists, and when you create a new file, you just get a number that doesn't correspond to any existing file.
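
One consequence of "a number that doesn't correspond to any existing file" is that a file which has been removed but is still held open continues to exist, so its number should not be handed out yet. The sketch below illustrates this under the assumption of a filesystem with real inodes, such as ext4; `inode_of` is a hypothetical helper, and the scratch directory comes from `mktemp`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the inode number of a path.
inode_of() { ls -di -- "$1" | awk '{ print $1 }'; }

dir=$(mktemp -d)
echo "old contents" > "$dir/db"
held=$(inode_of "$dir/db")

exec 3< "$dir/db"                 # keep the old file open on descriptor 3
rm "$dir/db"                      # the name is gone, the inode is still in use
echo "new contents" > "$dir/db"   # a new file under the same name
new=$(inode_of "$dir/db")

echo "held open: $held, new file: $new"
cat <&3                           # still reads "old contents"
exec 3<&-                         # closing the descriptor lets the inode be freed
rm -r "$dir"
```

Only after the descriptor is closed does the old number become available again, and even then nothing promises that the next file you create will receive it.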

Also, on other types of filesystems, like VFAT, there are no static inode numbers; instead you might just get a running counter.

Steve Summit · Answer 3 · 2021-07-08T12:25:10.093

This "round trip" always results in the original inode being attached to the original file again.

When you say "always", what you mean is, "Each time I looked". But this is absolutely not a guaranteed result!

I would have expected a fresh new inode being used in each copy.

That's certainly another possible result. But since there is not an infinite number of available inode numbers, at some point you're going to have to get a recycled (not brand-new or "fresh") one.

How does it make this reuse of same inode?

Any conventional filesystem has to manage the allocation of blocks. When a file is created or grows, fresh blocks must be allocated to store the file's data. When a file is deleted, its blocks can be made available for reuse. Different filesystems use different techniques to manage the set of "free" blocks, and to allocate them to files. The algorithm(s) chosen can obviously have a big effect on performance, and on fragmentation. Sometimes, the allocation may seem to be LIFO -- that is, the most recently freed block may be the next one to be allocated. But that's obviously not always the case.

And the situation is almost exactly the same for inodes. When a file is created, a fresh inode and/or inode number must be allocated to identify the file. When a file is deleted, its inode and inode number can be made available for reuse. Different filesystems use different techniques to manage the set of "free" inodes, and to allocate them to files. Sometimes, the allocation may seem to be LIFO, but that's obviously not always the case.
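
As a rough illustration of how much the policy matters, here are two toy allocators in bash. Both are hypothetical and not taken from any real filesystem: one keeps a LIFO free list, the other simply counts upward and never hands out a number twice. The function names and starting values are made up for the example, and the empty-list case is deliberately left out.

```shell
#!/usr/bin/env bash
# Two toy policies (hypothetical, not taken from any real filesystem).
free_stack=(12 11 10)   # LIFO free list: the last element is handed out first
next_counter=100        # running counter: a number is never handed out twice

# Pop the most recently freed number; the result is left in NEW.
alloc_lifo() {
    local top=$(( ${#free_stack[@]} - 1 ))
    NEW=${free_stack[top]}
    unset "free_stack[$top]"
}
# Push a freed number back onto the list.
release_lifo() { free_stack+=("$1"); }
# Hand out the next counter value; freed numbers are simply forgotten.
alloc_counter() { NEW=$next_counter; next_counter=$((next_counter + 1)); }

# LIFO: create, delete, create again, and the same number comes back.
alloc_lifo; first=$NEW
release_lifo "$first"
alloc_lifo; second=$NEW
echo "lifo:    $first then $second"       # prints: lifo:    10 then 10

# Counter: deleting a file has no effect on the next number.
alloc_counter; first_c=$NEW
alloc_counter; second_c=$NEW
echo "counter: $first_c then $second_c"   # prints: counter: 100 then 101
```

A program that relies on the first behaviour breaks as soon as it runs on a filesystem that behaves like the second.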
