40

I ran the following commands in the order specified:

$ ln a b
$ ls -i a b
523669 a 523669 b
$ rm -f a
$ ls -i b
523669 b

I concluded from this test that the command rm actually removes only the filename (a in this test) instead of the file, as the inode still exists and could be retrieved through another filename (b).

My question is: if a file is hard linked to only one filename and rm is run on that filename, is the real file (i.e., the inode) removed completely? And if not, can a file be retrieved without a filename, through its inode number alone?

depquid
user43312
  • 1
    Sounds OS-specific to me. – Ignacio Vazquez-Abrams Sep 29 '13 at 07:17
  • @Ignacio Vazquez-Abrams. You mean it depends on version? – user43312 Sep 29 '13 at 07:19
  • No, I mean it depends on the operating system. Each have different (if any) ways of tapping into the VFS. – Ignacio Vazquez-Abrams Sep 29 '13 at 07:21
  • @Ignacio Vazquez-Abrams Do you have any idea about RHL or RHEL? – user43312 Sep 29 '13 at 07:23
  • @IgnacioVazquez-Abrams - do you know of any OS that allows file retrieval by anything other than a file name? Most filesystems have something like an inode (ODS-2's "file header" for example), but I don't know that anything allows retrieval by inode-analog-number. –  Sep 30 '13 at 03:03
  • @BruceEdiger: I have no knowledge of one that allows it through the filesystem itself, but I imagine that it could be possible in various OSes to tap into the VFS at a lower level to do so, possibly through a system call. – Ignacio Vazquez-Abrams Sep 30 '13 at 03:20
  • 2
    @BruceEdiger OS X sort-of does that. You can access a filesystem object using a "file reference URL" which is, fundamentally, built from the file system number and the node number. However, it's not officially supported to build those yourself. Instead you obtain a "file reference URL" for a file and then use it instead of the pathname for subsequent accesses in the same runtime session, so that your application becomes oblivious to the file being moved elsewhere on the same volume. – Analog File Nov 24 '15 at 17:01

5 Answers

47

If you try to open a file via its inode, this bypasses any directory traversal. The directory traversal is necessary to determine the permissions of the file and directories leading to it. Without a directory traversal, the kernel has no way to determine whether the calling process is allowed to access the file.

There was a proposed patch to the Linux kernel to allow creating a link to a file from a file descriptor. It was rejected because implementing this securely would have been extremely hard.

Under Linux (and probably on other unix variants for the same reason), you cannot create a link to a deleted file, so if a file no longer has a name, you can't re-add one.¹ You can open a deleted file by opening the magic links under /proc/$pid/fd/.

If a file no longer has any link and is no longer open, it no longer exists and the space formerly used by its data may be reclaimed at any time.
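
As a rough illustration of the /proc route described above, the following sketch deletes the only name of a file that is still open and then reads it back through a magic link. It assumes Linux with /proc mounted; the scratch directory, the file name a and the descriptor number 3 are arbitrary choices for the demonstration. It uses /proc/self rather than a numeric pid only to stay self-contained: cat inherits descriptor 3 from the shell, so /proc/self/fd/3 inside cat names the same open file.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf 'still here\n' > "$dir/a"

exec 3< "$dir/a"    # keep the file open on descriptor 3 of this shell
rm -f "$dir/a"      # remove its only name; the inode survives while fd 3 is open

# Linux-specific: cat inherits descriptor 3, so /proc/self/fd/3 (cat's own
# /proc/$pid/fd/3) is a magic link to the now nameless file.
recovered=$(cat /proc/self/fd/3)
printf '%s\n' "$recovered"

exec 3<&-           # close the last descriptor; the file is now truly gone
rm -rf "$dir"
```

Once the descriptor is closed there is nothing left to open, which matches the last paragraph above.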

¹ You may be able to do this by twiddling the bytes directly in the filesystem in a filesystem-dependent way, for example with debugfs for ext2/ext3/ext4. This requires access to the device on which the filesystem is mounted (i.e. typically only root can attempt it). However, while debugfs can access a file by inode, this doesn't help if the file is deleted: the file will be truly deleted if the application closes it, and running debugfs in read-write mode on a mounted filesystem is a recipe for disaster.

19

The 'ln' and 'rm' commands have worked exactly like this in every UNIX filesystem since the early 1970s. Mac OSX, BSD and Linux all inherit this original design.

By itself, a UNIX file has no name, only an inode number or inum. But you can only access it through an entry in a special "directory" file that associates a name with the inum in question; you can't specify the inum directly.

A directory is itself a file, so you must also access it through (another) directory and so on, through a series of directory names delimited by forward slashes (/) known as a "path name". A path starts in the "current working directory" of the process unless the name begins with a "/", in which case it starts with the file system root directory. E.g., if the path name contains no "/" characters, then it is expected to be an entry in the current directory.

A non-directory file can have any number of path names, known as "hard links", and it will continue to exist until all of its path names have been removed and the last process has closed the file. Then the file is actually deleted and its space marked as available for reuse. That is, you can creat() or open() a singly-linked file and then unlink() it so it no longer appears in the file system name space, but the file will continue to exist until you close it. This is useful for temporary scratch files that won't be read by any other program.
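
The scratch-file pattern from the paragraph above can be sketched in shell, with exec redirections standing in for open() and close() and rm standing in for unlink(). The descriptor numbers 3 and 4 and the variable names are illustrative only.

```shell
scratch=$(mktemp)            # create a singly linked temporary file
exec 3> "$scratch"           # open it for writing on descriptor 3
exec 4< "$scratch"           # and separately for reading on descriptor 4
rm -f "$scratch"             # unlink it: the name disappears immediately

if [ -e "$scratch" ]; then name_exists=yes; else name_exists=no; fi

printf 'scratch data\n' >&3  # the nameless file can still be written...
IFS= read -r line <&4        # ...and read back through the other descriptor

echo "name exists: $name_exists, content: $line"

exec 3>&- 4<&-               # closing the last descriptors lets the kernel free the file
```

No other program can find the file by name between the rm and the final close, which is the point of the technique.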

Although directories have inode numbers, most file systems disallow hard links to them; they can appear in only one other directory. (One unusual exception is the Mac OSX HFS+ file system; this lets Time Machine backups work.) You can still create "soft links" to directories (or any other file). A soft link resembles a directory entry except that it contains another path name rather than an inum.
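
A small sketch of the difference between the two kinds of link, using made-up file names (target, hard, soft) in a scratch directory. The inode_of helper is hypothetical, written here only to pull the inode number out of ls -i output.

```shell
dir=$(mktemp -d)
echo data > "$dir/target"
ln "$dir/target" "$dir/hard"     # hard link: a second directory entry for the same inode
ln -s target "$dir/soft"         # soft link: a separate file that stores the path "target"

# Hypothetical helper: print the inode number of a name without following symlinks.
inode_of() { ls -di "$1" | awk '{print $1}'; }

ino_target=$(inode_of "$dir/target")
ino_hard=$(inode_of "$dir/hard")
ino_soft=$(inode_of "$dir/soft")
links=$(ls -l "$dir/target" | awk '{print $2}')   # second field of ls -l is the link count

rm -f "$dir/target"              # remove one of the two hard links

hard_content=$(cat "$dir/hard")  # the data is still reachable through the other name
if cat "$dir/soft" > /dev/null 2>&1; then soft_state=resolves; else soft_state=dangling; fi

echo "link count was $links; hard link reads '$hard_content'; soft link now $soft_state"
rm -rf "$dir"
```

The hard link shares the inode number of the original name, while the soft link has its own inode and breaks as soon as the path it stores stops existing.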

Every UNIX file has an owner, group and access permissions. It is necessary but not sufficient that they let you open the file; you must also have at least execute permission for every directory in the pathname you use to refer to it. That's why there's no standard way to open a UNIX file by its inode number; that would bypass an important, widely used security mechanism.
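
To see the directory part of that check in action, here is a minimal sketch with invented names (locked, file). It assumes it is run as an ordinary user: root normally bypasses these checks, so the first result would read "readable" there.

```shell
dir=$(mktemp -d)
mkdir "$dir/locked"
echo secret > "$dir/locked/file"
chmod 644 "$dir/locked/file"     # the file itself is readable by its owner

chmod 000 "$dir/locked"          # no search (execute) permission on the directory
if cat "$dir/locked/file" > /dev/null 2>&1; then no_search=readable; else no_search=denied; fi

chmod 100 "$dir/locked"          # execute only: the directory cannot be listed, but names resolve
if cat "$dir/locked/file" > /dev/null 2>&1; then search_only=readable; else search_only=denied; fi

echo "without search permission: $no_search; with search permission only: $search_only"

chmod 700 "$dir/locked"          # restore access so the cleanup can proceed
rm -rf "$dir"
```

The file's own mode never changes between the two attempts; only the permission on the directory leading to it does.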

But this doesn't explain why there can't be a standard way for a root (privileged) user to open a file by inode number, since permissions checking is bypassed anyway. This would be very useful for certain system management functions such as backups. To my knowledge, such mechanisms do exist, but they're all filesystem-specific; there is no general way to do it for any UNIX filesystem.

Phil Karn
  • 3
    The forward in / is silent, so it is pronounced “slash”. – ctrl-alt-delor Jul 21 '16 at 23:32
  • By far, my preferred answer that left me with some general map of the matter. But could you be a little more generous and expand just a few more lines at the ending "there is no general way to do it" ?? like you said that HFS+ supports hard links to directory files - could you say which file-system (that you know) allow opening files by inode, and some link on how it's done? – user176181 Oct 04 '21 at 07:18
15

On Linux, debugfs, the interactive ext2/ext3/ext4 file system debugger, provides an ln command which can take an inode number as filespec and create a new hard link to the corresponding file. In practice, though, this requires that the unlinked file be kept open by a process, which maintains an open file descriptor in /proc/[pid]/fd/[n]. Attempting this on a deleted file will most likely lead to file system corruption.

This is because, to ensure that ext3 (and by extension ext4) can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps, marks the inode as "deleted", and leaves the block pointers alone. Even so, as the file system needs to be mounted read-write in order to create the hard link, the blocks reserved for the deleted file might already have been reallocated.

Prior to kernel version 2.6.39, the ln -L|--logical option introduced in GNU coreutils v8.0 could be used to recover an unlinked file via an open file descriptor in /proc/[pid]/fd/[n], provided both the unlinked file and the new hard link resided on a tmpfs file system. This capability has since been disabled because of, as Gilles pointed out, the security considerations involved in allowing hard link creation directly from a file descriptor.

Thomas Nyman
  • I just tried using ln -L to recover a deleted file from /proc and got the error: "No such file or directory", so I don't think it actually supports this. I have coreutils 8.21. – wingedsubmariner Sep 29 '13 at 12:58
  • 1
    ln -L doesn't do what you say it does. It tells ln that if the source is a symbolic link, it should hard link the target. The symbolic links in /proc/$pid/fd are special, and hard-linking a (deleted) link doesn't work. – Gilles 'SO- stop being evil' Sep 30 '13 at 00:54
  • 1
    Also debugfs won't help if the file has been deleted — unless you want to risk running it in read-write mode on a mounted filesystem, which is likely to completely mangle the whole filesystem. – Gilles 'SO- stop being evil' Sep 30 '13 at 01:05
  • Updated the answer in regards to ln -L. It used to be possible to create hard links from /proc/[pid]/fd/[n] using it in certain special circumstances, but this has since been fixed. – Thomas Nyman Sep 30 '13 at 06:15
  • 1
    debugfs's ln is really low-level: it only creates a name and neither updates the link count nor unmarks the blocks as unused, so it's highly dangerous. Prefer debugfs's undel, which does all of that. Warning: debugfs is not to be run on a mounted filesystem unless you want to take a chance at burning your FS to ashes. – Lloeki Aug 07 '17 at 07:38
5

The question can be taken theoretically (which can be achieved with debugfs) or pragmatically (emergency situation). In the latter case, I assume the intent is saving the day and restoring the file's content, possibly urgently (which is how I landed on this question, so I think it is still relevant and useful).

There is no kernel API for this, and debugfs should not be run on a live file system because it manipulates the FS structure directly. Therefore, to do it live, you have to get hold of another filename. Assuming the file is still held open by some process (any process), one can reach for the ever-convenient file descriptors in /proc:

$ lsof -F pf "$PWD/a" | sed 's/^p//' # find pid and file descriptor number of any process having the file open
$ pid=1234
$ ls -l /proc/$pid/fd/* | grep "$PWD/a" # find file descriptor number
$ fd=42
$ cat /proc/$pid/fd/$fd > "$PWD/a.restored" # read contents to a new filename

Tips:

  • if you have a doubt about the right fd, you can run commands such as file on it
  • if there's a process writing to the file, be sure to stop that process ASAP or you won't get the latest data. An (untested) trick might be to open the file read-only through the fd with some other process (try tail -f < /proc/$pid/fd/$fd > /dev/null), stop the writing process so that it exits cleanly, and use the new process's fd.
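
For reference, the whole procedure can be rehearsed safely on a throwaway file. This is only a sketch under several assumptions: Linux with /proc, a background sleep standing in for "some process that still has the file open", and a readlink scan of /proc/$pid/fd swapped in for the lsof step above so that the example has no extra dependencies. All names (a, a.restored, descriptor 3) are illustrative.

```shell
dir=$(cd "$(mktemp -d)" && pwd -P)   # canonical path, so it matches what /proc reports
printf 'important\n' > "$dir/a"

exec 3< "$dir/a"                     # open the file, then hand the descriptor to a
sleep 30 &                           # background process that keeps it open
pid=$!
exec 3<&-                            # the shell itself lets go of the file

rm -f "$dir/a"                       # the accident: the only name is removed

# Find the descriptor whose magic link points at the deleted file.
fd=
for link in /proc/"$pid"/fd/*; do
    case $(readlink "$link") in
        "$dir/a (deleted)") fd=${link##*/} ;;
    esac
done

cat "/proc/$pid/fd/$fd" > "$dir/a.restored"   # copy the contents to a new name
restored=$(cat "$dir/a.restored")
echo "fd $fd of pid $pid restored: $restored"

kill "$pid"                          # clean up the stand-in process and scratch files
wait "$pid" 2>/dev/null
rm -rf "$dir"
```

Note that the restored file is a copy with a new inode; the original inode disappears once the holding process exits.
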
Lloeki
  • 2
    That should be tail -f < /proc/... in the second tip. – Murray Jensen Jan 16 '19 at 23:27
  • 1
    Or use tail -c +0 -f to copy it in the first place instead of cat, if the writing process is only appending (not seeking back and rewriting). Exit the other process before tail, then wait for tail to get to the end of the file. – Peter Cordes Mar 13 '19 at 02:43
2

debugfs has a cat command that takes an inode number and prints the data that the inode's block pointers refer to.

For example:

sudo debugfs -D -R 'cat <8>' /dev/sda3 > ext4_journal

writes the contents of the ext4 journal (which commonly has inode number 8 but no file name) to a file.

The angle brackets around the inode number are not a placeholder; cat expects them to be there.

The -D causes debugfs to bypass the buffer cache. Without this option, you might read stale data.

  • 1
    Nice! This only works if the file is held open somewhere (or is special like ext4_journal). Running the command for a foo.txt file inode works after rm iff the file is open elsewhere (eg in a program) – ljden Mar 18 '21 at 01:25