8

I searched but couldn't find anything. I am looking for a byte-level breakdown of how a symlink is structured on an ext filesystem.

I have tried creating a symlink file and then using hexdump on the symlink, but it complains that it's a directory (the link was to a folder) so it's obviously trying to dump the file/folder the link points to rather than the link itself.

AdminBee
  • 22,803

3 Answers

19

You didn't provide additional details, so this explanation is for the moment centered on the EXT file systems common in Linux.

If you look at the size of a symlink as reported by e.g. ls -l, you will notice that it equals the length, in bytes, of the target path it points to. So you can infer that the "actual" file contains just the path to the link target as text, and that its interpretation as a symbolic link is stored in the file-type metadata (in particular, the S_IFLNK file-type value in the i_mode field of the link's inode, which also holds the permission bits; see this kernel documentation reference).
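As a minimal sketch of that observation, the following Python snippet uses os.lstat (which does not follow the link) to check both the file type and the reported size. The helper name describe_symlink and the target and link names are made up for illustration, and the size check assumes a filesystem that reports the target length in st_size, as ext4 does.

```python
import os
import stat
import tempfile


def describe_symlink(link_path):
    """Return (is_symlink, size) from lstat, which inspects the link itself."""
    st = os.lstat(link_path)
    return stat.S_ISLNK(st.st_mode), st.st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = "some/target/dir"          # hypothetical target; need not exist
        link = os.path.join(tmp, "mylink")  # hypothetical link name
        os.symlink(target, link)
        is_link, size = describe_symlink(link)
        # Compare the reported size with the byte length of the target path.
        print(is_link, size, len(os.fsencode(target)))
```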

To improve performance and reduce device I/O, a link target shorter than 60 bytes is stored in the i_block field of the inode itself (see here). Since this makes a separate block access unnecessary, such links are called "fast symlinks", as opposed to symlinks with longer targets, which fall back to the "traditional" method of storing the link target as text in an external data block.
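To make the byte layout concrete, here is a rough sketch that decodes a fast symlink from a raw inode buffer. The field offsets (i_mode at 0x00, i_size_lo at 0x04, i_block at 0x28 with a length of 60 bytes) are taken from the ext4 on-disk layout documentation. Everything else is an assumption for illustration: the helper names are made up, the buffer is synthetic rather than read from a real disk, and the "size below 60 means inline" test is a simplification that ignores cases such as encrypted symlinks.

```python
import struct

S_IFMT = 0xF000         # mask for the file-type bits of i_mode
S_IFLNK = 0xA000        # file-type value for a symbolic link
I_BLOCK_OFFSET = 0x28   # i_block starts here in the on-disk inode
I_BLOCK_SIZE = 60       # i_block is 60 bytes long


def fast_symlink_target(inode):
    """Return the inline target of a fast symlink, or None if it is not one."""
    (i_mode,) = struct.unpack_from("<H", inode, 0x00)     # little-endian u16
    (i_size_lo,) = struct.unpack_from("<I", inode, 0x04)  # little-endian u32
    if (i_mode & S_IFMT) != S_IFLNK:
        return None   # not a symlink at all
    if i_size_lo >= I_BLOCK_SIZE:
        return None   # target too long; it would live in a data block
    return bytes(inode[I_BLOCK_OFFSET:I_BLOCK_OFFSET + i_size_lo])


def make_fake_inode(mode, target):
    """Build a synthetic 128-byte inode for demonstration purposes only."""
    buf = bytearray(128)
    struct.pack_into("<H", buf, 0x00, mode)
    struct.pack_into("<I", buf, 0x04, len(target))
    if len(target) < I_BLOCK_SIZE:
        buf[I_BLOCK_OFFSET:I_BLOCK_OFFSET + len(target)] = target
    return bytes(buf)


if __name__ == "__main__":
    inode = make_fake_inode(0xA1FF, b"/usr/share/doc")  # lrwxrwxrwx
    print(fast_symlink_target(inode))
```

The target is stored without any length prefix or terminator of its own; the length comes from the size field, which is why ls -l reports it.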

AdminBee
  • 22,803
  • Thanks for this very good answer. I have the same question. If "the symlink file" stores the "path to the target", which I also believe, why not allow something like cat --do_not_follow to display just the "path to the target" in plain text? – midnite Dec 14 '21 at 18:24
  • 1
    @midnite That is a question that should probably be addressed to the tool maintainers ;), but we can speculate. (1) Not all filesystems (even among those that support symlinks) store the link target in this way, and cat will usually try to work as universally as possible. (2) Even on EXT filesystems, the link target is not stored in the file content if the path length is short, so it would be an option with very limited scope. – AdminBee Dec 15 '21 at 13:54
6

That's totally file system dependent.

Usually the symlink target is stored as-is within the extra space of an inode block, the same way small directories and small files are. There's no need for any special data format -- the file mode bits already determine that it's a symlink and should be treated as such. The target is the "actual content": you can use readlink -n /path/to | hexdump if you really want to use hexdump.

When calling lstat(2) on a symlink, st.st_size will contain the length of the target (not including any terminating NUL byte).
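For comparison, a small Python sketch of the same idea as the readlink-plus-hexdump pipeline: read the target as raw bytes and print them one byte at a time. The function name symlink_hexdump and the link name are invented for this example, and bytes.hex with a separator needs Python 3.8 or later.

```python
import os
import tempfile


def symlink_hexdump(link_path):
    """Return the link target's raw bytes as space-separated hex pairs."""
    raw = os.readlink(os.fsencode(link_path))  # bytes path in, bytes target out
    return raw.hex(" ")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "fruit")  # hypothetical link name
        os.symlink("banana", link)
        print(symlink_hexdump(link))  # prints: 62 61 6e 61 6e 61
```

Printing byte by byte keeps the bytes in their stored order, like hexdump -C does; plain hexdump groups the input into 16-bit words, which is why adjacent bytes can appear swapped in its default output.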

  • 1
    As far as I understand, ext4 only stores symlinks embedded in the inode, not regular small files. And ext4 is rather common. – ilkkachu Sep 16 '20 at 10:36
  • @ilkkachu where the filesystem actually stores the content of a symlink (i.e. in the inode, in some metadata, in some (b/rb/whatever)tree or in actual data extents) is irrelevant for applications. Other filesystems can store regular files in other places than data extents. And some filesystems don't even have inodes. Regular applications aren't concerned with how/where the payload of files is stored and treat the VFS as a black box. They just tell the kernel/glib "give me the content" or "stat the metadata associated with the path" or something like that. – blubberdiblub Apr 04 '23 at 21:54
  • @ilkkachu well, after rereading the question of OP, I may have misinterpreted it and went on that wrong assumption and wrong context, so in that light I guess not. My apologies – blubberdiblub Apr 05 '23 at 08:47
  • 1
    @blubberdiblub, my comment there was just a side note on how the answer here implies that both symlinks and small files might be saved inside the inode. I know it's not really relevant to the question about reading the symlink from an application -- which is why it was just a comment. – ilkkachu Apr 05 '23 at 12:09
  • The funny part of dumping is seeing that it swaps the bytes within each pair. I suppose that is because of the "little endian style" of storing data. For example, banana becomes abanan; you must flip the pairs. – Sergio Abreu Oct 09 '23 at 12:01
5

Using hexdump on the symlink is not looking at the ext4 filesystem at all. It's looking at the application-facing abstraction. At this layer there is nothing to see. Opening the symlink will either attempt to open what it refers to according to path resolution semantics (no O_NOFOLLOW) or fail (O_NOFOLLOW). The way to read the contents at this abstraction layer is readlink, and it will simply give you the sequence of bytes that were passed to symlink to create it. This does not tell you anything about how it's represented on disk.
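A short sketch of those two behaviours at the application layer, in Python. The helper name open_nofollow_errno and the link name are made up for illustration; on Linux the failure is expected to be ELOOP, but other systems may report a different error, so the code only returns whatever errno it gets.

```python
import errno
import os
import tempfile


def open_nofollow_errno(path):
    """Try open() with O_NOFOLLOW; return the errno on failure, else None."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "link-to-dir")  # hypothetical link name
        os.symlink(tmp, link)                    # the link points at a directory
        err = open_nofollow_errno(link)
        print(errno.errorcode.get(err, err))     # the open is refused
        print(os.readlink(link) == tmp)          # readlink returns what symlink() got
```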

To examine the representation on disk, you need to open (or use a tool that opens) the block device and walk the filesystem's structures until you get to the symlink you want to see. I believe ext4 stores symlinks with short targets inline in the inode structure (no separate data block on disk) and stores those with long targets simply as a file containing the link contents, but with type symlink.