
Background

There are lots of types of special files in Unix, for example symlinks, device files and proc files (under /proc). /proc files are just normal files, or even text files. But for the rest, I only know how to use them; I don't know their internal structure or what they really are. Unfortunately, every way of accessing a file only gets at the object it represents. In other words, there is no way to get at the internal representation.

Questions

For symlinks, there is no doubt that a string stores the path of the target. However, if there is only a string, how can a symlink be distinguished from a plain text file? If there is a special header specified by the file system driver, what is it? Is there any convention? Can anyone tell me what the binary representation of a symlink pointing at /usr/bin/bash is?

For device files (nodes), what is their binary structure and representation? Judging by their behavior, they must include information about the relevant interface number and driver. But that would make their size vary greatly. Can anyone explain this at a binary level?

Regarding compatibility of symlinks and device files (I know the same device file can't work in two different environments, but let's imagine an experiment): is the binary content of these files strongly dependent on the file system type and the operating system kernel? For example, if I copy one to a different file system (not via the cp command, but by writing identical binary content), say from ext4 to XFS, is the file (symlink or device file) still valid and functional? What about copying it from a Linux machine to a BSD machine?

Or are they not files at all, just special records in the file system's header?

Stephen Kitt
davmos

Answer


“Special” files are still files, stored in the file system like “regular” files. Directories, files, symlinks etc. are distinguished by their type, which is explicitly stored in the file system. See Understanding UNIX permissions and file types for details of the various file types you can encounter.
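A quick way to see this from userspace is a minimal Python sketch like the one below (the `file_type` helper and the temporary files are illustrative, not a standard API). `os.lstat` reports the stored type bits in `st_mode` without following links, so a regular file whose contents happen to be `/usr/bin/bash` and a symlink pointing at `/usr/bin/bash` are told apart by their recorded type, not by any header inside their data.

```python
import os
import stat
import tempfile

def file_type(path: str) -> str:
    """Return a readable file type for path, without following symlinks."""
    mode = os.lstat(path).st_mode  # lstat inspects the link itself, not its target
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"

with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "plain.txt")
    link = os.path.join(d, "link")
    # A regular file whose *contents* are a path...
    with open(regular, "w") as f:
        f.write("/usr/bin/bash")
    # ...versus a symlink whose *target* is the same path (it need not exist).
    os.symlink("/usr/bin/bash", link)
    for p in (d, regular, link, "/dev/null"):
        print(f"{p}: {file_type(p)}")
```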

How files are stored, and what files can be stored, depends on the file system. Some file systems support a subset of Unix-style file types; for example, FAT can’t store anything other than files or directories (and volume labels). This means that a “special” file’s storage depends on the file system, and you can’t copy the bits representing a file from one file system to another as-is.

Symlinks store the text representation of their target. In most current Unix-style file systems, there’s room for short targets alongside the symlink’s “core” information (in its inode); longer links require the allocation of a data block.
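For the /usr/bin/bash example in the question, the stored payload is just the 13 bytes of the path, with no terminating NUL: POSIX defines a symlink's `st_size` as the length of the pathname it contains. The sketch below checks that with `os.readlink` and `os.lstat`. The `ext4_stores_inline` helper is hypothetical and assumes the ext4 rule that targets shorter than 60 bytes fit in the inode's `i_block` area (a "fast" symlink); features such as inline data or encryption may change that.

```python
import os
import tempfile

TARGET = "/usr/bin/bash"

def ext4_stores_inline(target: str) -> bool:
    """Hypothetical helper: True if ext4 would keep this target inside the
    inode (assumes the < 60 byte fast-symlink rule, no inline data/encryption)."""
    return len(os.fsencode(target)) < 60

with tempfile.TemporaryDirectory() as d:
    link = os.path.join(d, "bash-link")
    os.symlink(TARGET, link)           # the target does not have to exist
    print(os.readlink(link))           # /usr/bin/bash (the stored text)
    print(os.lstat(link).st_size)      # 13 (length of the target, no NUL)
    print(ext4_stores_inline(TARGET))  # True
```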

Device nodes do store identifiers, known as the node's major and minor numbers. These are two small-ish integers with fixed storage requirements, and there's room for them in the inode. The numbers are OS-specific, so you can't copy a node from Linux to BSD while preserving its function.
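To make the "fixed storage" point concrete, here is a Python sketch. `os.major` and `os.minor` read the pair from `st_rdev`. The `ext4_encode_rdev` helper is an assumption modeled on the Linux kernel's `old_encode_dev`/`new_encode_dev` helpers as used by ext4: small numbers go into `i_block[0]`, larger ones into `i_block[1]`, either way a few bytes inside the inode and never a driver or interface description.

```python
import os
import stat
import struct

def ext4_encode_rdev(major: int, minor: int) -> bytes:
    """Sketch (assumption, modeled on Linux's old_encode_dev/new_encode_dev):
    return i_block[0] and i_block[1] of an ext4 device inode as two
    little-endian 32-bit words."""
    if major < 256 and minor < 256:
        # Small numbers: "old" encoding, major in the high byte of 16 bits.
        words = ((major << 8) | minor, 0)
    else:
        # Larger numbers: "new" 32-bit encoding stored in i_block[1].
        new = (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)
        words = (0, new & 0xFFFFFFFF)
    return struct.pack("<2I", *words)

st = os.stat("/dev/null")
if stat.S_ISCHR(st.st_mode):
    maj, mnr = os.major(st.st_rdev), os.minor(st.st_rdev)
    print("/dev/null is character device", maj, mnr)  # 1 3 on Linux
    print(ext4_encode_rdev(maj, mnr).hex())           # 0301000000000000 on Linux
```

Note that `os.makedev` packs the same pair into the host's in-memory `dev_t`, which is yet another encoding; other kernels number and pack devices their own way, which is why identical bytes don't mean the same device elsewhere.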

You can see exhaustive detail of Ext4’s way of storing this information in the kernel’s Ext4 documentation. Look for i_mode in particular to see how a file’s type is stored.
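As an illustration of `i_mode`: it is a little-endian 16-bit field at the start of each ext4 inode, whose high bits hold the file type (for example `0xA000` for a symlink, `0x2000` for a character device) and whose low bits hold the permissions. The sample bytes in this Python sketch are made up for illustration, not read from a real disk; `stat.filemode` renders the decoded value the way `ls -l` would.

```python
import stat
import struct

# Hypothetical raw i_mode bytes (__le16), as they might appear at inode offset 0.
samples = {
    "symlink, 0777": b"\xff\xa1",       # 0xA1FF
    "regular file, 0644": b"\xa4\x81",  # 0x81A4
    "char device, 0666": b"\xb6\x21",   # 0x21B6
}

def decode_i_mode(raw: bytes) -> tuple[str, str]:
    """Decode a 2-byte little-endian i_mode into (ls-style string, octal perms)."""
    (mode,) = struct.unpack("<H", raw)
    return stat.filemode(mode), oct(stat.S_IMODE(mode))

for label, raw in samples.items():
    print(label, *decode_i_mode(raw))
```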

Stephen Kitt