
This image from TLDP is pretty awesome. It shows that before user space gets actual read, write, and open access to the filesystem, the blocks get mapped onto the virtual filesystem.

[image: the TLDP diagram, not reproduced here]

And Wikipedia says there are 3 versions of file systems on different layers.

So, do the standard device nodes (the sd nodes) refer to the physical filesystem or, after LVM has done its mapping, to the virtual filesystem?

Or do they refer to just the partition? (That would mean that writing directly onto the partition skips the filesystem driver, without which you couldn't even interact with the files themselves.)

If that is the case, what devices represent the filesystem drivers, or the filesystems, or... I just don't know. Could anybody link me to something that explains how the kernel uses disks?

Junaga

2 Answers

8

tl;dr: /dev/sdaX represents a partition. I think a fundamental misconception you have is the difference between filesystems and partitions. A partition is really simple - basically it's just a section of the disk which is defined in a partition table at the beginning of the disk. A filesystem, however, is a much more advanced thing. A filesystem is essentially a data structure used to keep track of files that the kernel (specifically, a filesystem driver) is able to read and write. That data structure can technically be put anywhere on disk, but it is expected that the beginning of the fs data structure is the same as the beginning of a partition.

You mentioned LVM in your question - let's forget about that for the moment since that's a more advanced topic (I'll explain LVM at the end).

Say you have a single 100GB hard disk with nothing but zeros. In this case, you will have a /dev/sda file which you can read 100GB from (although e.g. du will report it as zero-length because it's a block special file) and which contains nothing but zeros. /dev/sda is the method by which the kernel exposes the raw device contents to userspace for reading and writing. This is why it has the same amount of data as your disk and has the same contents as your disk. If you flip the fifth bit on /dev/sda to be one instead of zero, the kernel will flip the fifth bit on the physical drive to match. In the diagram you provided, this write would go through the system call interface into the kernel, then through the IDE hard disk driver, and finally to the hard disk itself.
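To make the "flip the fifth bit" idea concrete, here is a minimal sketch in Python. It works on any seekable file, so it uses an ordinary zero-filled file as a stand-in for /dev/sda (running it against a real device node would need root and would overwrite data). The helper names flip_bit, is_block_device and make_blank_disk are made up for this example, and counting bit 0 as the most significant bit of byte 0 is an assumption of mine, not something the kernel defines.

```python
import os
import stat
import tempfile


def flip_bit(path, bit_index):
    """Flip one bit in the file at `path`, counting bits from the start.

    Assumption: bit 0 is the most significant bit of byte 0.
    """
    byte_offset, bit_in_byte = divmod(bit_index, 8)
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        original = f.read(1)
        if len(original) != 1:
            raise ValueError("bit index is past the end of the file")
        flipped = original[0] ^ (0x80 >> bit_in_byte)
        f.seek(byte_offset)
        f.write(bytes([flipped]))


def is_block_device(path):
    """True if `path` is a block special file, as /dev/sda is on Linux."""
    return stat.S_ISBLK(os.stat(path).st_mode)


def make_blank_disk(size):
    """Create a zero-filled regular file standing in for an empty disk."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(size))
    return path
```

Calling flip_bit(make_blank_disk(4096), 4) changes exactly one bit in the first byte of the stand-in file; no filesystem driver is involved, which is the same path a write to the raw device node takes.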


Now let's say you want to do something useful with that drive, like store files on it. Now you need a filesystem. There are a ridiculous number of filesystems available to you in the Linux kernel. Each one of them uses a different data structure on disk to keep track of files, and they might also modify their data structures in different ways, for example to provide atomic write guarantees (i.e. writes either succeed or they don't; there can never be half-written data even if the machine crashes). This is what people mean when they talk about a "filesystem driver": a filesystem driver is a piece of code that understands how to read and write a particular filesystem's data structures on disk. Examples include ext4, btrfs, XFS, etc.

So you want to store files. Let's say you pick ext4 as a filesystem. What you need to do now is format the disk so that the data structures for an empty filesystem exist on disk. To do this, you use mkfs.ext4 and tell it to write to /dev/sda. mkfs.ext4 will then write an empty ext4 filesystem starting at the beginning of /dev/sda. The kernel will then take the writes to /dev/sda and apply them to the beginning of the physical disk. Now that the disk contains a filesystem's data structures, you can do e.g. mount /dev/sda /mnt to mount the brand-new filesystem, move files into it, etc. Any writes to files in /mnt would then go through the system call interface, then to the ext4 filesystem driver (which knows how to turn the more abstract "write this data to such-and-such a file" into the concrete changes that need to be made to the fs data structures on disk), then to the IDE hard disk driver, then finally to the drive itself.
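One way to see that "formatting" just means "putting recognizable data structures on the device" is to look for the ext2/3/4 magic number. The sketch below is only an illustration: the function name looks_like_ext is hypothetical, and it relies on the documented ext layout in which the superblock sits 1024 bytes into the device and its magic field (0xEF53, little-endian) is 56 bytes into the superblock. It checks nothing else, so it is far from a real filesystem probe.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the device
EXT_MAGIC_OFFSET = 56         # s_magic field, relative to the superblock start
EXT_MAGIC = 0xEF53            # shared by ext2, ext3 and ext4


def looks_like_ext(device_bytes):
    """Return True if the given leading bytes carry the ext2/3/4 magic number."""
    pos = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(device_bytes) < pos + 2:
        return False
    (magic,) = struct.unpack_from("<H", device_bytes, pos)
    return magic == EXT_MAGIC
```

On an all-zero disk this returns False; on the first 2048 bytes of an image that mkfs.ext4 has written to, it should return True.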


Now, the above will work, but it's not normally how people do things. Usually they use partitions on the drive. A partition is basically just a particular section of the drive. When you use partitions, you have a partition table at the beginning of the drive that says where, physically, each partition is located. Partitions are neat because they allow you to divide up a drive into multiple sections that can be used for different purposes.

So let's say you want to create two filesystems on the drive, both ~50GB (i.e. half-and-half). First you'd have to partition the drive. In order to do this you'd use a tool like fdisk or gdisk, both of which create different types of partition tables, and you'd tell your tool to write to /dev/sda. When you were done partitioning, you'd have /dev/sda, /dev/sda1, and /dev/sda2. /dev/sda1 and /dev/sda2 are the kernel's way of representing the different partitions in the disk. If you write to the beginning of /dev/sda2, it will write to the beginning of the second partition, which is in the middle of the disk.

Another way to explain this is by talking about the contents of /dev/sda. Recall that /dev/sda is, bit-for-bit, the contents of the physical hard drive. And /dev/sda1 is, bit-for-bit, the contents of the first partition of the hard drive. This means that /dev/sda has a little bit of data - the partition table - followed by the exact contents of /dev/sda1, then /dev/sda2. /dev/sda1 and /dev/sda2 are mapped to specific regions on the disk, which are partitions that you've configured.
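The "partition is just a window onto the disk" idea can be sketched with a classic MBR partition table. This is a simplified illustration, not how the kernel implements it: it assumes 512-byte sectors and an MBR (a GPT disk is laid out differently), and mbr_partitions, partition_view and make_demo_disk are names invented for the example. The demo disk is built in memory, so nothing touches a real device.

```python
import struct

SECTOR = 512  # assumption: 512-byte logical sectors


def mbr_partitions(disk):
    """Return (start_byte, length_bytes) for each used MBR partition entry."""
    if disk[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        # The four 16-byte entries start at byte 446 of the first sector.
        entry = disk[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused entry
            parts.append((start_lba * SECTOR, sectors * SECTOR))
    return parts


def partition_view(disk, index):
    """Bytes of partition `index`, i.e. what /dev/sdaN would expose."""
    start, length = mbr_partitions(disk)[index]
    return disk[start:start + length]


def make_demo_disk():
    """Build a tiny in-memory disk with two 4-sector partitions."""
    disk = bytearray(16 * SECTOR)
    for i, start_lba in enumerate((2, 6)):
        # Fields: status, CHS start (unused here), type 0x83, CHS end,
        # starting LBA, sector count.
        struct.pack_into("<B3sB3sII", disk, 446 + 16 * i,
                         0, b"\0\0\0", 0x83, b"\0\0\0", start_lba, 4)
    disk[510:512] = b"\x55\xaa"
    return disk
```

With disk = make_demo_disk(), writing b"hello" at disk[3072:3077] makes partition_view(disk, 1) start with b"hello": the second partition's first byte is simply byte 3072 of the whole disk.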

From here we can use mkfs.ext4 again to create a filesystem on /dev/sda1, which will write to the disk starting at the beginning of the first partition, just after the partition table. If we use mkfs.ext4 on /dev/sda2, it writes starting at the beginning of the second partition, which is in the middle of the disk (and thus in the middle of /dev/sda's contents).

Now, you can do e.g. mount /dev/sda2 /mnt. This tells the kernel to read filesystem data starting at the beginning of the second partition and expose it to you in a more useful form - i.e. files and directories at the location /mnt. Again, the kernel uses a filesystem driver to actually perform this mapping.


Now let's talk about LVM, briefly. LVM is basically just an abstraction over partitions. Partitions map very, very directly to physical locations on disk. In the two-partition example above, let's say you wanted to delete the first partition and expand the second into the newly freed space. Because partitions are mapped directly to disk regions, the only way to do this is to physically move the entire 50GB of partition data to the beginning of the disk, then expand the partition to the end.

LVM is designed to make this less painful. Basically, you give LVM a bunch of raw storage, and then tell it how to use that storage. LVM provides you with a virtual "disk" that can be divided like partitions, but whose underlying storage can be anywhere in the raw storage pool you've allocated for it. To use the example above, if you gave LVM the entire disk to use, then divided it into two, you could delete the first "partition" and expand the second "partition" to fill that space instantly, because LVM is able to keep track of where data is on the disk without requiring it to be strictly "in order".
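The "delete the first, grow the second, move nothing" example can be modelled with a toy extent map. To be clear, this is not LVM's real on-disk format or API; ToyVolumeGroup and its methods are hypothetical, and the only idea borrowed from LVM is that a logical volume is an ordered list of fixed-size extents that may live anywhere in the pool.

```python
class ToyVolumeGroup:
    """Toy model: a pool of fixed-size extents handed out to named volumes."""

    def __init__(self, total_extents):
        self.free = list(range(total_extents))
        # name -> physical extent numbers, listed in logical order
        self.volumes = {}

    def create(self, name, extents):
        if extents > len(self.free):
            raise ValueError("not enough free extents")
        self.volumes[name] = [self.free.pop(0) for _ in range(extents)]

    def remove(self, name):
        # Freed extents go back to the pool; no data is moved.
        self.free.extend(self.volumes.pop(name))
        self.free.sort()

    def extend(self, name, extents):
        if extents > len(self.free):
            raise ValueError("not enough free extents")
        # New logical extents are appended, wherever the free space happens to be.
        self.volumes[name].extend(self.free.pop(0) for _ in range(extents))

    def physical_extent(self, name, logical_extent):
        """Translate a volume's logical extent number to a physical one."""
        return self.volumes[name][logical_extent]
```

With an 8-extent pool split into "first" (extents 0-3) and "second" (extents 4-7), removing "first" and extending "second" by 4 leaves "second" mapped to 4, 5, 6, 7, 0, 1, 2, 3. The existing data stays put; only the map changes, which is why the resize can be instant.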

For loads more details on how LVM works, see this answer: https://unix.stackexchange.com/a/106871/29146

strugee
  • can you add kernel modules to provide support for exotic file systems? – Junaga Nov 01 '16 at 22:27
  • wow, impressive effort! Minor nitpick: mkfs doesn't just write to the beginning of the partition. At least with ext* filesystems, it scribbles all over it (writing inode tables, backup superblocks, etc.). lazy_itable_init, etc. help a lot, but don't completely eliminate it. Also the first header isn't actually at the start, it's a few K in... – derobert Nov 01 '16 at 22:28
  • @derobert right, I meant that the data mkfs writes starts at the beginning of the partition. I'll edit. re: "first header", not sure what you mean - partition header? – strugee Nov 01 '16 at 22:49
  • @Junaga yep. a filesystem driver is just a piece of code in the kernel. therefore you can add new filesystem drivers for exotic filesystems by inserting new kernel modules. – strugee Nov 01 '16 at 22:54
  • @strugee by first header I meant first bit of ext[234] metadata. It's actually offset a bit from the start of the partition (to leave room for a boot sector or a few, don't remember the exact offset). – derobert Nov 02 '16 at 00:55
1

/dev/sda is an interface to the entire hard drive. If you have permission, you can directly seek anywhere in the drive. /dev/sda1 is the first partition on the drive. There's no file system involved yet at that point. Within the partition, there can be a file system directly, or it can be an LVM container.