Doesn't allowing a user mode program to access kernel space memory and execute the IN and OUT instructions defeat the purpose of having CPU modes?

Question

When the CPU is in user mode, the CPU can't execute privileged instructions and can't access kernel space memory.

And when the CPU is in kernel mode, the CPU can execute all instructions and can access all memory.

Now in Linux, a user mode program can access all memory (using /dev/mem) and can execute the two privileged instructions IN and OUT (using iopl() I think).

So a user mode program in Linux can do most things (I think most things) that can be done in kernel mode.

Doesn't allowing a user mode program to have all this power defeat the purpose of having CPU modes?

score 23 · Answer 1 · edited Mar 11 '19 at 12:21

So a user mode program in Linux can do most things (I think most things) that can be done in kernel mode.

Well, not all user mode programs can, only those with the appropriate privileges. And that's determined by the kernel.

/dev/mem is protected by the usual filesystem access permissions, and the CAP_SYS_RAWIO capability. iopl() and ioperm() are also restricted through the same capability.
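As a rough illustration of that second check, a process can inspect its own effective capability set. This is only a sketch: it assumes a Linux system that exposes a CapEff line in /proc/self/status, it takes bit 17 as the number of CAP_SYS_RAWIO from <linux/capability.h>, and the helper names are made up for this example.

```python
import os

# Bit number of CAP_SYS_RAWIO in <linux/capability.h> (assumed here: 17).
CAP_SYS_RAWIO = 17


def effective_caps(status_path="/proc/self/status"):
    """Return this process's effective capability mask, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("CapEff:"):
                    # The mask is printed as a hexadecimal number.
                    return int(line.split()[1], 16)
    except OSError:
        pass
    return None


def has_cap(mask, cap):
    """True if bit `cap` is set in `mask`; False when the mask is unknown."""
    return mask is not None and bool((mask >> cap) & 1)


if __name__ == "__main__":
    mask = effective_caps()
    print("CAP_SYS_RAWIO effective:", has_cap(mask, CAP_SYS_RAWIO))
    # The file permission check is separate from the capability check.
    print("/dev/mem readable per file permissions:", os.access("/dev/mem", os.R_OK))
```

An ordinary unprivileged process would normally be expected to fail both checks, which is the point: the kernel decides who gets this power.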

/dev/mem can also be compiled out of the kernel altogether (CONFIG_DEVMEM).

Doesn't allowing a user mode program to have all this power defeat the purpose of having CPU modes?

Well, maybe. It depends on what you want privileged user-space processes to be able to do. User-space processes can also trash the whole hard drive if they have access to /dev/sda (or equivalent), even though that defeats the purpose of having a filesystem driver to handle storage access.

(Then there's also the fact that iopl() works by utilizing the CPU privilege modes on i386, so it can't well be said to defeat their purpose.)

Even iopl doesn't allow all privileged instructions, so it's still useful for making sure a buggy user-space program doesn't accidentally run invd by jumping through a corrupted function pointer that points at executable memory starting with 0F 08 bytes. I added an answer with some of the non-security reasons why it's useful to have user-space processes elevate their privileges. — Peter Cordes, Mar 11 '19 at 03:39

score 16 · Answer 2 · answered Mar 11 '19 at 03:10

Only in the same way that modprobe "defeats" security by loading new code into the kernel.

For various reasons, sometimes it makes more sense to have semi-privileged code (like graphics drivers inside the X server) running in user-space rather than a kernel thread.

- Being able to kill it more easily, unless it locks up the HW.
- Having it demand-page its code / data from files in the filesystem. (Kernel memory is not pageable.)
- Giving it its own virtual address space where bugs in the X server might just crash the X server, without taking down the kernel.

It doesn't do much for security, but there are big reliability and software architecture advantages.

Baking graphics drivers into the kernel might reduce context switches between X clients and the X server, e.g. just one user->kernel->user round trip instead of having to get data into another user-space process, but X servers historically are too big and too buggy to want them fully in the kernel.

Yes, malicious code with these privs could take over the kernel if it wanted to, using /dev/mem to modify kernel code.

Or, on x86 for example, run a cli instruction to disable interrupts on that core after making an iopl system call to raise its I/O privilege level (IOPL) to 3, which lets ring-3 code run the IOPL-sensitive instructions.

But even x86 iopl "only" gives access to some instructions: in/out (and the string versions ins/outs), and cli/sti. It doesn't let you use rdmsr or wrmsr to read or write "model specific registers" (e.g. IA32_LSTAR which sets the kernel entry point address for the x86-64 syscall instruction), or use lidt to replace the interrupt-descriptor table (which would let you totally take over the machine from the existing kernel, at least on that core.)

You can't even read control registers (like CR3 which holds the physical address of the top-level page-directory, which an attacking process might find useful as an offset into /dev/mem to modify its own page tables as an alternative to mmaping more of /dev/mem.)

invd (invalidate all caches without write-back! Its use case is early BIOS code, before RAM is configured) is another fun one that always requires full CPL 0 (current privilege level), not just IOPL. Even wbinvd is privileged because it's so slow (and not interruptible), and has to flush all caches across all cores. (See "Is there a way to flush the entire CPU cache related to a program?" and "WBINVD instruction usage".)

Thus a bug that jumps to a bad address and runs data as code can't execute any of these instructions by accident in a user-space X server.
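The split between IOPL-sensitive instructions and instructions that always need CPL 0 can be modelled in a few lines. This is a deliberately simplified sketch, not a description of the hardware: the instruction lists are only the ones named in this answer, the function name is hypothetical, and it ignores the per-port I/O permission bitmap (what ioperm() manipulates), which can allow individual ports even when CPL > IOPL.

```python
# Instructions named above that are gated by the I/O privilege level.
IOPL_SENSITIVE = {"in", "out", "ins", "outs", "cli", "sti"}

# Instructions named above that require CPL 0 regardless of IOPL.
RING0_ONLY = {"rdmsr", "wrmsr", "lidt", "invd", "wbinvd", "mov_from_cr3"}


def allowed(instr, cpl, iopl):
    """Simplified check: may `instr` run at privilege level `cpl` with this IOPL?"""
    if instr in RING0_ONLY:
        return cpl == 0
    if instr in IOPL_SENSITIVE:
        # Numerically lower means more privileged, so CPL must not exceed IOPL.
        return cpl <= iopl
    return True  # everything else is treated as unprivileged in this model


if __name__ == "__main__":
    for instr in ("out", "cli", "wrmsr", "invd"):
        print(instr, "at CPL 3, IOPL 3:", allowed(instr, cpl=3, iopl=3))
```

In this model a ring-3 process with IOPL 3 gets in/out and cli/sti, but a stray invd or wrmsr still faults.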

The current privilege level (in protected and long mode) is the low 2 bits of cs (the code segment selector). mov eax, cs / and eax, 3 works in any mode to read the privilege level.
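The same masking can be applied to a selector value outside of assembly. The sketch below splits a 16-bit selector into its three fields. The function name is hypothetical, and the example values 0x33 and 0x10 are assumed to be the usual user and kernel code selectors on x86-64 Linux; treat them as illustrative inputs rather than something this answer states.

```python
def decode_selector(sel):
    """Split a 16-bit x86 segment selector into its fields."""
    return {
        "index": sel >> 3,      # descriptor index into the GDT or LDT
        "ti": (sel >> 2) & 1,   # table indicator: 0 = GDT, 1 = LDT
        "rpl": sel & 3,         # low 2 bits; for cs this is the current privilege level
    }


if __name__ == "__main__":
    for name, sel in (("user cs (assumed 0x33)", 0x33), ("kernel cs (assumed 0x10)", 0x10)):
        print(name, decode_selector(sel))
```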

To write the privilege level, you do a jmp far or call far to set CS:RIP (but the GDT/LDT entry for the target segment can restrict it based on the old privilege level, which is why user-space can't do this to elevate itself). Or you use int or syscall to switch to ring 0 at a kernel entry point.

Actually, I'm pretty certain it's just code "selector" in Intel parlance. It was a segment in the 8086/8088, possibly in the 80186, but by the 80286, it was referred to as a selector, and I don't think they've officially changed that terminology since. — user, Mar 11 '19 at 20:07
