Why are `copy_from_user()` and `copy_to_user()` needed, when the kernel is mapped into the same virtual address space as the process itself?

Question

Having developed a few (toy) kernel modules for learning purposes, I quickly realized that `copy_from_user()` and `copy_to_user()` were needed to copy data from/to user-space buffers; otherwise, errors related to invalid addresses resulted in crashes.

But if 0x1fffff is a virtual address pointing to a user-space buffer, then why isn't that address valid in the kernel? The kernel is in the same virtual address space, so 0x1fffff would be mapped to the same physical memory.

score 7 · Accepted Answer · edited Oct 27 '21 at 09:45

The address space mapping is the same on some (not all!) architectures, but even on architectures where they are the same, the protection levels aren’t. copy_from_user etc. serve three main purposes:

- they check that the permissions on the memory to be read from or written to would allow the process running in user space to read from or write to it, which ensures that processes can’t trick the kernel into accessing memory they shouldn’t be able to access;
- they allow for specific error handling so that protection faults don’t crash the kernel, for example if the requested addresses aren’t currently mapped (think of zero pages or swapped-out pages);
- they ensure that the kernel doesn’t trip over its own protections, e.g. SMAP or kernel-specific address spaces (S/390).

Some architectures use memory layouts which allow these functions to take shortcuts, e.g. using a direct mapping of physical memory, but you can’t assume that to be the case, and it doesn’t handle all situations anyway (swapped-out pages aren’t present in physical memory).
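To make the first two purposes concrete, here is a minimal user-space sketch in C. It is a model, not the kernel's implementation: every `model_*` / `MODEL_*` name, the toy "user memory" structure, and the address-space split are assumptions made up for illustration. What it mirrors from the real `copy_from_user()` is the contract: a range check up front, a return value giving the number of bytes that could *not* be copied, and zero-filling of the unread tail of the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All MODEL_* / model_* names are hypothetical, for illustration only. */
#define MODEL_USER_LIMIT 0x0000800000000000ULL /* assumed user/kernel split */
#define MODEL_PAGE_SIZE  4096u
#define MODEL_NPAGES     4u

/* Toy "user memory": a few pages, each either mapped or not. */
struct model_user_mem {
    unsigned char data[MODEL_NPAGES * MODEL_PAGE_SIZE];
    int mapped[MODEL_NPAGES]; /* 0 = access would fault and cannot be fixed up */
};

/* Range check in the spirit of access_ok(): the whole buffer must lie
 * below the user/kernel boundary, and addr + len must not wrap around. */
static int model_access_ok(uint64_t addr, uint64_t len)
{
    return len <= MODEL_USER_LIMIT && addr <= MODEL_USER_LIMIT - len;
}

/* Returns the number of bytes NOT copied (0 on full success). */
static size_t model_copy_from_user(void *dst, const struct model_user_mem *mem,
                                   uint64_t uaddr, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    assert(dst != NULL && mem != NULL);

    if (model_access_ok(uaddr, len)) {
        while (done < len) {
            uint64_t addr = uaddr + done;
            uint64_t page = addr / MODEL_PAGE_SIZE;

            /* Simulated fault: stop copying instead of crashing. */
            if (page >= MODEL_NPAGES || !mem->mapped[page])
                break;
            out[done++] = mem->data[addr];
        }
    }

    /* Zero whatever was not filled, so stale data never leaks through. */
    memset(out + done, 0, len - done);
    return len - done;
}
```

In a real module the caller only looks at the return value, typically as `if (copy_from_user(kbuf, ubuf, len)) return -EFAULT;`. Note that in this model an unmapped page stands for an address with no valid mapping at all; a swapped-out page in the real kernel would instead be paged back in by the fault handler and the copy would continue.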

edited Oct 27 '21 at 09:45 by Shuzheng

answered Oct 27 '21 at 09:36 by Stephen Kitt

Thank you. I have some comments. On x86 and arm the address space mapping is the same for the kernel and process? The functions first disable SMAP before checking page permissions and writing/reading data? For architectures with direct mapping of physical memory, pages cannot be swapped out? – Shuzheng Oct 27 '21 at 09:52
2

Even on x86, the address space mapping isn’t identical if KPTI is enabled (but that doesn’t affect user-space mappings). However AFAIK, user-space mappings are identical in user mode and kernel mode on x86 and ARM. SMAP is disabled only around the actual copies (look for STAC/CLAC in the kernel code). User pages can be swapped out even with direct mapping. – Stephen Kitt Oct 27 '21 at 10:25
I see. I guess KPTI isn't enabled, unless one wants to protect against Meltdown, which few of us want in practice. Isn't SMAP disabled both for read and write to user-space pages; and their metadata (permissions/protections) can be read without disabling SMAP? Why would one want to swap out user pages with a direct mapping? In that case, there can't be more virtual memory than physical memory – Shuzheng Oct 27 '21 at 10:31
Most distribution kernels have KPTI enabled. Permission checks can be done without disabling SMAP, which is why I said that it’s disabled only around the actual copies (reading from or writing to user memory). Why can’t there be more virtual memory than physical memory with a direct mapping? – Stephen Kitt Oct 27 '21 at 10:41
I think the address errors I got while developing kernel modules were related to either KPTI or SMAP (it wasn't disabled before copying) then; thanks for pointing this out! I used Debian buster. With a direct mapping, virtual memory addresses higher than physical memory addresses would map to nothing (at least that's what I understand from the concept of a direct mapping: X maps to X) – Shuzheng Oct 27 '21 at 11:05
1

Ah right, the direct mapping in the kernel works the other way round: there’s a portion of the virtual address space which is dedicated to mapping physical memory, so physical address X maps to virtual address X+page_offset_base (see the kernel memory map for x86). The virtual address space isn’t constrained by physical memory, but physical memory is constrained by the amount of room which can be given to it in the address space (64TiB on x86). – Stephen Kitt Oct 27 '21 at 12:12
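The arithmetic in the comment above can be sketched in a few lines of user-space C. The constants are assumptions: they match the documented x86-64 layout for 4-level paging with KASLR disabled (with KASLR, `page_offset_base` is randomized at boot), and the `model_*` / `MODEL_*` names are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: x86-64, 4-level paging, KASLR disabled. */
#define MODEL_PAGE_OFFSET_BASE 0xffff888000000000ULL
#define MODEL_DIRECT_MAP_SIZE  (64ULL << 40) /* 64 TiB */

/* Physical address X is visible in the direct map at X + page_offset_base. */
static uint64_t model_phys_to_virt(uint64_t phys)
{
    assert(phys < MODEL_DIRECT_MAP_SIZE); /* must fit in the window */
    return phys + MODEL_PAGE_OFFSET_BASE;
}

/* Is this virtual address inside the direct-map window? */
static bool model_in_direct_map(uint64_t virt)
{
    return virt >= MODEL_PAGE_OFFSET_BASE &&
           virt - MODEL_PAGE_OFFSET_BASE < MODEL_DIRECT_MAP_SIZE;
}
```

With these values, `model_phys_to_virt(0x1000)` gives `0xffff888000001000`, while the user-space address `0x1fffff` from the question lies outside the window. The direct map only fixes where physical memory appears in the kernel's part of the address space; it places no constraint on user-space virtual addresses, which are translated through the process's own page table entries.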
But if there is a fixed mapping: physical address X maps to virtual address X+page_offset_base, how can virtual addresses greater than maximum physical address+page_offset_base ever come into play? They are swapped out constantly? – Shuzheng Oct 27 '21 at 12:59
1

Virtual addresses can map to physical memory, and they can map to other things (including page not present for swap or uninitialised memory). A single physical page can also be mapped multiple times. Basically, the mapping isn’t bijective. All the pages that are mapped into a given process’ address space are also mapped in the kernel direct mapping. – Stephen Kitt Oct 27 '21 at 14:22
Excellent. By page not present for swap, you mean a page that has been swapped out to hard drive storage? A page is a concept of virtual memory AFAIK, e.g. 4K bytes, so are you sure a virtual address can map to a page (virtual address)? – Shuzheng Oct 27 '21 at 14:38
1

A virtual address maps to a page table entry, and that maps to wherever the memory is — a physical page, a page on disk (in swap or a file), a zero page, etc. Pages can have varying sizes (see hugepages). – Stephen Kitt Oct 27 '21 at 14:57
