
Following a fork() call in Linux, two processes (one being a child of the other) will share allocated heap memory. These allocated pages are marked COW (copy-on-write) and will remain shared until either process modifies them. At that point the modified page is copied, but the virtual address referencing it remains the same in both processes. How can the MMU (memory management unit) distinguish between the two? Consider the following:

  1. Process A is started
  2. Process A is allocated a memory page, pointed to by the virtual address 0x1234
  3. Process A fork()s, spawning process B
  4. Process A and B now share virtual address 0x1234, pointing to the same physical memory location
  5. Process B modifies its 0x1234 memory page
  6. This memory page is copied and then modified
  7. Process A and B both have virtual address 0x1234, but this points to different physical memory addresses
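The scenario above can be observed from user space. The following is a minimal sketch in Python, assuming a Linux system where `os.fork()` is available; `fork_demo` and `cell` are illustrative names, and a `ctypes` integer stands in for the page at 0x1234. The interpreter itself may write to the same page before the explicit assignment, so the copy can happen slightly earlier than the marked line, but the observable result is the same.

```python
import ctypes
import os


def fork_demo():
    """Fork once; return (parent_addr, child_addr, parent_value, child_value)."""
    cell = ctypes.c_int(1)             # illustrative stand-in for the page at 0x1234
    read_fd, write_fd = os.pipe()      # the child reports back through this pipe

    pid = os.fork()
    if pid == 0:                       # process B (child)
        os.close(read_fd)
        cell.value = 2                 # lands in B's private copy of the page
        os.write(write_fd, f"{ctypes.addressof(cell)} {cell.value}".encode())
        os._exit(0)

    os.close(write_fd)                 # process A (parent)
    child_addr, child_value = map(int, os.read(read_fd, 64).split())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return ctypes.addressof(cell), child_addr, cell.value, child_value


if __name__ == "__main__":
    pa, ca, pv, cv = fork_demo()
    print(f"A: address {pa:#x}, value {pv}")
    print(f"B: address {ca:#x}, value {cv}")
```

Both processes report the same virtual address, yet A still reads 1 while B reads 2, which is exactly the situation in step 7.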

How can this be distinguished?

notlesh
  • OK, luckily I caught you here. I wanted to fix the mistake I just made on Meta.SE: I missed that you were asking where to ask your question, which is perfectly on-topic on Meta.SE; I thought you were simply asking the question itself over there. So that was completely my bad, and if you want to undelete the question, it is perfectly on-topic there. I was just being silly. Apologies for that. – kos Oct 06 '15 at 03:36
  • @kos lol, I was a bit confused, too. I'll leave this up and see how it does, worst case I'll delete it. – notlesh Oct 06 '15 at 03:39
  • Sorry again. It seems on-topic here though – kos Oct 06 '15 at 03:42
  • Related: http://stackoverflow.com/questions/18431261/how-does-x86-paging-work – Mark Plotnick Oct 06 '15 at 04:06
  • Thanks, @MarkPlotnick. That article covers this thoroughly: I believe the answer is that there are multiple page tables arranged in a page directory; in my example, each process (A and B) would have a different page table, probably copied during the initial fork(), and the page table would be modified during the COW. – notlesh Oct 06 '15 at 04:43

1 Answer


One of the things the kernel does during a context switch between processes is to modify the MMU tables to remove entries that describe the previous process's address space and add entries that describe the next process's address space. Depending on the processor architecture, the kernel and possibly the configuration, this may be done by changing a processor register or by manipulating the page tables in memory.

Immediately after the fork operation, due to copy-on-write, the MMU tables for both processes have the same physical address for the virtual address 0x1234. Note that these are two separate tables that happen to have identical entries for this particular virtual address.

The descriptor for this page has the read-only attribute. If a process tries to write (it doesn't matter whether it's A or B), this triggers a processor fault due to the permission violation. The kernel's page fault handler runs, analyzes the situation and decides to allocate a new physical page. It copies the content of the read-only page to this new page, changes the faulting process's MMU configuration so that 0x1234 now points to this freshly allocated physical page with read-write attributes, and restarts the faulting process at the instruction that caused the fault. This time the page is writable, so the instruction will not trap.

Note that this operation does not need to touch the page descriptor in the other process. The kernel can, however, perform one more action: if the page is now mapped in only a single process, it can be switched back to read-write, to avoid having to copy it later.
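The mechanism can be sketched as a toy model. This is not kernel code: `PhysicalMemory`, `Process` and their methods are hypothetical names, a Python dict stands in for each process's MMU table, and a per-frame reference count decides whether a write fault needs a copy. As a simplification, the last remaining mapping regains write permission lazily, on its own next write fault, rather than at the moment the other process gets its copy.

```python
class PhysicalMemory:
    """Toy physical memory: numbered frames plus a mapping count per frame."""

    def __init__(self):
        self.frames = {}       # frame number -> bytearray content
        self.refcount = {}     # frame number -> number of page-table entries
        self.next_frame = 0

    def alloc(self, content=b""):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = bytearray(content)   # copies the content
        self.refcount[frame] = 1
        return frame


class Process:
    """Toy process: one private page table mapping vaddr -> [frame, writable]."""

    def __init__(self, mem):
        self.mem = mem
        self.page_table = {}

    def map_new(self, vaddr, content):
        self.page_table[vaddr] = [self.mem.alloc(content), True]

    def fork(self):
        child = Process(self.mem)
        for vaddr, entry in self.page_table.items():
            entry[1] = False                               # parent becomes read-only
            child.page_table[vaddr] = [entry[0], False]    # separate table, same frame
            self.mem.refcount[entry[0]] += 1
        return child

    def read(self, vaddr):
        return bytes(self.mem.frames[self.page_table[vaddr][0]])

    def write(self, vaddr, content):
        entry = self.page_table[vaddr]
        if not entry[1]:                                   # permission violation
            self._handle_write_fault(entry)
        self.mem.frames[entry[0]][:] = content             # retry the write

    def _handle_write_fault(self, entry):
        old = entry[0]
        if self.mem.refcount[old] > 1:                     # still shared: copy it
            self.mem.refcount[old] -= 1
            entry[0] = self.mem.alloc(self.mem.frames[old])
        entry[1] = True                                    # now private and writable


if __name__ == "__main__":
    mem = PhysicalMemory()
    a = Process(mem)
    a.map_new(0x1234, b"old")
    b = a.fork()
    b.write(0x1234, b"new")
    print(a.read(0x1234), a.page_table[0x1234])
    print(b.read(0x1234), b.page_table[0x1234])
```

Only the faulting process's table entry changes during the copy; the other process keeps pointing at the original frame.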

See also What happens after a page fault?

  • Thanks, this is a great answer. It's interesting that a page fault is necessary for subsequent memory writes; this would have to affect both processes. So forking to run an external program via execvp() or similar comes with a performance hit on the first write to any page owned by the process. Based on your link, it looks to be non-trivial, too (relatively speaking). – notlesh Oct 07 '15 at 14:42
  • @stephelton There's only a performance hit after forking until the child calls execve: after that there's no longer any copy-on-write shared memory between the parent and the child. – Gilles 'SO- stop being evil' Oct 07 '15 at 15:51
  • I thought that might be the case, but I couldn't find anything that explicitly said so. Is that a side effect of something else (e.g. freeing those pages)? – notlesh Oct 07 '15 at 21:38
  • @stephelton execve frees those pages in the child. – Gilles 'SO- stop being evil' Oct 07 '15 at 21:42
  • vfork was invented principally to avoid the costs of marking a large program's entire address space copy-on-write and then undoing that again upon execve. – zwol Dec 08 '15 at 19:48
  • @zwol Was it? The original fork wasn't copy-on-write: the first implementation on Unix copied the process to swap. I think it was already doing in-memory copies by the time vfork came along, but was it doing copy-on-write? AFAIK, vfork was invented to avoid the cost of copying the entire program's memory, and was made mostly obsolete (only a minor performance improvement and many downsides due to delicate use) by copy-on-write, but I don't have a reference for that. – Gilles 'SO- stop being evil' Dec 08 '15 at 21:54
  • @Gilles According to the FreeBSD manpage, vfork first appeared in 3BSD. I don't know if that OS had copy-on-write paging or not. I do know that the cost of copying page tables and flushing TLBs, just to set up the copy-on-write state, on a modern CPU is sufficient to render vfork still a significant performance win, especially when the parent process is large. – zwol Dec 09 '15 at 14:59