
I want to know a better way of "naming things" (regarding memory management in Linux), so that a reader of my writing doesn't misunderstand something, while also avoiding long phrases each time.

  • I call a page file-backed if, in case of reclaim, it can simply be discarded because its contents can be retrieved from disk again. However, when focusing on that sense of "file-backed = can be just discarded in case of page reclaim", I don't know whether there are special cases where pages can be discarded on reclaim but still cannot be considered file-backed. Consider the special zero-page: it cannot be reclaimed, but its contents are "constant". If a virtual page is mapped to the zero-page, you can safely "unmap" it, because you can "map it again" without losing information. Are there cases where the assertion "can be discarded in case of page reclaim" doesn't match well with the idea of a file-backed page? Or is there actually an "if and only if" between both concepts?
  • How to distinguish virtual memory as in the whole addressable range from virtual memory as in the currently existing virtual pages.
  • Regarding "mapping": if I say "mapping a page", what does it mean? Creating a new virtual page and placing it somewhere within the whole addressable range? Or does "page mapping" mean "associating a virtual page with a page in RAM after a page fault"? When a page is swapped out and later moved back to RAM, is it correct to use "page mapping" for this second situation, where the virtual page that was previously mapped to a page in swap now points to the new RAM page?
  • Physical page: does it exclusively mean a page in RAM, or a page that has a physical presence somewhere (RAM or disk)?
  • I want to distinguish a virtual page that is "mapped" to a page that lives somewhere from the page itself that lives somewhere. I like to use "unused page" to mean a virtual page that is mapped to nothing, and the situation I'm describing here is the opposite of that. But if I say "used page", I don't know whether that could be understood as "recently referenced" or something like that. For example, a virtual page mapped to the zero-page is not a physical page; the zero-page is. But I don't know how to properly name both concepts.
  • I want to distinguish a page that is meant to be shared from a page that is actually shared. For example, take a page that is mapped to the .text section of the binary itself. The page is meant to be shared, or potentially shared (in other words, it is not meant to be private). However, whether it's actually shared depends on whether more than one process is executing the same binary. The same goes for "private": a page can be private, or shared but meant to be private. For example, CoW virtual pages like those mapped to the zero-page or inherited from the parent process after a fork are both shared but meant to be private. I want to know how to distinguish both situations with a name.
  • Anonymous versus file-backed page: yet another cause of confusion. Consider the following situation: a dynamically linked library is mapped into a process's virtual address space when it runs. No RAM pages have been allocated yet. The .so file has an 8 kB .data section with some global variables. Some of the global variables are accessed, and so both pages are allocated in RAM. These are still truly file-backed pages (in case of page reclaim, they can be discarded instead of moved to swap). Now some variables are modified, causing the two pages to detach from the file and become anonymous (so they can no longer be discarded). However, if I cat smaps, I will see the corresponding address range referring to the file (inode different from 0, and the path of the .so file shown), even though both pages are anonymous now (the Anonymous field of the smaps output will be equal to 8 kB). Here, the address range refers to a file, but none of its pages are file-backed now. Is there a way to distinguish a page or a range that "refers to a file", "comes from a file", or was "file-backed" at some point in the past from pages that are file-backed right now?
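
The smaps observation in the last bullet can be reproduced from user space. The following is only a rough sketch, assuming Linux with /proc mounted and Python's mmap module on a Unix-like system; the helper names anonymous_kb and private_mapping_anonymous_kb are made up for illustration, and a temporary two-page file stands in for the .data section of the .so file.

```python
import mmap
import os
import tempfile


def anonymous_kb(path):
    """Return the 'Anonymous:' value (in kB) of the first smaps entry whose
    header line ends with `path`, or None if it cannot be determined."""
    try:
        with open("/proc/self/smaps") as f:
            lines = f.read().splitlines()
    except OSError:
        return None  # assumption violated: not Linux, or /proc not mounted
    in_entry = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if not fields[0].endswith(":"):
            # Header line: "start-end perms offset dev inode [path]"
            in_entry = line.endswith(path)
        elif in_entry and fields[0] == "Anonymous:":
            return int(fields[1])
    return None


def private_mapping_anonymous_kb():
    """Map a two-page temp file with MAP_PRIVATE and report the Anonymous
    field of its smaps entry before and after writing to both pages."""
    page = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    path = os.path.realpath(path)
    try:
        os.write(fd, b"x" * (2 * page))
        m = mmap.mmap(fd, 2 * page, flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            _ = m[0], m[page]          # read both pages: still file-backed
            before = anonymous_kb(path)
            m[0:1] = b"y"              # write to page 0: triggers CoW
            m[page:page + 1] = b"y"    # write to page 1: triggers CoW
            after = anonymous_kb(path)
        finally:
            m.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


before, after = private_mapping_anonymous_kb()
print("Anonymous kB before write:", before, "after write:", after)
```

If the assumptions hold, the smaps entry keeps showing the file's path in both measurements, while the Anonymous value should grow once both pages have been written to.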
Kusalananda
ABu
  • Several of these terms have multiple senses, and the intended sense is usually clear from context. – Barmar Aug 11 '23 at 20:08

1 Answer

  • file-backed means that the page is mapped to a page of a file. File-backed pages are created by calling mmap() on a file descriptor. This is done automatically by exec when loading a process (the text segment is backed by the executable file) and by the dynamic linker when linking shared libraries.

  • To indicate whether a page can be discarded when reclaimed, the terms are dirty and clean. A dirty page has changes since it was last paged in from disk, and these need to be written back to disk. A clean page can simply be discarded. This is independent of whether it's file-backed; pages backed by swap can also be dirty or clean. And file-backed pages can be dirty: that's how you update a file using memory mapping.

  • virtual memory can be used in a number of ways. Its basic definition is the automatic memory management system that allows a process to access more data than fits in physical RAM. This is distinguished from swapping, where there can be multiple processes whose combined memory is larger than RAM, but individual processes are still limited to physical RAM. Swapping is not common these days, but it worked by copying the process's entire address space between disk and RAM whenever switching processes.

  • virtual memory is also used to refer to a process's virtual address space, as opposed to the physical RAM on which it's implemented. User-mode processes generally only have access to virtual memory; physical memory is addressed by the memory management components of the OS and CPU.

  • And virtual memory is also sometimes used to refer to all the memory that's managed by the virtual memory subsystem. Back in the days when disk space wasn't as cheap as it is now, it wasn't uncommon to "run out of virtual memory" -- the swap partition wasn't large enough to hold all non-file-backed memory.

  • mapping refers to relating a page of virtual memory with some disk page. A file-backed page is mapped to the corresponding page of the file. Anonymous pages are mapped to swap pages. Since the latter is generally transparent, we mostly use this term to refer to file-backed pages, typically those created using mmap().

  • physical page usually means a page of RAM, to distinguish it from a virtual page.

  • "I want to distinguish between a virtual page that is "mapped" to a page that lives somewhere, with the page itself that lives somewhere." I'm not sure there's a good term for that.

  • "I want to distinguish between a page that is meant to be shared with a page that is actually shared." I think the term would be shareable. For the pages that are initially shared after fork(), I don't think there's anything more specific than private. This implies COW if they happen to be initially shared.

  • "Anonymous versus file-backed page". The .data section can be mapped with MAP_PRIVATE. As above, this implies COW. So it's initially file-backed, but when you dirty it, it will be mapped to swap as an anonymous page.
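
The difference between a private (copy-on-write) file mapping and a shared one can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where Python's mmap module exposes MAP_PRIVATE and MAP_SHARED; the helper name write_through_mapping and the temporary file are made up for illustration.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags):
    """Map a temp file with the given flags, modify it through the mapping,
    and return the file's bytes as seen by a regular read() afterwards.

    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        with mmap.mmap(fd, 8, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[0:8] = b"modified"   # dirties the page through the mapping
            if flags == mmap.MAP_SHARED:
                m.flush()          # msync(): write the dirty page back
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 8)
    finally:
        os.close(fd)
        os.unlink(path)


private_result = write_through_mapping(mmap.MAP_PRIVATE)
shared_result = write_through_mapping(mmap.MAP_SHARED)
print("file after MAP_PRIVATE write:", private_result)
print("file after MAP_SHARED write: ", shared_result)
```

With MAP_PRIVATE the write lands in a private copy of the page, so the file still reads b"original"; with MAP_SHARED the dirty page is written back and the file reads b"modified".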

Barmar
  • Ok. You said it's wrong to say "mapping" or "mapped" to refer to the association virtual => physical. However, when a CPU must do a virtual to physical address translation, one usually talks in terms of whether the page is mapped or unmapped (unmapped = page fault at first access). Or am I wrong here? If I'm not wrong, is there any term that refers more explicitly to that second case? – ABu Aug 11 '23 at 23:22
  • Regarding anonymous and "file-backed": another way to phrase it is that an anonymous page is one that is initially zero-filled (initially associated with the zero-page), while, if it's associated with a file, then it's initially filled (when accessed) with the contents of the file. So if file-backed is reserved for clean pages associated with a file, is there any name to refer to a page that is associated with a file, be it dirty or clean? – ABu Aug 11 '23 at 23:31
  • As I said above, these terms are used in multiple ways. "mapped" has different senses. – Barmar Aug 12 '23 at 07:15
  • An anonymous page doesn't have to be initially zero. The data segment is anonymous, but it's initialized from the .data section of the executable. Zero-filled pages are used when allocating pages dynamically, e.g. the heap. – Barmar Aug 12 '23 at 07:17
  • I thought the data segment was private rather than anonymous. It starts as file-backed, and becomes anonymous after the first write. – ABu Aug 12 '23 at 08:39
  • Private == anonymous, since it doesn't get written back to a file, it gets written to swap. – Barmar Aug 12 '23 at 08:40
  • The .text section of a binary is private, but you wouldn't say it's anonymous, right? You would say it's file-backed (unless a debugger adds a trap to set a breakpoint, in which case the page is no longer file-backed but anonymous). – ABu Aug 12 '23 at 08:49
  • These terms don't have single, unambiguous meanings. They're used flexibly. – Barmar Aug 12 '23 at 08:51
  • Yeah, I'm aware that I'm trying to formalize or give too much structure to the vocabulary, as if this were mathematics, but on the other hand, this is quite a mess. We always complain that people want to stay in the beautiful world of OO abstractions and refuse to go a bit low-level: well, every time you try, you hit the wall of vocabulary. And with time, you learn that "low-level" = "reasons to be afraid of messing everything up", because you are never sure whether you misunderstood something or not. – ABu Aug 12 '23 at 08:57
  • Unfortunately, this is the way it is. Academics will choose specific meanings as needed for their reports, but in general they're not used very formally. – Barmar Aug 12 '23 at 08:59