NOTE: I'm going to assume that your machine has a memory management unit (MMU). There is a Linux version (µClinux) that doesn't require an MMU, and this answer doesn't apply there.
What is an MMU? It's hardware, part of the processor and/or memory controller. Understanding shared library linking doesn't require you to understand exactly how an MMU works, just that an MMU allows there to be a difference between logical memory addresses (the ones used by programs) and physical memory addresses (the ones actually present on the memory bus). Memory is broken down into pages, typically 4K in size on Linux. With 4K pages, logical addresses 0–4095 are page 0, logical addresses 4096–8191 are page 1, etc. The MMU maps those to physical pages of RAM, and each logical page is typically mapped to either 0 or 1 physical pages. A given physical page can correspond to multiple logical pages (this is how memory is shared: multiple logical pages correspond to the same physical page). Note that this applies regardless of OS; it's a description of the hardware.
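The page arithmetic above can be sketched in a few lines of Python. This is only a toy model: the two page tables and the physical page numbers in them are made up for illustration, and a real MMU does this lookup in hardware.

```python
PAGE_SIZE = 4096  # 4K pages, as in the text

def split_address(addr):
    """Split a logical address into (page number, offset within the page)."""
    return divmod(addr, PAGE_SIZE)

def translate(page_table, addr):
    """Map a logical address to a physical one; None means the page is unmapped."""
    page, offset = split_address(addr)
    frame = page_table.get(page)
    return None if frame is None else frame * PAGE_SIZE + offset

# Hypothetical page tables for two processes. Logical page 1 of each
# points at physical page 7, so that physical page is shared.
table_a = {0: 3, 1: 7}
table_b = {0: 5, 1: 7}

print(split_address(4095))        # (0, 4095)
print(split_address(4096))        # (1, 0)
print(translate(table_a, 4100))   # 28676, i.e. 7 * 4096 + 4
print(translate(table_b, 4100))   # 28676 again: same physical page
print(translate(table_a, 8192))   # None: logical page 2 is mapped to 0 physical pages
```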
On process switch, the kernel changes the MMU page mappings, so that each process has its own space. Address 4096 in process 1000 can be (and usually is) completely different from address 4096 in process 1001.
Pretty much whenever you see an address, it is a logical address. User space programs hardly ever deal with physical addresses.
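One way to see per-process address spaces in action is a small fork experiment. The sketch below assumes a Unix-like system where os.fork is available; the function name fork_demo is mine. After the fork, parent and child see the same logical address for the variable, yet the child's write does not show up in the parent.

```python
import ctypes
import os

def fork_demo():
    """Return (parent_addr, parent_value, child_addr, child_value)."""
    cell = ctypes.c_int(1)                # one int somewhere in our address space
    parent_addr = ctypes.addressof(cell)  # its logical address
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same logical address, but the write lands in the child's own copy.
        cell.value = 2
        report = f"{ctypes.addressof(cell)} {cell.value}"
        os.write(write_fd, report.encode())
        os._exit(0)
    os.close(write_fd)
    child_addr, child_value = map(int, os.read(read_fd, 64).split())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return parent_addr, cell.value, child_addr, child_value

if __name__ == "__main__":
    p_addr, p_value, c_addr, c_value = fork_demo()
    print("parent:", hex(p_addr), p_value)
    print("child: ", hex(c_addr), c_value)
```

The two addresses printed should be identical while the values differ, which is only possible because each process has its own mapping from logical to physical pages.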
Now, there are multiple ways to build libraries as well. Let's say a program calls the function foo() in the library. The CPU doesn't know anything about symbols, or function calls really; it just knows how to jump to a logical address and execute whatever code it finds there. There are a couple of ways it could do this (and similar things apply when a library accesses its own global data, etc.):
1. It could hard-code some logical address to call it at. This requires that the library always be loaded at the exact same logical address. If two libraries require the same address, dynamic linking fails and you can't launch the program. Libraries can require other libraries, so this basically requires every library on the system to have unique logical addresses. It's very fast, though, if it works. (This is how a.out did things, and, sort of, the kind of setup that prelinking produces.)
2. It could hard-code a fake logical address, and tell the dynamic linker to edit in the proper one when loading the library. This costs a fair bit of time when loading the libraries, but after that it is very fast.
3. It could add a layer of indirection: use a CPU register to hold the logical address the library is loaded at, and then access everything as an offset from that register. This imposes a performance cost on each access.
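As a rough illustration of where the work happens under each of the three approaches, here is a toy model. All the numbers and call-site names are hypothetical, and real linkers operate on machine code rather than Python dictionaries.

```python
FOO_OFFSET = 0x1230       # hypothetical offset of foo() inside the library
FIXED_BASE = 0x40000000   # hypothetical address a fixed-address library insists on

def call_target_fixed():
    # First approach: the full address was decided when the library was built.
    return FIXED_BASE + FOO_OFFSET

def patch_call_sites(call_sites, load_base):
    # Second approach: at load time, the dynamic linker overwrites every
    # placeholder with the real address. This is done once; later calls
    # use the patched value as-is.
    return {site: load_base + FOO_OFFSET for site in call_sites}

def call_target_pic(base_register):
    # Third approach: nothing is patched; every call adds the offset
    # to the base held in a register.
    return base_register + FOO_OFFSET

load_base = 0x7f784dc77000  # wherever the loader happened to put the library
patched = patch_call_sites(["site_1", "site_2"], load_base)

print(hex(call_target_fixed()))
print({site: hex(addr) for site, addr in patched.items()})
print(hex(call_target_pic(load_base)))
```

The second and third approaches end up at the same address; the difference is whether the addition is paid once per call site at load time or on every access.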
Pretty much no one uses #1 anymore, at least not on general-purpose systems. Keeping that list of unique logical addresses is impossible on 32-bit systems (there aren't enough addresses to go around) and an administrative nightmare on 64-bit systems. Prelinking sort of does this, though, on a per-system basis.
Whether #2 or #3 is used depends on whether the library was built with GCC's -fPIC (position-independent code) option. #2 is without, #3 is with. Generally, libraries are built with -fPIC, so #3 is what happens.
For more details, see Ulrich Drepper's How to Write Shared Libraries (PDF).
So, finally, your question can be answered:
- If the library is built with -fPIC (as it almost certainly should be), the vast majority of pages are exactly the same for every process that loads it. Your processes a and b may well load the library at different logical addresses, but those will point to the same physical pages: the memory will be shared. Further, the data in RAM exactly matches what is on disk, so it can be loaded only when needed by the page fault handler.
- If the library is built without -fPIC, then it turns out that most pages of the library will need link edits, and will be different. Therefore, they must be separate physical pages (as they contain different data). That means they're not shared. The pages don't match what is on disk, so I wouldn't be surprised if the entire library is loaded. It can of course subsequently be swapped out to disk (in the swapfile).
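The reason link edits defeat sharing can be shown with a toy page of "library code". This is a simplified sketch, not how a real loader is implemented: the page is just zero bytes, SLOT and the target offset are invented, and the two load addresses are borrowed from the pmap output in this answer.

```python
import hashlib
import struct

PAGE_SIZE = 4096
SLOT = 0x100  # hypothetical position of one address operand inside the page

def load_without_pic(image, load_base, target_offset):
    # The loader writes an absolute address into the code page (a link edit),
    # so the page contents now depend on where the library was loaded.
    page = bytearray(image)
    struct.pack_into("<Q", page, SLOT, load_base + target_offset)
    return bytes(page)

def load_with_pic(image, load_base, target_offset):
    # Position-independent code only ever needs the offset, so the page is
    # used exactly as it is on disk; the load address is deliberately ignored.
    return bytes(image)

def digest(page):
    return hashlib.sha256(page).hexdigest()[:12]

image = bytes(PAGE_SIZE)  # stand-in for one page of the library file on disk
base_a, base_b = 0x7F81803AC000, 0x7F784DC77000

non_pic_a = load_without_pic(image, base_a, 0x1230)
non_pic_b = load_without_pic(image, base_b, 0x1230)
pic_a = load_with_pic(image, base_a, 0x1230)
pic_b = load_with_pic(image, base_b, 0x1230)

print("non-PIC:", digest(non_pic_a), digest(non_pic_b), non_pic_a == non_pic_b)
print("PIC:    ", digest(pic_a), digest(pic_b), pic_a == pic_b == image)
```

Pages with identical contents that also match the file on disk can be backed by a single physical page; the non-PIC pages differ per process, so each process needs its own copy.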
You can examine this with the pmap tool, or directly by checking various files in /proc. For example, here is a (partial) output of pmap -x on two different newly-spawned bc processes. Note that the addresses shown by pmap are, as is typical, logical addresses:
pmap -x 14739
Address           Kbytes     RSS   Dirty Mode  Mapping
00007f81803ac000     244     176       0 r-x-- libreadline.so.6.2
00007f81803e9000    2048       0       0 ----- libreadline.so.6.2
00007f81805e9000       8       8       8 r---- libreadline.so.6.2
00007f81805eb000      24      24      24 rw--- libreadline.so.6.2
pmap -x 17739
Address           Kbytes     RSS   Dirty Mode  Mapping
00007f784dc77000     244     176       0 r-x-- libreadline.so.6.2
00007f784dcb4000    2048       0       0 ----- libreadline.so.6.2
00007f784deb4000       8       8       8 r---- libreadline.so.6.2
00007f784deb6000      24      24      24 rw--- libreadline.so.6.2
You can see that the library is loaded in multiple parts, and pmap -x gives you details on each separately. You'll notice that the logical addresses are different between the two processes; you'd reasonably expect them to be the same (since it's the same program running, and computers are usually predictable like that), but there is a security feature called address space layout randomization that intentionally randomizes them.
You can see from the difference between size (Kbytes) and resident size (RSS) that the library segment has not been loaded in its entirety. Finally, you can see that for the larger mappings, Dirty is 0, meaning they correspond exactly to what is on disk.
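If you'd rather look in /proc directly, the address ranges come from /proc/<pid>/maps. Below is a minimal sketch of a parser for that file; the function names are mine, and the sample line is reconstructed from the first row of the second pmap listing, with a made-up directory in the path.

```python
def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a dict."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "kbytes": (end - start) // 1024,
        "perms": fields[1],
        # Anonymous mappings have no path field at all.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

def library_mappings(pid, name):
    """Return the parsed mappings of process `pid` whose path contains `name`."""
    with open(f"/proc/{pid}/maps") as maps:
        return [m for m in map(parse_maps_line, maps) if name in m["path"]]

# The /lib/ directory here is hypothetical; the numbers come from the listing above.
sample = "7f784dc77000-7f784dcb4000 r-xp 00000000 fd:00 1837043 /lib/libreadline.so.6.2"
entry = parse_maps_line(sample)
print(entry["kbytes"], entry["perms"], entry["path"])  # 244 r-xp /lib/libreadline.so.6.2
```

On a Linux machine, something like library_mappings("self", "libc") should list the mappings of the C library in the Python process itself. Note that maps only gives sizes of the logical ranges, not how much is resident.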
You can re-run with pmap -XX, and it'll show you (the exact -XX output varies by kernel version) that the first mapping has a Shared_Clean of 176, which exactly matches the RSS. Shared memory means the physical pages are shared between multiple processes, and since it matches the RSS, that means all of the library that is in memory is shared (look at the See Also below for further explanation of shared vs. private):
pmap -XX 17739
Address Perm Offset Device Inode Size Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty Referenced Anonymous AnonHugePages Swap KernelPageSize MMUPageSize Locked VmFlags Mapping
7f784dc77000 r-xp 00000000 fd:00 1837043 244 176 19 176 0 0 0 176 0 0 0 4 4 0 rd ex mr mw me sd libreadline.so.6.2
7f784dcb4000 ---p 0003d000 fd:00 1837043 2048 0 0 0 0 0 0 0 0 0 0 4 4 0 mr mw me sd libreadline.so.6.2
7f784deb4000 r--p 0003d000 fd:00 1837043 8 8 8 0 0 0 8 8 8 0 0 4 4 0 rd mr mw me ac sd libreadline.so.6.2
7f784deb6000 rw-p 0003f000 fd:00 1837043 24 24 24 0 0 0 24 24 24 0 0 4 4 0 rd wr mr mw me ac sd libreadline.so.6.2
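The same counters are available per mapping in /proc/<pid>/smaps, which, as far as I know, is where pmap gets them. Here is a rough sketch that totals them for one library; the function name is mine, and the sample text is abbreviated and reconstructed from the first and last rows above (the /lib/ directory is made up).

```python
def sum_smaps(text, name):
    """Total the kB counters of every mapping whose header line mentions `name`."""
    totals = {}
    wanted = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "-" in parts[0] and not parts[0].endswith(":"):
            # Header line: "start-end perms offset dev inode path"
            wanted = name in line
        elif wanted and parts[-1] == "kB":
            key = parts[0].rstrip(":")
            totals[key] = totals.get(key, 0) + int(parts[1])
    return totals

SAMPLE = """\
7f784dc77000-7f784dcb4000 r-xp 00000000 fd:00 1837043 /lib/libreadline.so.6.2
Size:                244 kB
Rss:                 176 kB
Pss:                  19 kB
Shared_Clean:        176 kB
Private_Dirty:         0 kB
VmFlags: rd ex mr mw me sd
7f784deb6000-7f784debc000 rw-p 0003f000 fd:00 1837043 /lib/libreadline.so.6.2
Size:                 24 kB
Rss:                  24 kB
Pss:                  24 kB
Shared_Clean:          0 kB
Private_Dirty:        24 kB
VmFlags: rd wr mr mw me ac sd
"""

totals = sum_smaps(SAMPLE, "libreadline")
print(totals["Rss"], totals["Shared_Clean"], totals["Private_Dirty"])  # 200 176 24
```

On a live system you could pass in the contents of /proc/self/smaps (or another process's file, given permission) instead of the sample text.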
See Also