0

As we know, any executable file, which is running, is loaded into RAM.

There are also two kinds of libraries: static libraries and dynamic (shared) libraries.

Both kinds of libraries must also be loaded into RAM when they are needed.

As far as I know, there are two ways to load a dynamic library:

  1. link against it at build time, e.g. g++ -lsofile
  2. load it dynamically from the code, using dlopen

I've posted this question before, but I still can't be sure that we can list all the library files. For the first case above, I think we can find the linked libraries with ldd, or by checking /proc/{PID}/maps. For the second case, I wonder whether there is some method to find them. Here is an example:

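#include <dlfcn.h>
#include <stdio.h>

void SomeModule(void)
{
    /* dlopen here to load mysofile at run time ("mysofile.so" is a placeholder name) */
    void *handle = dlopen("mysofile.so", RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
}

int main(void)
{
    int user_enter = getchar();   /* read one character from the user */
    if (user_enter == 'a')
    {
        printf("hello world\n");
    }
    else
    {
        SomeModule();
    }
    return 0;
}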

In this example, when we execute it and type always a, the dlopen will never be called, so mysofile will never be linked, which means that mysofile will never be loaded into RAM. Am I right?

If so, how can I find the libraries an executable needs, other than by reading the source code?

Kusalananda
Yves
  • It doesn't make sense to "load" static libraries into anything. They're included in the executable when compiling, and the entire executable is loaded into memory. – muru Mar 09 '18 at 08:50
  • @muru You are right, but I'm only concerned with dynamic libraries for now... – Yves Mar 09 '18 at 09:27

2 Answers

4

As we know, any executable file, which is running, is loaded into RAM.

Wrong!

An executable file is mapped into the virtual address space of the processes running it, by the virtual memory subsystem of the kernel. The physical RAM is managed by the kernel only. Read Operating Systems: Three Easy Pieces for more.

Not all of the code segment of that executable file gets paged (not loaded!) into RAM. In particular, a large piece of code which never gets used (e.g. because it contains some large function which is never called) won't go into RAM. Read about paging and the page cache.

Sometimes, there is not enough physical RAM to conveniently deal with all required pages. In that situation you observe thrashing.

The dynamic linker (see ld-linux(8)) and dlopen(3) both use mmap(2) to memory-map some segments of the shared library. So they do not load the entire code segment of the plugin into RAM. Read also Drepper's How To Write Shared Libraries paper.

when we execute it and type always a, the dlopen will never be called, so mysofile will never be linked, which means that mysofile will never be loaded into RAM.

There is absolutely no way in general to predict which shared libraries will be used and dlopen-ed in the future. Think of the following two scenarios:

  • a long-running program (perhaps your browser) asks its user for some shared library (maybe downloading it from the network) and then dlopen-s it.

  • a process generates some C code in a temporary file /tmp/emittedcode.c, compiles that file (by forking an appropriate process running some gcc -O -Wall -fPIC /tmp/emittedcode.c -shared -o /tmp/emittedcode.so) into a temporary plugin /tmp/emittedcode.so, and dlopen-s that temporary plugin (of course later dlsym-ing appropriate symbols there).

I am quite fond of the second approach. Notice that compiling to C is a well established habit. And current compilers are fast enough to even enable doing that in some REPL interaction.

BTW, on a Linux desktop, a process may dlopen a lot of shared objects, i.e. plugins (at least hundreds of thousands, and probably millions). See my manydl.c example (which generates "random" C code in temporary files and repeats).

PS. Be also aware of the Halting Problem, related to the theoretical impossibility of predicting all future dlopen-ed paths.

1

You’re right, if dlopen is never called, the target library is never loaded into (the process’) memory.

Determining the necessary libraries without reading the source code feels like a variant of the halting problem. You could use some heuristics: if the program doesn’t link to libdl, then it can’t use dlopen; if it does, then you can use strace (see How to find out the dynamic libraries executables loads when run?) or try to figure out the arguments to dlopen using static analysis. But the program could include libdl directly (either through static linking, or by building the code); and since the dynamic linker isn’t magic, there’s nothing preventing a program from re-implementing it itself, so you can’t be absolutely sure you’ve caught all the libraries needed using these heuristics. Perhaps there are programs which figure out they’re being traced, and skip library-loading...

The only sure way of listing all the libraries required is to read the source code.

Stephen Kitt
  • Reading the source code is not enough. See my answer. – Basile Starynkevitch Mar 09 '18 at 11:27
  • @Basile, I don’t think your answer contradicts that part — you need to read all the source code of everything that’s run, which would include the code you’re generating or the code for a plugin... – Stephen Kitt Mar 09 '18 at 11:32
  • No, in practice you never read generated C code. You don't have time for that. And it is proven that you cannot statically predict all the calls of a function (such as dlopen) in a program (that is equivalent to solving the Halting Problem...) – Basile Starynkevitch Mar 09 '18 at 11:34
  • You read the generator’s source code... – Stephen Kitt Mar 09 '18 at 11:35
  • That is not enough. – Basile Starynkevitch Mar 09 '18 at 11:36
  • Note that I referenced the halting problem too ;-). How can your generator generate code that’s not derived from the code it contains, or its inputs? If you’re using randomness, then all bets are off, yes, but apart from that? – Stephen Kitt Mar 09 '18 at 11:39
  • It could use randomness. It probably generates code that depends not only on some current input, but also on the previous state of the generating process – Basile Starynkevitch Mar 09 '18 at 11:41
  • ... which is an input... You’re eluding the question. The ability to load a library needs to be introduced at some point, it can’t spontaneously appear in the generated code (again, ignoring randomness). – Stephen Kitt Mar 09 '18 at 11:41