
When a C program is executed by the kernel—by execve(),

  • where does execve() call the special start-up routine crt0, which runs before main() is called?

  • where does execve() call the main function?

I can't find either of them in https://elixir.bootlin.com/linux/latest/source/fs/exec.c.

According to Understanding the Linux Kernel, execve() internally looks for a linux_binfmt object whose load_binary() can load the executable file, calls that load_binary() method to load it, and also loads the dynamic linker to load and link the shared libraries used by the executable file. But the book doesn't say how execve() then calls the startup routine crt0 and then main() of the program from the executable file.

Thanks.

Tim
  • Poor little execve() has no idea what you are speaking about. That a C language program begins execution (as far as the programmer is concerned) with a call to main() is a feature of the C language. Other languages have other conventions. It's the job of the linker to arrange it so that the startup code for the C language runtime calls main(). All execve() does is load the image and start running the process at the actual entry point of the executable, as specified by the linker which created the executable image. – AlexP Feb 02 '19 at 17:12
  • Where does execve() "start running the process at the actual entry point of the executable"? – Tim Feb 02 '19 at 17:16
  • retval = exec_binprm(bprm); (line 1819). – AlexP Feb 02 '19 at 17:18
  • Inside execve(), exec_binprm is a very high level function, which loads the executable file and dynamic linker among other things. That doesn't answer my question of how execve() then calls the startup routine crt0 and then main() of the program from the executable file. – Tim Feb 02 '19 at 19:08
  • It does not call crt0. It starts executing the process at the entry point which the linker set in the executable image. It does not have the slightest idea about crt0. – AlexP Feb 02 '19 at 19:43
  • What is "the entry point which the linker set in the executable image"? – Tim Feb 02 '19 at 19:49
  • Executable and Linkable Format (ELF); element e_entry in the ELF header. – AlexP Feb 02 '19 at 20:00
  • Does e_entry store the address of main() of the executable, or the startup routine crt0? – Tim Feb 02 '19 at 21:43
  • The link editor sets e_entry to the address of the first machine instruction to be executed. Where this machine instruction comes from depends on the programming language and runtime library. For programs written in a higher-level language, it is an entry point in the runtime library which is responsible for setting things up and calling the main program according to the conventions of that language. I confess that I have never been curious to find out what's the name of the library routine which calls main() for C language programs. – AlexP Feb 02 '19 at 23:33
  • @AlexP quite surprisingly, in glibc it's called __libc_start_main(). You can override it from a preloaded library and exec another binary instead. But that's not the entry point -- it's itself called from the _start function, which is the default entry point. –  Feb 03 '19 at 05:55
  • I suggest you learn more things about dynamic and static linking. – 炸鱼薯条德里克 Feb 03 '19 at 09:42

1 Answer


Neither execve nor any other kernel code ever calls the _start function (the entry point of an executable, whatever it's called).

That's because they run in different contexts; think of them as if they were running on different machines.

What happens is that the kernel arranges for the execve system call, upon returning to user mode, to have the IP (instruction pointer) register set to point to the beginning of the _start function, and the SP (stack pointer) register set to point to the beginning of the argv + env string list. The effect, from the point of view of user mode, is as if someone had called the _start function as:

_start(argc, argv0, argv1, ... , NULL, env0, env1, ... NULL)

in a calling convention where all arguments are passed on the stack.

Of course, before that, the kernel has taken care of copying those argv + env strings to the right place, mapping the segment containing the _start function, etc.


Notice that the argv + env strings are all packed together in a single chunk, e.g.

"prog\0arg1\0arg2\0VAR1=foo\0VAR2=bar\0"

The virtual addresses where that chunk begins and ends are accessible via the /proc/PID/stat file; quoting from the procfs(5) manpage:

(48) arg_start  %lu  (since Linux 3.5)  [PT]
        Address  above  which  program  command-line arguments
        (argv) are placed.

(49) arg_end  %lu  (since Linux 3.5)  [PT]
        Address below program command-line arguments (argv)
        are placed.

Writing to that address will modify whatever appears in the ps output:

$ sleep 3600 3600 3600 3600 3600 3600 3600 &
[2] 4927
$ awk '{print $48,$49,$49-$48-1}' /proc/4927/stat
140735402952841 140735402952882 40
$ printf 'Somebody set up us the bomb Main screen turn on\0' | dd bs=1 count=40 of=/proc/4927/mem seek=140735402952841 conv=notrunc
40+0 records in
40+0 records out
40 bytes copied, 0.000229779 s, 174 kB/s
$ ps 4927
  PID TTY      STAT   TIME COMMAND
 4927 pts/4    S      0:00 Somebody set up us the bomb Main screen
  • Thanks. (1) "the kernel had taken care of copying those argv + env at the right place, mapping the segment containing the _start function, etc." Do you mean execve() has to take care of that? (2) What is "the entry point of an executable"? Is it main() of the executable or the startup routine crt0? – Tim Feb 02 '19 at 21:39
  • 1. No, execve is a system call. The execve func from glibc is simply a wrapper. 2. The entry point doesn't have to be named in any particular way. _start is its usual name. All it has to do is to be pointed to from the ELF header. In any case it could not be main(), because its arguments are passed in a different manner, as you could've gathered if you had read my answer. –  Feb 02 '19 at 22:09
  • FWIW I don't think that the startup func was ever called crt0. That's usually the name of the object file that contains the _start func, and which should be linked in when building static executables. –  Feb 02 '19 at 22:21
  • Wouldn't the kernel point RIP (instruction pointer) to the final loader ELF's entry point, in the case where a process execve()s an ELF with PT_INTERP, a script with a shebang, or something that triggers a binfmt_misc rule? Especially when the specified userspace loaders are recursively defined. – 炸鱼薯条德里克 Feb 03 '19 at 00:38
  • @炸鱼薯条德里克 in the case of an ELF with interpreter it's the interpreter which is actually executed and the execve will return pointing to its _start function -- which could do then as it pleases (eg not run any code from the original file at all). –  Feb 03 '19 at 06:26
  • Understanding the Linux Kernel says execve() calls load_binary() of each linux_binfmt object, and load_binary() " Invokes the start_thread( ) macro to modify the values of the User Mode registers eip and esp saved on the Kernel Mode stack, so that they point to the entry point of the dynamic linker and to the top of the new User Mode stack, respectively". – Tim Feb 03 '19 at 13:44
  • You wrote "What happens is that the kernel arranges for the execve system call, upon returning to user mode, to have the IP (instruction pointer) register set to point to the beginning of the _start function, and the SP (stack pointer) register set to point to the beginning of the argv + env string list". Which one is correct? – Tim Feb 03 '19 at 13:45
  • https://unix.stackexchange.com/questions/498430/does-execve-set-up-registers-to-invoke-dynamic-linker-or-the-executable-to-b – Tim Feb 03 '19 at 13:49
  • @Tim I don't care what that book says. I believe that my description is accurate, with the addition that when the ELF binary has a PT_INTERP entry in its program headers, it's the interpreter which is executed instead, i.e. the IP will be pointed to the interpreter's entry point (and the entry point of the original binary will be saved in the AUX vector); but I'm not going to add that to the answer, because it's only cluttering the point. –  Feb 03 '19 at 14:26
  • Thanks. How does execve() invoke the dynamic linker to load and link the shared libraries then? – Tim Feb 03 '19 at 15:10
  • execve() doesn't deal with shared libraries. It simply mmaps the interpreter and the original ELF, then points IP to the interpreter (or recursively). – 炸鱼薯条德里克 Feb 04 '19 at 01:34