16

I was going through a tutorial on setting up a custom initramfs where it states:

The only thing that is missing is /init, the executable in the root of the initramfs that is executed by the kernel once it is loaded. Because sys-apps/busybox includes a fully functional shell, this means you can write your /init binary as a simple shell script (instead of making it a complicated application written in Assembler or C that you have to compile).

and gives an example of init as a shell script that starts with #!/bin/busybox sh

So far, I was under the impression that init is the main process that is launched and that all the other user space processes are eventually children of init. However, in the given example, the first process is actually /bin/busybox sh, from which init is later spawned.

Is this a correct interpretation? If, for example, I had a Python interpreter available at that point, could I write init as a Python script, etc.?

Ciro Santilli OurBigBook.com

2 Answers

15

init is not "spawned" (as a child process), but rather exec'd like this:

# Boot the real thing.
exec switch_root /mnt/root /sbin/init

exec replaces the entire process in place. The final init is still the first process (pid 1), even though other programs in the Initramfs ran under that pid before it.
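To see that exec keeps the same pid, here is a minimal sketch that can be run on any ordinary system (no initramfs needed). The variable names are only illustrative. A child shell prints its pid, then execs a second shell that prints its own pid:

```shell
# Run a child shell that prints its PID, then replaces itself (via exec)
# with a second shell that prints its PID again.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

before=$(echo "$pids" | sed -n 1p)
after=$(echo "$pids" | sed -n 2p)

echo "PID before exec: $before"
echo "PID after exec:  $after"   # same number: exec created no new process
```

Without the exec, the second shell would run as a child process and report a different pid.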

The Initramfs /init, which is a Busybox shell script with pid 1, execs to Busybox switch_root (so now switch_root is pid 1); this program changes your mount points so /mnt/root will be the new /.

switch_root then again execs to /sbin/init of your real root filesystem; thereby it makes your real init system the first process with pid 1, which in turn may spawn any number of child processes.
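Put together, the chain described above could look roughly like the sketch below. It is not the tutorial's exact script: the root device /dev/sda1, the /mnt/root mount point and the rescue-shell fallback are assumptions for illustration, and it presumes the Busybox applets (mount, umount, switch_root) are reachable from the shell. The snippet only writes the script to a scratch file and syntax-checks it, since actually running it only makes sense as pid 1 inside an initramfs.

```shell
# Write the sketch to a scratch file so it can be inspected without booting it.
init_file="${TMPDIR:-/tmp}/init.sketch"

cat > "$init_file" <<'EOF'
#!/bin/busybox sh
# Runs as PID 1 inside the initramfs.

# Kernel interfaces that many tools expect to find.
mount -t proc none /proc
mount -t sysfs none /sys

# ASSUMPTION: the real root filesystem lives on /dev/sda1.
mount -o ro /dev/sda1 /mnt/root || exec /bin/busybox sh  # rescue shell on failure

# Clean up before handing over.
umount /proc
umount /sys

# Replace this shell (still PID 1) with the real init.
exec switch_root /mnt/root /sbin/init
EOF

chmod +x "$init_file"

# Syntax check only; nothing is mounted or executed here.
sh -n "$init_file" && echo "syntax OK: $init_file"
```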

Certainly it could just as well be done with a Python script, if you somehow managed to bake Python into your Initramfs. Although if you don't plan to include busybox anyway, you would have to painstakingly reimplement some of its functionality (like switch_root, and everything else you would usually do with a simple command).

However, this does not work on kernels built without support for script binaries (shebang scripts require CONFIG_BINFMT_SCRIPT=y); in such a case you'd have to start the interpreter directly and make it load your script somehow.
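One way to check this option on a running system is to look at the kernel's build configuration, if it is exposed at all. The sketch below assumes one of two common locations: /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or /boot/config-$(uname -r) (a distribution convention, not guaranteed). The function name is made up for this example.

```shell
# Report the CONFIG_BINFMT_SCRIPT setting of the running kernel, if the
# build configuration can be found in one of the usual places.
check_binfmt_script() {
    boot_cfg="/boot/config-$(uname -r)"
    if [ -r /proc/config.gz ]; then
        gzip -dc /proc/config.gz 2>/dev/null | grep '^CONFIG_BINFMT_SCRIPT=' \
            || echo "CONFIG_BINFMT_SCRIPT not enabled in /proc/config.gz"
    elif [ -r "$boot_cfg" ]; then
        grep '^CONFIG_BINFMT_SCRIPT=' "$boot_cfg" \
            || echo "CONFIG_BINFMT_SCRIPT not enabled in $boot_cfg"
    else
        echo "kernel config not found (neither /proc/config.gz nor $boot_cfg)"
    fi
}

check_binfmt_script
```

If the system can already run ordinary shell scripts directly (./script.sh), script support is evidently available, either built in or as a loaded module.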

frostschutz
  • / doesn't vanish into thin air - it is mounted over (though usually its contents are all deleted before it is to save memory). It is still there. switch_root does the syscall switchroot - which is what the kernel devs provided when they changed the boot process in kernel 2.6.something to require initramfs. It is the kernel that does the magic. – mikeserv Dec 13 '14 at 19:12
  • A switchroot syscall would indeed be news to me. Do you have a source for that? If you look at the switch_root.c source code, it seems to be quite a manual process, and the same as is described in Documentation/filesystems/ramfs-rootfs-initramfs.txt. Also if you delete everything and mount it over, it's pretty much vanished at this point, don't you think? – frostschutz Dec 13 '14 at 21:21
  • pivot_root, on the other hand, is a syscall. It's not used for switch_root though and can't be used without jumping through some hoops, and either way it matters none whatsoever for this answer, so I just removed it altogether. Too bad, I thought that magic and vanish into thin air worked really well... :-P – frostschutz Dec 13 '14 at 21:51
  • Well, maybe I got the wrong idea about switch_root - for which I'm sorry, and I thank you for showing me - but it doesn't vanish anything anyway. initramfs root persists and is always there for everybody - it is root. – mikeserv Dec 13 '14 at 21:54
  • As the docs you linked to say: But initramfs is rootfs: you can neither pivot_root rootfs, nor unmount it. Instead delete everything out of rootfs to free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs with the new root (cd /newmount; mount --move . /; chroot .), attach stdin/stdout/stderr to the new /dev/console, and exec the new init. – mikeserv Dec 13 '14 at 21:56
6

The exec syscall of the Linux kernel understands shebangs natively

When the executed file starts with the magic bytes #!, the kernel treats the rest of that line as the path of an interpreter. With #!/bin/sh, for example, the kernel:

  • does an exec system call
  • with executable /bin/sh
  • and with CLI argument: path to current script

This is exactly what happens when you run a regular userland shell script with:

./myscript.sh
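The argument passing can be made visible with a small experiment. This sketch assumes /bin/echo exists as a real executable and that the scratch directory is not mounted noexec. The file contains nothing but a shebang that names echo as the "interpreter", so running it makes the kernel start /bin/echo with the script's path as its argument:

```shell
# A "script" whose interpreter is echo: the kernel runs
#   /bin/echo <path-to-this-file>
demo="${TMPDIR:-/tmp}/shebang_demo"
printf '#!/bin/echo\n' > "$demo"
chmod +x "$demo"

# Prints the path of the file itself, because that path is the
# argument the kernel hands to the interpreter.
"$demo"
```

With #!/bin/sh the same thing happens, except that sh opens the path it receives and interprets the file's contents.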

If the file had started with the ELF magic bytes (0x7F followed by ELF, which a hex dump shows as .ELF) instead of #!, the kernel would pick the ELF loader to run it instead.
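Both kinds of magic bytes are easy to inspect. The sketch below uses a made-up helper named magic and assumes that head -c is available (it is in GNU coreutils and Busybox) and that /bin/sh is an ELF binary on the system at hand:

```shell
# Print the first N bytes of a file as hex, without offsets or spaces.
magic() {
    head -c "$2" "$1" | od -An -tx1 | tr -d ' \n'
}

script="${TMPDIR:-/tmp}/magic_demo.sh"
printf '#!/bin/sh\necho hello\n' > "$script"

echo "script: $(magic "$script" 2)"   # 2321 is "#!" in ASCII
echo "binary: $(magic /bin/sh 4)"     # 7f454c46 if /bin/sh is an ELF binary
```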

More details at: Why do people write the #!/usr/bin/env python shebang on the first line of a Python script? | Stack Overflow

Once you have this in mind, it becomes easy to accept that /init can be anything that the kernel can execute, including a shell script, and also why /bin/sh will be the first executable in that case.

Here is a minimal runnable example for those that want to try it out: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/cbea7cc02c868711109ae1a261d01fd0473eea0b#custom-init

Ciro Santilli OurBigBook.com