What are the minimum root filesystem applications that are required to fully boot linux?

Question

It's a question about user space applications, but hear me out!

Three "applications", so to speak, are required to boot a functional distribution of Linux:

1. Bootloader - For embedded systems that's typically U-Boot, although it's not a hard requirement.
2. Kernel - That's pretty straightforward.
3. Root Filesystem - Can't boot to a shell without it. Contains the filesystem the kernel boots to, and where init is called from.

My question is in regard to #3. If someone wanted to build an extremely minimal rootfs (for this question let's say no GUI, shell only), what files/programs are required to boot to a shell?

Define minimal. You can use just a single executable with nothing else, as explained at http://superuser.com/a/991733/128124; the catch is that it can never exit, or else the kernel panics, so you need an infinite loop or a long sleep. Similar question: http://unix.stackexchange.com/questions/17122/is-it-possible-to-install-the-linux-kernel-alone — Ciro Santilli OurBigBook.com, Jun 30 '16 at 16:33

score 41 · Accepted Answer · edited Apr 13 '17 at 12:36

That entirely depends on what services you want to have on your device.

Programs

You can make Linux boot directly into a shell. It isn't very useful in production — who'd just want to have a shell sitting there — but it's useful as an intervention mechanism when you have an interactive bootloader: pass init=/bin/sh to the kernel command line. All Linux systems (and all unix systems) have a Bourne/POSIX-style shell in /bin/sh.
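How does the kernel decide what to run, with or without init=? As far as I know, kernel_init() in init/main.c first tries /init in an initramfs (or whatever rdinit= names), then the init= program if one was given (and panics if that fails), then /sbin/init, /etc/init, /bin/init and /bin/sh, in that order. The C sketch below models that order. It is a simplification, not kernel code: `pick_init` and its `present` list (standing in for "exists and is executable") are hypothetical, and CONFIG_DEFAULT_INIT in newer kernels is ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is `path` one of the executables in `present`? */
static int in_list(const char *path, const char *const *present)
{
    for (; *present != NULL; present++)
        if (strcmp(*present, path) == 0)
            return 1;
    return 0;
}

/* Simplified model of how the kernel chooses PID 1 (not kernel code).
 *   rdinit:  value of rdinit=, or NULL for the default "/init"
 *   init:    value of init=, or NULL if not given
 *   present: NULL-terminated list of paths that exist and are executable
 * Returns the program that would run as PID 1, or NULL where the kernel
 * would panic instead. */
const char *pick_init(const char *rdinit, const char *init,
                      const char *const *present)
{
    static const char *const fallbacks[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };
    const char *ramdisk = rdinit ? rdinit : "/init";

    if (in_list(ramdisk, present))
        return ramdisk;
    if (init != NULL)           /* an explicit init= that fails is fatal */
        return in_list(init, present) ? init : NULL;
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (in_list(fallbacks[i], present))
            return fallbacks[i];
    return NULL;                /* "No working init found." */
}
```

On a live system, membership in `present` would correspond to a check such as access(path, X_OK) == 0. The final /bin/sh fallback is why a root filesystem holding little more than a shell can still come up.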

You'll need a set of shell utilities. BusyBox is a very common choice; it contains a shell and common utilities for file and text manipulation (cp, grep, …), networking setup (ping, ifconfig, …), process manipulation (ps, nice, …), and various other system tools (fdisk, mount, syslogd, …). BusyBox is extremely configurable: you can select which tools you want and even individual features at compile time, to get the right size/functionality compromise for your application. Apart from sh, the bare minimum that you can't really do anything without is mount, umount and halt, but it would be atypical to not have also cat, cp, mv, rm, mkdir, rmdir, ps, sync and a few more. BusyBox installs as a single binary called busybox, with a symbolic link for each utility.
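BusyBox can be one binary because it decides which utility to run from the name it was invoked under. The C sketch below illustrates that dispatch on the basename of argv[0]; the two-entry applet table and the names `find_applet` and `applet_fn` are hypothetical, and real BusyBox also accepts the `busybox <applet> ...` form.

```c
#include <string.h>

/* Signature shared by every applet in this hypothetical multi-call binary. */
typedef int (*applet_fn)(int argc, char **argv);

static int applet_true(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int applet_false(int argc, char **argv) { (void)argc; (void)argv; return 1; }

static const struct { const char *name; applet_fn fn; } applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
};

/* Map argv[0] (e.g. "/bin/true", a symlink to the single binary) to an
 * applet by its basename. Returns NULL for unknown names. */
applet_fn find_applet(const char *argv0)
{
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, base) == 0)
            return applets[i].fn;
    return NULL;
}
```

A main() would then be `applet_fn fn = find_applet(argv[0]); return fn ? fn(argc, argv) : 127;`, and installation amounts to creating symlinks such as /bin/true pointing at the binary.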

The first process on a normal unix system is called init. Its job is to start other services. BusyBox contains an init system. In addition to the init binary (usually located in /sbin), you'll need its configuration file (usually /etc/inittab; some modern init replacements do away with that file, but you won't find them on a small embedded system), which indicates what services to start and when. For BusyBox, /etc/inittab is optional; if it's missing, you get a root shell on the console and the script /etc/init.d/rcS (default location) is executed at boot time.
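BusyBox inittab lines have the form <id>:<runlevels>:<action>:<process>, where the runlevels field is ignored and the id names the tty to run the process on (empty means the console). Typical entries are `::sysinit:/etc/init.d/rcS` and `::askfirst:-/bin/sh`. The parser below is a hypothetical sketch of how such a line splits into fields, not BusyBox's own code.

```c
#include <string.h>

/* One parsed line of a BusyBox-style /etc/inittab (illustrative only). */
struct inittab_entry {
    char id[32];        /* tty to run on; empty means the console */
    char action[16];    /* sysinit, respawn, askfirst, once, ... */
    char process[128];  /* command line to run */
};

/* Copy [s, e) into dst, truncating to fit, and NUL-terminate. */
static void copy_field(char *dst, size_t n, const char *s, const char *e)
{
    size_t len = (size_t)(e - s);
    if (len >= n)
        len = n - 1;
    memcpy(dst, s, len);
    dst[len] = '\0';
}

/* Returns 0 on success, -1 for comments, blank or malformed lines.
 * Only the first three colons split fields, so the process may contain ':'. */
int parse_inittab_line(const char *line, struct inittab_entry *out)
{
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;
    const char *c1 = strchr(line, ':');
    if (!c1) return -1;
    const char *c2 = strchr(c1 + 1, ':');   /* end of the ignored runlevels */
    if (!c2) return -1;
    const char *c3 = strchr(c2 + 1, ':');
    if (!c3) return -1;
    const char *end = c3 + 1 + strcspn(c3 + 1, "\n");

    copy_field(out->id, sizeof out->id, line, c1);
    copy_field(out->action, sizeof out->action, c2 + 1, c3);
    copy_field(out->process, sizeof out->process, c3 + 1, end);
    return 0;
}
```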

That's all you need, beyond of course the programs that make your device do something useful. For example, on my home router running an OpenWrt variant, the only programs are BusyBox, nvram (to read and change settings in NVRAM), and networking utilities.

Unless all your executables are statically linked, you will need the dynamic loader (ld.so, which may be called by different names depending on the choice of libc and on the processor architecture) and all the dynamic libraries (/lib/lib*.so, perhaps some of these in /usr/lib) required by these executables.
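To see which dynamic loader an executable needs, look at its PT_INTERP program header; `readelf -l` reports it as "Requesting program interpreter". The C sketch below reads it from an ELF64 image already loaded into memory. `elf64_interp` is a hypothetical helper, and it ignores 32-bit ELF and byte-order differences.

```c
#include <elf.h>
#include <string.h>

/* Return the PT_INTERP path of an ELF64 image in buf, or NULL if the image
 * has none (statically linked), is not ELF64, or is truncated. */
const char *elf64_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, buf, sizeof eh);        /* memcpy avoids misaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = eh.e_phoff + i * eh.e_phentsize;
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return NULL;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
            return NULL;
        /* The loader path is stored NUL-terminated inside the file. */
        if (buf[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)buf + ph.p_offset;
    }
    return NULL;
}
```

A NULL result for a valid ELF64 program means no loader is needed on the target. The libraries themselves are listed in the DT_NEEDED entries, which `readelf -d` shows as (NEEDED).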

Directory structure

The Filesystem Hierarchy Standard describes the common directory structure of Linux systems. It is geared towards desktop and server installations: a lot of it can be omitted on an embedded system. Here is a typical minimum.

/bin: executable programs (some may be in /usr/bin instead).
/dev: device nodes (see below)
/etc: configuration files
/lib: shared libraries, including the dynamic loader (unless all executables are statically linked)
/proc: mount point for the proc filesystem
/sbin: executable programs. The distinction with /bin is that /sbin is for programs that are only useful to the system administrator, but this distinction isn't meaningful on embedded devices. You can make /sbin a symbolic link to /bin.
/mnt: handy to have on read-only root filesystems as a scratch mount point during maintenance
/sys: mount point for the sysfs filesystem
/tmp: location for temporary files (often a tmpfs mount)
/usr: contains subdirectories bin, lib and sbin. /usr exists for extra files that are not on the root filesystem. If you don't have that, you can make /usr a symbolic link to the root directory.
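The list above can be turned into a staging directory with a few POSIX calls. This sketch creates the directories under a path on the build host (not the live /) and makes /sbin and /usr symbolic links as suggested. `make_skeleton` is a hypothetical name, and the 1777 mode on /tmp is the usual convention rather than something the list requires.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create the minimal layout under `root`, a staging directory on the build
 * host. Returns 0 on success, -1 on the first failure. Safe to re-run. */
int make_skeleton(const char *root)
{
    static const char *const dirs[] = {
        "bin", "dev", "etc", "lib", "mnt", "proc", "sys", "tmp",
    };
    char path[4096];

    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
        snprintf(path, sizeof path, "%s/%s", root, dirs[i]);
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
    }

    /* World-writable with the sticky bit, the usual mode for /tmp. */
    snprintf(path, sizeof path, "%s/tmp", root);
    if (chmod(path, 01777) != 0)
        return -1;

    /* Relative targets keep the links valid wherever the image is mounted:
     * sbin -> bin, and usr -> . so that /usr/bin resolves to /bin. */
    snprintf(path, sizeof path, "%s/sbin", root);
    if (symlink("bin", path) != 0 && errno != EEXIST)
        return -1;
    snprintf(path, sizeof path, "%s/usr", root);
    if (symlink(".", path) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```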

Device files

Here are some typical entries in a minimal /dev:

console
full (writing to it always reports “no space left on device”)
log (a socket that programs use to send log entries), if you have a syslogd daemon (such as BusyBox's) reading from it
null (acts like a file that's always empty)
ptmx and a pts directory, if you want to use pseudo-terminals (i.e. any terminal other than the console) — e.g. if the device is networked and you want to telnet or ssh in
random (returns random bytes, risks blocking)
tty (always designates the program's terminal)
urandom (returns random bytes, never blocks but may be non-random on a freshly-booted device)
zero (contains an infinite sequence of null bytes)

Beyond that you'll need entries for your hardware (except network interfaces, which don't get entries in /dev): serial ports, storage, etc.

For embedded devices, you would normally create the device entries directly on the root filesystem. High-end systems have a script called MAKEDEV to create /dev entries, but on an embedded system the script is often not bundled into the image. If some hardware can be hotplugged (e.g. if the device has a USB host port), then /dev should be managed by udev (you may still have a minimal set on the root filesystem).

Boot-time actions

Beyond the root filesystem, you need to mount a few more for normal operation:

procfs on /proc (pretty much indispensable)
sysfs on /sys (pretty much indispensable)
tmpfs filesystem on /tmp (to allow programs to create temporary files that will be in RAM, rather than on the root filesystem which may be in flash or read-only)
tmpfs, devfs or devtmpfs on /dev if dynamic (see udev in “Device files” above)
devpts on /dev/pts if you want to use pseudo-terminals (see the remark about pts above)

You can make an /etc/fstab file and call mount -a, or run mount manually.
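/etc/fstab has one filesystem per line: device, mount point, type, options, and the dump and pass fields. The C library parses it with setmntent/getmntent. The sketch below is a dry run of `mount -a` that prints the mounts instead of performing them; `list_fstab` is a hypothetical name, and a real init would pass the fields to mount(2) after translating options such as ro into mount flags.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Example /etc/fstab for the mounts listed above:
 *   proc    /proc     proc    defaults  0 0
 *   sysfs   /sys      sysfs   defaults  0 0
 *   tmpfs   /tmp      tmpfs   defaults  0 0
 *   devpts  /dev/pts  devpts  defaults  0 0
 * Prints one mount command per entry and returns the number of entries,
 * or -1 if the file cannot be opened. Comments and blank lines are skipped
 * by getmntent itself. */
int list_fstab(const char *fstab_path)
{
    FILE *f = setmntent(fstab_path, "r");
    if (!f)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (hasmntopt(m, "noauto"))     /* mount -a skips these too */
            continue;
        printf("mount -t %s -o %s %s %s\n",
               m->mnt_type, m->mnt_opts, m->mnt_fsname, m->mnt_dir);
        count++;
    }
    endmntent(f);
    return count;
}
```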

Start a syslog daemon (as well as klogd for kernel logs, if the syslogd program doesn't take care of it), if you have any place to write logs to.

After this, the device is ready to start application-specific services.

How to make a root filesystem

This is a long and diverse story, so all I'll do here is give a few pointers.

The root filesystem may be kept in RAM (loaded from a (usually compressed) image in ROM or flash), or on a disk-based filesystem (stored in ROM or flash), or loaded from the network (often over TFTP) if applicable. If the root filesystem is in RAM, make it the initramfs — a RAM filesystem whose content is created at boot time.

Many frameworks exist for assembling root images for embedded systems. There are a few pointers in the BusyBox FAQ. Buildroot is a popular one, allowing you to build a whole root image with a setup similar to the Linux kernel and BusyBox. OpenEmbedded is another such framework.

Wikipedia has an (incomplete) list of popular embedded Linux distributions. An example of embedded Linux you may have near you is the OpenWrt family of operating systems for network appliances (popular on tinkerers' home routers). If you want to learn by experience, you can try Linux from Scratch, but it's geared towards desktop systems for hobbyists rather than towards embedded devices.

A note on Linux vs Linux kernel

The only behavior that's baked into the Linux kernel is launching the first program at boot time. (I won't get into initrd and initramfs subtleties here.) This program, traditionally called init, has process ID 1 and has certain privileges (immunity to KILL signals) and responsibilities (reaping orphans). You can run a system with a Linux kernel and start whatever you want as the first process, but then what you have is an operating system based on the Linux kernel, and not what is normally called “Linux”: Linux, in the common sense of the term, is a Unix-like operating system whose kernel is the Linux kernel. For example, Android is an operating system which is not Unix-like but based on the Linux kernel.
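The "reaping orphans" responsibility can be shown in a few lines of C: a process whose parent dies is normally re-parented to PID 1, and once it exits it stays a zombie until PID 1 waits for it. A minimal sketch with hypothetical function names; the one-second poll when there are no children is a simplification, and a real init would more likely block on SIGCHLD.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collect every child that has already exited, without blocking.
 * Returns how many were reaped. */
int reap_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Main loop for a minimal PID 1 after it has started its services:
 * it must never return, so it waits for children forever. */
void init_reap_forever(void)
{
    for (;;) {
        if (wait(NULL) < 0 && errno == ECHILD)
            sleep(1);   /* no children right now: poll again later */
        reap_children();
    }
}
```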

Excellent answer. I only mentioned booting into Linux in the title b/c that's what likely will be searched for, so great addition about Linux vs Linux Kernel, that needs to be more widespread knowledge. — MDMoore313, May 28 '14 at 20:48
@BigHomie Remember, the Free Software Foundation wants us all to call it GNU/Linux, since on most (all?) "Linux distros" the software is GNU, even though the kernel is Linux (hence GNU/Linux). — BenjiWiebe, May 29 '14 at 00:40
Meh, ain't nobody got time for that. Then my distro should be called Busybox/Linux?? I know I know, it's not you it's Stallworth, just venting ;) — MDMoore313, May 29 '14 at 00:48
@BenjiWiebe Or GNU/X11/Apache/Linux/TeX/Perl/Python/FreeCiv. Apart from RMS, everybody calls it “Linux”. — Gilles 'SO- stop being evil', May 29 '14 at 01:04
Under "Directory structure" needs an edit for "some may be in ..." -- could be /sbin or /usr/bin? — Jeff Schaller, Apr 05 '16 at 11:50
@JeffSchaller You mean http://unix.stackexchange.com/suggested-edits/27665 ? Yeah, that should have been /usr/bin. — Gilles 'SO- stop being evil', Apr 05 '16 at 11:53

score 6 · Answer 2 · answered May 28 '14 at 19:54


All you need is one statically linked executable, placed on the filesystem, in isolation. You do not need any other files. That executable is the init process. It can be busybox. That gives you a shell and a host of other utilities, all in itself. You can go to a fully functioning system just by executing commands manually in busybox to mount the root filesystem read-write, create /dev nodes, exec real init, etc.

Kuba hasn't forgotten Monica

Yeah, I knew busybox was comin'. Let's see if anything else shows up. – MDMoore313 May 28 '14 at 19:56

score 5 · Answer 3 · answered May 30 '14 at 17:50

If you do not need any shell utilities, a statically linked mksh binary (e.g. against klibc – 130K on Linux/i386) will do. You need a /linuxrc or /init or /sbin/init script that just calls mksh -l -T!/dev/tty1 in a loop:

#!/bin/mksh
while true; do
    /bin/mksh -l -T!/dev/tty1
done

The -T!$tty option is a recent addition to mksh that tells it to spawn a new shell on the given terminal and wait for it. (Before that, there was only -T- to dæmonise a program and -T$tty to spawn on a terminal but not wait for it. This was not so nice.) The -l option simply tells it to run a login shell (which reads /etc/profile, ~/.profile and ~/.mkshrc).
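In C, one iteration of that respawn loop roughly corresponds to fork, setsid, opening the terminal so it becomes the controlling tty, exec, and waiting for the child. This sketch is not how mksh implements -T!; `spawn_and_wait` is a hypothetical name, and it relies on the Linux behavior that the first terminal a session leader opens (without O_NOCTTY) becomes its controlling terminal.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (an absolute path) in a child and wait for it. If tty is
 * non-NULL, the child starts a new session, opens tty (making it the
 * controlling terminal) and uses it for stdin, stdout and stderr.
 * Returns the exit status, 128 + signal number if killed, or -1 on error. */
int spawn_and_wait(char *const argv[], const char *tty)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (tty != NULL) {
            setsid();
            int fd = open(tty, O_RDWR);
            if (fd < 0)
                _exit(126);
            dup2(fd, STDIN_FILENO);
            dup2(fd, STDOUT_FILENO);
            dup2(fd, STDERR_FILENO);
            if (fd > STDERR_FILENO)
                close(fd);
        }
        execv(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

A minimal init built on it would run `for (;;) spawn_and_wait(argv, "/dev/tty1");` with argv = {"/bin/mksh", "-l", NULL}, mirroring the shell loop above.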

This assumes your terminal is /dev/tty1; substitute your own. (With more magic, the terminal can be detected automatically. /dev/console will not give you full job control.)

You need a few files in /dev for this to work:

/dev/console
/dev/null
/dev/tty
/dev/tty1

Booting with the kernel option devtmpfs.mount=1 eliminates the need for a filled /dev, just let it be an empty directory (suitable for use as a mountpoint).

You'll normally want to have some utilities (from klibc, busybox, beastiebox, toybox or toolbox), but they are not really needed.

You may want to add a ~/.mkshrc file, which sets up $PS1 and some basic shell aliases and functions.

I once made a 171K compressed (371K uncompressed) initrd for Linux/m68k using mksh (and its sample mkshrc file) and klibc-utils only. (This was before -T! was added to the shell, though, so it spawned the login shell on /dev/tty2 instead and echo'd a message to the console telling the user to switch terminals.) It works fine.

This is a really bare minimum setup. The other answers provide excellent advice towards somewhat more featured systems. This is a real special-case thing.

Disclaimer: I'm the mksh developer.

This is a great answer, thanks for sharing and also thanks for mksh. — JoshuaRLi, Jan 17 '19 at 00:25

Ciro Santilli OurBigBook.com · Answer 4 · 2021-10-16T09:33:59.127

Minimal init hello world program step-by-step

As shown in this answer, all you need is a single statically linked ELF file without even the standard library, therefore a filesystem with a single file.

Compile a hello world without any dependencies that ends in an infinite loop. init.S:

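.global _start
_start:
    /* write(1, message, message_len): syscall number 1 is write on x86-64 */
    mov $1, %rax
    mov $1, %rdi
    mov $message, %rsi
    mov $message_len, %rdx
    syscall
    /* Spin forever: init must never exit */
    jmp .
    message: .ascii "FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR\n"
    .equ message_len, . - message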

We cannot call sys_exit: if init ever exits, the kernel panics.

Then:

mkdir d
as --64 -o init.o init.S   # assemble
ld -o d/init init.o        # link straight into the future root directory
cd d
find . | cpio -o -H newc | gzip > ../rootfs.cpio.gz
ROOTFS_PATH="$(pwd)/../rootfs.cpio.gz"

This creates a filesystem with our hello world at /init, which is the first userland program that the kernel will run. We could also have added more files to d/ and they would be accessible from the /init program when the kernel runs.
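The cpio -H newc archive that the kernel unpacks as the initramfs is a simple format: each file is a 110-character ASCII header (the magic 070701 followed by 13 eight-digit hexadecimal fields), the NUL-terminated name, then the data, with header plus name and the data each padded to a 4-byte boundary; the archive ends with an entry named TRAILER!!!. The C sketch below formats such a header. `newc_header` and `newc_pad` are hypothetical helpers, and uid, gid, mtime and device numbers are fixed at zero for simplicity.

```c
#include <stdio.h>
#include <string.h>

/* Format a newc header into out (at least 111 bytes including the NUL).
 * Field order: ino, mode, uid, gid, nlink, mtime, filesize, devmajor,
 * devminor, rdevmajor, rdevminor, namesize, check. Returns its length. */
int newc_header(char *out, size_t outlen, unsigned ino, unsigned mode,
                unsigned nlink, unsigned filesize, const char *name)
{
    unsigned namesize = (unsigned)strlen(name) + 1;   /* includes the NUL */
    return snprintf(out, outlen,
        "070701"
        "%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
        ino, mode, 0u /* uid */, 0u /* gid */, nlink, 0u /* mtime */,
        filesize, 0u /* devmajor */, 0u /* devminor */,
        0u /* rdevmajor */, 0u /* rdevminor */, namesize, 0u /* check */);
}

/* Zero bytes needed after n bytes to reach the next 4-byte boundary. */
unsigned newc_pad(unsigned n)
{
    return (4 - n % 4) % 4;
}
```

For example, a regular file named "init" with mode 0100755 gets a mode field of 000081ED, and since 110 + 5 bytes of header and name end one byte short of a boundary, one zero byte of padding follows before the data.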

Then cd into the Linux kernel tree, build it as usual, and run it in QEMU:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git checkout v4.9
make mrproper
make defconfig
make -j"$(nproc)"
qemu-system-x86_64 -kernel arch/x86/boot/bzImage -initrd "$ROOTFS_PATH"

And you should see a line:

FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR

on the emulator screen! Note that it is not the last line, so you have to look a bit further up.

You can also use C programs if you link them statically:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR FOOBAR\n");
    /* init must never exit, or the kernel panics, so sleep forever */
    for (;;)
        sleep(0xFFFFFFFF);
}

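Build it from the directory containing d/, then re-run the cpio step from inside d/ to regenerate rootfs.cpio.gz:

gcc -static init.c -o d/init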

To run on real hardware, build an ISO image and write it to a USB stick at /dev/sdX:

make isoimage FDINITRD="$ROOTFS_PATH"
sudo dd if=arch/x86/boot/image.iso of=/dev/sdX

Great source on this subject: http://landley.net/writing/rootfs-howto.html It also explains how to use gen_initramfs_list.sh, which is a script from the Linux kernel source tree to help automate the process.

Minimal setup that gives you a shell

Buildroot is my favorite option, see discussion at: What is the smallest possible Linux implementation?

At this point you are basically obliged to deal with the standard library (what insane mind would code a sh shell without one?), so you are better off just using some automation script to set all of that up.

Tested on Ubuntu 16.10, QEMU 2.6.1.