The shell is just a program, although it plays an important role in the system. Bash, and I would assume most other common shells, are implemented in C. The two most important system calls used from C to create subprocesses are fork() and exec(). Most higher-level languages, including shell, provide equivalents of these functions.
fork()
"Fork" creates a duplicate copy of the calling process as its child. This is how virtually all processes on the system except for the first one (init) begin: as copies of the process which started them. Shell language doesn't actually have a fork function, but it does include syntax to generate subshells, which the shell creates by forking itself.
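As a rough illustration of the "duplicate copy" idea, here is a minimal sketch using Python's os.fork(), which is a thin wrapper around the C call. The function name fork_and_report is made up for this example; the child sends its own PID and its parent's PID back through a pipe so the relationship can be inspected.

```python
import os

def fork_and_report():
    """Fork once; the child reports its PID and its parent's PID via a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the caller, continuing from the same point in the code.
        os.close(read_fd)
        os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        os.close(write_fd)
        os._exit(0)  # exit at once, skipping cleanup inherited from the parent
    # Parent: fork() returned the PID of the new child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return pid, child_pid, child_ppid

if __name__ == "__main__":
    print(fork_and_report())
```

Both processes run the same code after the fork; the only thing that tells them apart is the return value, which is 0 in the child and the child's PID in the parent.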
exec()
There isn't actually an exec() call in C; the name colloquially refers to a group of related functions. You can see the list with man 3 exec, which usually begins:
The exec() family of functions replace the current process image with a new process image...
And that is exactly what it does: it replaces the defining parts of the current process's memory (code, data, heap, and stack) with new contents loaded from an executable file (e.g., /usr/bin/ls). This is why fork() is necessary first in the creation of a new process -- otherwise, the calling process ceases to be what it was and becomes something else instead; no new process would actually be created.
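The fork-then-exec sequence can be sketched roughly as follows. This is only an approximation of what a shell does for an external command, not how Bash is actually written; run_command is a hypothetical helper name, and the 127 and 128+signal return values follow the usual shell convention.

```python
import os

def run_command(argv):
    """Run argv the way a shell roughly does: fork, exec in the child, wait."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the caller with the requested program.
        try:
            os.execvp(argv[0], argv)  # returns only if the exec failed
        except OSError:
            os._exit(127)  # conventional status for "command not found"
    # Parent: still the original program; wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # shell convention for death by signal

if __name__ == "__main__":
    print(run_command(["ls", "/"]))
```

If the child skipped the fork and called os.execvp() directly, the calling program itself would turn into ls and never get the chance to wait for anything.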
That may sound at first like an absurd and inefficient way to do things: why not just have a command that creates a new process from scratch? In fact, that would probably not be as efficient for a few reasons:
The "copy" produced by fork() is a bit of an abstraction, since the kernel uses a copy-on-write system; all that really has to be created is a virtual memory map. If the copy then immediately calls exec(), most of the data that would have had to be copied once the process modified it never actually gets copied, because the process doesn't do anything requiring its use.
Various significant aspects of the child process (e.g., its environment) do not have to be individually duplicated or set based on a complex analysis of the context, etc. They're just assumed to be the same as that of the calling process, and this is the fairly intuitive system we are familiar with.
For a detailed discussion of exactly what it means to "copy the environment" to the spawned child process, see my answer here.
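To make the inheritance point concrete, here is a small sketch in which the parent sets a variable and a forked, exec'd sh prints it without anyone passing it along explicitly. The helper name env_seen_by_child and the variable name used below are invented for the example, and it assumes sh is on the PATH.

```python
import os

def env_seen_by_child(name, value):
    """Set a variable in this process, then return what a forked sh sees."""
    os.environ[name] = value  # updates this process's environment
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits the environment as part of the copy; nothing is passed in.
        os.close(read_fd)
        os.dup2(write_fd, 1)  # make the pipe the child's standard output
        try:
            os.execvp("sh", ["sh", "-c", f'printf %s "${name}"'])
        except OSError:
            os._exit(127)
    # Parent: read whatever the child printed, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output

if __name__ == "__main__":
    print(env_seen_by_child("DEMO_GREETING", "hello"))
```

The environment survives both steps: fork() copies it along with everything else, and the exec variants that take no explicit environment argument hand the existing one to the new program.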
If so, what about built-in commands like cd?
These are, again, just implemented in C. chdir(), like fork() and exec(), is part of the Unix platform extensions to standard C and is what underlies the shell's cd command. From man 2 chdir:
chdir() changes the current working directory of the calling process to the directory specified in path.
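This does not require a subprocess -- it affects the caller. In fact, cd has to be a built-in: a chdir() made in a child process would change only the child's working directory, and the shell's own would stay where it was. The shell is an interactive runtime interpreter, meaning it executes code written in the shell language as you feed it commands. It mostly does not need to execute a new process to do this; it does it as itself.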
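The difference between changing directory as yourself and changing it in a subprocess can be checked with a short sketch like this one. Both function names are hypothetical, and os.chdir() stands in for the C chdir() call.

```python
import os

def cwd_after_child_chdir(path):
    """Let a forked child chdir(), then report the parent's working directory."""
    pid = os.fork()
    if pid == 0:
        try:
            os.chdir(path)  # affects only this child process
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return os.getcwd()  # the parent's directory is untouched

def cwd_after_own_chdir(path):
    """Call chdir() in the current process, as a built-in cd would."""
    os.chdir(path)
    return os.getcwd()

if __name__ == "__main__":
    print(cwd_after_child_chdir("/"))  # still the starting directory
    print(cwd_after_own_chdir("/"))    # now "/"
```

The first function behaves like a hypothetical external cd program: the change happens, but in a process that exits immediately afterwards.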