13

Why does ls require a separate process for its execution? I know the reason why commands like cd can't be executed by forking mechanism but is there any harm if ls is executed without forking?

Mat
crisron
  • 2
    Although ls is an external programme, echo * or echo * .* (depending on shell options) does a pretty good job of listing files without forking. – gerrit Jan 10 '14 at 09:45
  • This is even better: printf "%s\n" * – Costa Jan 15 '14 at 07:23
  • Shell diversity note: tcsh has a builtin ls-F which acts like ls -F. It's there for efficiency. You always get -F which is usually a good idea. If you specify any other options it punts to the external command. –  May 04 '14 at 18:05

6 Answers

18

The answer is more or less that ls is an external executable. You can see its location by running type -p ls.
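
As a rough, portable sketch of the same check: the POSIX `command -v` prints an absolute path for an external executable and a bare name for something the shell handles itself. The helper name `is_external` is made up for this illustration, and the sketch assumes no alias or function shadows the names being tested.

```shell
# Hypothetical helper: succeeds if the name resolves to a file on disk,
# fails if the shell itself handles it (builtin, function, alias, keyword).
is_external() {
    case "$(command -v "$1")" in
        /*) return 0 ;;   # absolute path: an external executable
        *)  return 1 ;;   # bare name, alias text, or nothing found
    esac
}

for cmd in cd echo ls; do
    if is_external "$cmd"; then
        echo "$cmd: external ($(command -v "$cmd"))"
    else
        echo "$cmd: handled by the shell itself"
    fi
done
```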

Why isn't ls built into the shell, then? Well, why should it be? The job of a shell is not to encompass every available command, but to provide an environment capable of running them. Some modern shells have echo, printf, and their ilk as builtins, which don't technically have to be builtins, but are made so for performance reasons when they are run repeatedly (primarily in tight loops). Without making them builtins, the shell would have to fork and exec a new process for each call to them, which could be extremely slow.
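
To get a feel for that cost, here is a minimal sketch that runs the same loop twice: once with the shell's own echo and once forcing the external one through `env`. The function names are invented for this example, and it assumes echo is a builtin in your shell (true for bash and dash).

```shell
# Hypothetical helpers for illustration only.
loop_builtin() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo x          # handled inside the shell: no new process
        i=$((i + 1))
    done
}

loop_external() {
    i=0
    while [ "$i" -lt "$1" ]; do
        env echo x      # the shell forks and execs env, which execs the external echo
        i=$((i + 1))
    done
}

loop_builtin 200 > /dev/null
loop_external 200 > /dev/null
```

Timing each call (for example with your shell's `time`) should show the external loop being noticeably slower, though the exact numbers depend on the system.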

At the very least, running ls, an external executable, requires running one of the exec family of system calls. You could do this without forking, but it would replace the primary shell that you are using. You can see what happens in that instance by doing the following:

exec ls; echo "this never gets printed"

Since your shell's process image is replaced, the current shell is no longer accessible after doing this. For the shell to be able to continue to run after running ls, the command would have to be built into the shell.

Forking allows the replacement of a process that is not your primary shell, which means you can continue to run your shell afterwards.
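
You can approximate what the shell does for you with a subshell. In this sketch (the name `run_in_child` is made up), the parentheses create a forked copy of the shell and exec replaces only that copy:

```shell
# Hypothetical helper: fork a child, then replace the child with the command.
run_in_child() {
    # ( ... ) starts a subshell, i.e. a forked copy of the current shell;
    # exec then replaces that copy, not the original shell.
    ( exec "$@" )
}

run_in_child ls / > /dev/null
echo "the original shell is still running"
```

Unlike the plain exec ls above, the final echo does get printed here.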

Chris Down
  • 125,559
  • 25
  • 270
  • 266
  • 1
    I think he is asking why ls(1) isn't a builtin feature of shells. To answer that, someone would need to explain how different vendors have different options for ls(1), how it can query different things from the filesystem, etc., and also the ups, and mostly downs, of having it built into the shell. – llua Jan 10 '14 at 03:55
  • @llua I added some information about that, and the exception cases of echo, printf, etc. – Chris Down Jan 10 '14 at 04:00
  • It's not always clear why some things are builtins and others are not. For example, why is cd not an external executable? – Faheem Mitha Jan 10 '14 at 08:04
  • @FaheemMitha There is an external cd executable in POSIX-compliant operating systems (see here). If you want to actually chdir() in the current process, though, you need to have it built into the shell. – Chris Down Jan 10 '14 at 08:30
  • ls is external mostly by convention, but it can also be implemented in a shell. See busybox. –  Jan 11 '14 at 23:30
  • @bersch Well, almost anything can be implemented as part of a shell. Busybox is niche -- it is designed for embedded systems where it actually makes sense to do that. – Chris Down Jan 13 '14 at 02:37
  • Agree in case of embedded, but it is not niche. Just think how many routers world wide are running with it. –  Jan 13 '14 at 14:05
  • @bersch I would consider routers a niche market -- not to say that it's not popular, just that it is a niche. – Chris Down Jan 14 '14 at 02:47
14

The Bash Reference Manual states:

Builtin commands are necessary to implement functionality impossible or inconvenient to obtain with separate utilities.

That is, shells are designed to only include built-in commands if:

  1. Required by the POSIX standard
  2. Commands that require access to the shell itself, such as job control built-ins
  3. Commands that are very simple, not OS dependent and increase execution efficiency when implemented as built-ins, such as printf

The ls command does not fit any of the above requirements.

However, there is no programming constraint that would prevent ls being implemented as a built-in, that is, executing in the same process as the bash interpreter. The design reasons for commands not being implemented as shell built-ins are:

  1. The shell should be separate from the filesystem - no built-in commands should depend on correct operation of any filesystem or peripheral devices
  2. A command that might be filesystem type or OS dependent should be a separate executable
  3. A command that you might want to pipe to or from should be a separate process
  4. A command that you might want to run in the background should be a separate executable
  5. A command that has a large number of possible parameters is better implemented in a separate executable
  6. Commands that should have the same output, regardless of which type of shell (bash, csh, tcsh, ...) invokes them, should be stand-alone executables

Regarding the first reason - You want the shell to be as independent and resilient as possible. You don't want the shell to get stuck on ls of an NFS mount that is "not responding still trying".

Regarding the second reason - In many instances you might want to use a shell on a system that uses Busybox, or on another system that has a different ls implementation. Or even use the same shell source on OSes that have different ls implementations.

Regarding the third reason - For an expression such as find . -type d | xargs ls -lad it would be difficult or impossible to implement the ls in the same process as the shell interpreter.
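
As a small illustration of that kind of pipeline, here is a variant sketch (the function name `list_dirs` is invented). It uses find's POSIX `-exec ... {} +` form instead of xargs, which avoids splitting names on whitespace. In both forms ls is launched directly by another program rather than by the shell, so an ls that existed only as a shell builtin would not be reachable.

```shell
# Hypothetical helper: long-list every directory under the given path
# (default: the current directory). find execs the external ls itself.
list_dirs() {
    find "${1:-.}" -type d -exec ls -lad {} +
}

list_dirs . > /dev/null
```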

Regarding the fourth reason - Some ls commands can take a long time to complete. You might want the shell to continue on doing something else in the meantime.


Note: See this helpful post by Warren Young in response to a similar question.

  • You missed the ease of piping output if it's a separate command, and all the programming it would take to pipe a shell primitive into a separate executable. –  Jan 10 '14 at 13:27
  • @BruceEdiger: What a pleasure to receive a comment from the esteemed BE. Thanks! I believe that reason 3 covers your comment, no? – Jonathan Ben-Avraham Jan 10 '14 at 13:30
  • 1
    I was thinking more along the lines of how complicated the source code of the shell itself would be if it had to handle pipes for external processes, and pipe the output of an internal command like the hypothetical ls into an external process. It could be done, but it would be complicated. –  Jan 10 '14 at 17:06
  • 1
    I'm afraid most if not all of your 5 points are moot. 1: ls is (hopefully) independent of the file system implementation. It's up to the kernel to provide a consistent interface to the standard library and applications. 2: ls is likely less dependent on the OS than the shell is. 3: shells definitely allow builtins in pipelines. 4: shells definitely allow builtins to be run in the background. 5: that's quite subjective. – jlliagre Jan 10 '14 at 17:32
  • 1
    @JonathanBen-Avraham @BruceEdiger Don't shells already handle the pipe case for builtins with subshells? E.g. in bash, output: alias | grep ls; input: cat /etc/passwd | while read a; do echo "$a"; done – Matt Jan 11 '14 at 16:15
  • @mindthemonkey: You are correct. My intent was to say that in the general case, i.e. expressions such as find . | xargs ls | grep Dave, pipe-able functions would be more complex to implement in-process. – Jonathan Ben-Avraham Jan 11 '14 at 16:56
  • @mindthemonkey - yes, modern shells do that. Recall that sh and ls were designed for a Unix running on a PDP-11 with 64 Kilobytes of instruction address space. That sort of limitation encourages designers to make separate processes for everything. Note that test (often used via a symlink [) was a separate process until modern shells came along. –  Jan 11 '14 at 17:17
  • @JonathanBen-Avraham that specific example of xargs (or anything) trying to exec a file is a useful argument against shell builtins, as the kernel/loader doesn't know that a shell is meant to handle them (ls in your example). The pipe is handled, though. This is inherently not in-process as it's IPC. Shells don't try to fake the IPC; they talk to a child process. The builtin is still a little quicker to launch in the child as it doesn't need to do a full execve as for an external binary. – Matt Jan 11 '14 at 19:05
2

ls does not require a separate process. Very few commands actually require a separate process: only the ones that need to change privileges.

As a rule, shells implement commands as builtins only when those commands need to be implemented as builtins. Commands like alias, cd, exit, export, jobs, … need to read or modify some internal state of the shell, and therefore cannot be separate programs. Commands that have no such requirements can be separate commands; this way, they can be called from any shell or other program.
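
A quick way to see why cd in particular cannot be a separate program: a child process can change only its own working directory. In this sketch the subshell stands in for an external cd (the name `child_cd` is made up for the example):

```shell
# Hypothetical helper: change directory inside a child process and report
# where the child ended up. The caller's directory is not affected.
child_cd() {
    ( cd "$1" && pwd )
}

before=$(pwd)
child_cd / > /dev/null
after=$(pwd)

if [ "$before" = "$after" ]; then
    echo "parent working directory unchanged: $after"
fi
```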

Looking at the list of builtins in bash, only the following builtins could be implemented as separate commands. For some of them, there would be a slight loss of functionality.

  • command — but it would lose its usefulness in situations where PATH may not be set up properly and the script is using command as part of setting it up.
  • echo — it's a builtin for efficiency.
  • help — it could use a separate database, but embedding the help text in the shell executable has the advantage of making the shell executable self-contained.
  • kill — there are two advantages in having a builtin: it can recognize job designations in addition to process IDs, and it can be used even when there are not enough resources to start a separate process.
  • printf — for the same reason as echo, and also to support the -v option to put the output in a variable.
  • pwd — the builtin offers the additional capability of logical current directory tracking (leaving symbolic links intact instead of expanding them).
  • test — it's a builtin for efficiency (and bash also does some magic with files called /dev/fd/… on some operating systems).
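
To illustrate the pwd point, here is a minimal sketch comparing the logical and physical views after entering a directory through a symbolic link. The function name and the temporary directory layout are invented for the example; it assumes `mktemp -d` is available.

```shell
# Hypothetical demo: enter a directory through a symlink, then print both views.
show_pwd_modes() {
    (
        tmp=$(mktemp -d) || exit 1
        mkdir "$tmp/real" && ln -s real "$tmp/link" || exit 1
        cd "$tmp/link" || exit 1
        printf 'logical:  %s\n' "$(pwd -L)"   # keeps the symlink name as typed
        printf 'physical: %s\n' "$(pwd -P)"   # symlinks resolved
        cd / && rm -rf "$tmp"
    )
}

show_pwd_modes
```

The logical path ends in link because the shell remembers how you got there; the physical path ends in real.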

A few shells offer a significant number of additional builtins. There's sash, which is a shell designed to be a standalone binary for emergency repairs (when some external commands may not be usable). It has a built-in ls, called -ls, as well as other tools such as -grep and -tar. Sash's builtins have fewer capabilities than the full-fledged commands. Zsh offers some similar builtins in its zsh/files module. It doesn't have ls, but wildcard expansion (echo *) and zstat can serve a similar function.

1

cd is built into the shell; ls is a separate program, which you can find at /bin/ls.

DopeGhoti
1

I think that something people are missing here is the sheer complexity of the GNU ls program on Linux. Comparing the executable size of ls to the bash and dash shells on my Debian system, we see that it is quite large:

graeme@graeme:~$ ls -lh /bin/{ls,bash,dash}
-rwxr-xr-x 1 root root 953K Mar 30  2013 /bin/bash
-rwxr-xr-x 1 root root 115K Dec 25 20:25 /bin/dash
-rwxr-xr-x 1 root root 108K Jul 20 22:52 /bin/ls

Including an ls as full-featured as the GNU version in bash would increase the executable size by roughly 11% (108K on top of 953K). It is almost the same size as the full dash shell!

Most shell builtins are chosen because they integrate with the shell in a way that external executables can't (the question points out cd, but another example is the bash version of kill integrating with bash job control) or because they are very simple commands to implement, giving a large speed vs size payoff (true and false are about as simple as it gets).

GNU ls has had a long development cycle and implements many options to customize what/how results are displayed. Using a builtin ls by default would either lose this functionality or significantly increase shell complexity and size.

Graeme
0

This does what you are looking for:

printf "%s\n" *

Also you can store filenames in array:

files=(`printf "%s\n" *`)  # items are split on whitespace
echo "${#files[*]} files"
# Iterate over the array indices and print each index with its entry
for index in "${!files[@]}"
do printf "%d: %s\n" "$index" "${files[$index]}"
done

But this does not handle spaces in names.
The following reads each name into a variable and does handle spaces:

printf "%s\n" * | while IFS= read -r a; do echo "$a"; done
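
Another way to keep names with spaces intact, without a pipeline, is to let the shell expand the glob straight into the positional parameters. This is only a sketch in plain POSIX sh, and the function name `list_entries` is made up for the example:

```shell
# Hypothetical helper: print an index and name for each entry in a directory.
list_entries() {
    (
        cd "$1" || exit 1
        set -- *                               # one positional parameter per name
        # With no matches the pattern stays as a literal "*"; drop it.
        [ -e "$1" ] || [ -L "$1" ] || set --
        i=0
        for name in "$@"; do
            printf '%d: %s\n' "$i" "$name"
            i=$((i + 1))
        done
    )
}

list_entries . > /dev/null
```

Because the expansion is never re-split, each file name stays a single item regardless of spaces. Like echo *, this skips dotfiles by default.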
Costa