Why does POSIX require certain shell built-ins to have an external implementation?

Question

From this question about whether printf is a built-in for yash, comes this answer that quotes the POSIX standard.

The answer points out that the POSIX search sequence is to find an external implementation of the desired command, and then, if the shell has implemented it as a built-in, run the built-in. (For built-ins that aren't special built-ins.)

Why does POSIX have this requirement for an external implementation to exist before allowing an internal implementation to be run?

It seems... arbitrary, so I am curious.

I believe that is a way to enable/disable builtins if desired/required. — , Jan 23 '19 at 22:43
Disabling the built-in by removing the external implementation? Now there are no commands of name printf available. — studog, Jan 23 '19 at 22:49
@studog, so create an empty file with the same name as the built-in, turn on the execute bit, and put it in a directory in your PATH. :P — Wildcard, Jan 23 '19 at 23:13
@Wildcard A strictly compliant shell would then see the name while searching the PATH and then call the built-in utility, not the external script. What if you'd want to call the external script in your path? Hmm... This seems to call for a table describing the different possibilities. There is one here, but it doesn't make sense to me. — Kusalananda, Jan 23 '19 at 23:23
@Kusalananda, re your first sentence, that was my point. Hence why I said to create an empty file. — Wildcard, Jan 23 '19 at 23:28
@Kusalananda Sadly I cannot recommend using builtin, as the spec explicitly disables it (and yash can use it while in posix). But you have at least three options for calling external utilities: (1) the widely known full path: /usr/bin/printf; (2) have and use an external which that correctly finds external utilities, and execute $(which printf); and (3) the much recommended tool for calling programs in the PATH: env printf. — , Jan 24 '19 at 02:36
@Kusalananda In the table, the first two make sense to me (and work): ls calls the shell's own function (or built-in), and command ls should call the external ls (but doesn't work in yash; a bug?). The following two are externally used commands, in a makefile and in Perl (maybe?). The last one looks incorrect to me. The PATH used to search for a command name (in the parent shell) cannot be changed by an assignment. — , Jan 24 '19 at 02:52
@studog That was the intent, that doesn't work in practice. Only a command like enable could do that reliably. — , Jan 24 '19 at 03:28

score 17 · Accepted Answer · answered Jan 24 '19 at 04:51

This is an "as if" rule.

Simply put: The behaviour of the shell as users see it should not change if an implementation decides to make a standard external command also available as shell built-in.

The contrast that I showed at https://unix.stackexchange.com/a/496291/5132 between the behaviours of (on the one hand) the PD Korn, MirBSD Korn, and Heirloom Bourne shells; (on the other hand) the Z, 93 Korn, Bourne Again, and Debian Almquist shells; and (on the gripping hand) the Watanabe shell highlights this.

For the shells that do not have printf as a built-in, removing /usr/bin from PATH makes an invocation of printf stop working. The POSIX conformant behaviour, exhibited by the Watanabe shell in its conformant mode, causes the same result. The behaviour of the shell that has a printf built-in is as if it were invoking an external command.

Whereas the behaviour of all of the non-conformant shells does not alter if /usr/bin is removed from PATH, and they do not behave as if they were invoking an external command.
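The difference can be probed from a script. This is only a minimal sketch: probe_printf is a hypothetical helper name, and /nonexistent-dir is assumed to be a directory that does not exist on the system.

```shell
#!/bin/sh
# Sketch: does this shell run printf even when no PATH directory contains it?
# /nonexistent-dir is a hypothetical path assumed not to exist.
probe_printf() {
  # The subshell keeps the PATH change from leaking into the caller.
  if (PATH=/nonexistent-dir; printf '%s\n' probe) >/dev/null 2>&1; then
    echo 'built-in used without a PATH hit'
  else
    echo 'not found, as if printf were external'
  fi
}

probe_printf
```

Which message appears depends on the shell running the script, which is the contrast described above.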

What the standard is trying to guarantee is that shells can build in all sorts of normally external commands (or implement them as their own shell functions), and you'll still get the same behaviour from the built-ins as you did with the external commands if you adjust PATH to stop the commands from being found. PATH remains your tool for selecting and controlling what commands you can invoke.

(As explained at https://unix.stackexchange.com/a/448799/5132, years ago people chose the personality of their Unix by changing what was on PATH.)

One might opine that making the command always work irrespective of whether it can be found on PATH is in fact the point of making normally external commands built-in. (It's why my nosh toolset just gained a built-in printenv command in version 1.38, in fact. Although this is not a shell.)

But the standard is giving you the guarantee that you'll see the same behaviour for regular external commands that are not on PATH from the shell as you will see from other non-shell programs invoking the execvpe() function, and the shell will not magically be able to run (apparently) ordinary external commands that other programs cannot find with the same PATH. Everything works self-consistently from the user's perspective, and PATH is the tool for controlling how it works.
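For comparison, a non-shell program can be asked to do the lookup. The sketch below uses env, which searches PATH for the utility it is told to run. lookup_via_env is a hypothetical helper name, and both /usr/bin/env and /nonexistent-dir are assumed paths that may need adjusting.

```shell
#!/bin/sh
# Sketch: let a non-shell program (env) do the PATH lookup for printf.
# /usr/bin/env and /nonexistent-dir are assumed paths; adjust for your system.
lookup_via_env() {
  # $1 is the PATH value handed to env; env searches it for printf.
  PATH=$1 /usr/bin/env printf '' >/dev/null 2>&1
}

if lookup_via_env /nonexistent-dir; then
  echo 'env found printf'
else
  echo 'env could not find printf'
fi
```

A conformant shell given the same PATH would fail to run printf in the same way, whether or not it has a built-in.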

The standard's rationale and its illustrating example suggest that this was a botched attempt to have a regular built-in associated with a path, and let the user override it by having their own binary appear before it in PATH (eg. a printf built-in associated with /usr/bin/printf could be overridden by the /foo/bin/printf external command by setting PATH=/foo/bin:$PATH).

However, the standard did not end up requiring that, but something completely different (and also useless and unexpected).
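The override that the rationale apparently had in mind can be tried directly. This is a sketch under stated assumptions: the replacement printf is a hypothetical script created in a temporary directory, and mktemp -d is assumed to be available although POSIX does not specify it.

```shell
#!/bin/sh
# Sketch: put a hypothetical replacement printf first in PATH and see which one runs.
# mktemp -d is widely available but not specified by POSIX (an assumption here).
dir=$(mktemp -d) || exit 1

cat >"$dir/printf" <<'EOF'
#!/bin/sh
echo 'external override ran'
EOF
chmod +x "$dir/printf"

# The command substitution is a subshell, so the PATH change does not leak out.
result=$(PATH="$dir:$PATH"; printf 'built-in ran\n')
echo "$result"

rm -rf "$dir"
```

A shell that runs its printf built-in without consulting PATH reports that the built-in ran; a shell without a printf built-in picks up the script instead.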

You can read more about it in this bug report. Quoting from the final accepted text:

Many existing implementations execute a regular built-in without performing a PATH search. This behavior does not match the normative text, and it does not allow script authors to override regular built-in utilities via a specially crafted PATH. In addition, the rationale explains that the intention is to allow authors to override built-ins by modifying PATH, but this is not what the normative text says.

FWIW, I don't think there's any shell implementing the revised requirements from the accepted text, either.

See also the discussion at http://article.gmane.org/gmane.comp.standards.posix.austin.general/12525 (and there have been several others). — Stéphane Chazelas, Jan 24 '19 at 17:29

adam.hendry · Answer 3 · 2021-05-30T17:00:55.820

Follow-up vis-a-vis echo vs printf:

(Below, builtin means "special builtin", and "regular builtin"s are not considered to be builtins by me since they are not built into the shell)

The first POSIX standardization committee could not agree on how to standardize echo, so they compromised: if echo is passed flags (-e, -n, -E, etc.) or if any argument contains escape sequences (\n, \c, \t, etc.), the behavior is left to the implementing shell rather than defined by POSIX. Instead, the printf command was added and given well-defined behavior. (source: Classic Shell Scripting, by Robbins and Beebe).

Although printf is well-defined, some shells do not have printf as a builtin command (e.g. mksh). Instead, they use printf from /usr/bin/. This meant all scripts run from that shell would print the same on a given operating system (Ubuntu, Fedora, etc.), but that they wouldn't necessarily print the same across OSs (in fact, many users changed the printf in their /usr/bin for this reason).

Alternatively, shells with printf as a builtin would print the same regardless of OS, but only if used as implemented for the shell. Since printf behavior is defined by the POSIX standard, that isn't necessarily a concern for programmers. However, if PATH were overridden for shells that use printf from /usr/bin/, printf wouldn't be found.

Though all shells have echo as a builtin, some interpret escape sequences directly (e.g. ash) while others (most) require a -e flag: the behavior is not defined by POSIX, but by the shell.

One of the main annoyances of echo vs. printf is that echo prints a newline at the end of the string by default, but printf does not: printf requires the \n escape sequence to print a newline. Conversely, to prevent echo from printing a newline, the \c escape sequence is required (potentially also requiring the -e flag).
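One way around the newline annoyance is a pair of small wrappers around printf. This is only a sketch; say and say_n are hypothetical names, not standard utilities.

```shell
#!/bin/sh
# say: print the arguments followed by a newline, like a plain echo,
# but without any option or escape-sequence processing.
say() {
  printf '%s\n' "$*"
}

# say_n: the same, without the trailing newline.
say_n() {
  printf '%s' "$*"
}

say 'first line'
say_n 'no newline after this'
say ' ...until now'
```

Because the text is passed as an argument to %s rather than as the format string, backslashes and leading dashes in it are printed literally.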

printf is recommended for maximum portability since its behavior is defined by POSIX, but I personally find explicitly printing a new line at the end of each line is quite annoying (most lines I write require a new line at the end and I very rarely need to suppress echo's printing of new lines). On the other hand, echo is always available since it's a builtin (no risk of not being found on $PATH) and a simple check can be performed to determine whether the -e flag is needed and a corresponding aliased echo made:

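#! /bin/sh -
# Determine if the "builtin" command exists.
BUILTIN='builtin'
if ! ("${BUILTIN}" echo 123 >/dev/null 2>&1); then
  BUILTIN=''
fi
export BUILTIN
# Default to "echo -e"; fall back to plain "echo" if this shell's echo
# prints the -e flag literally instead of treating it as an option.
ECHO='echo -e'
if ${BUILTIN} [ "$(echo -e test)" = '-e test' ]; then
  ECHO='echo'
fi
export ECHO
# Now use ${ECHO} (unquoted, so that "echo -e" splits into two words) where you would normally use echo.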

Personally, I prefer to do this and only use printf if I need special formatting.

UPDATE: I should give proper credit where credit is due. The shell code above was taken directly from shunit2. Credit goes to Kate Ward and the shunit2 development team for that one! (Well done ;) )

adam.hendry · Answer 4 · 2021-05-27T05:21:48.990

Adding this as well (Classic Shell Scripting by Robbins and Beebe is a great book):

The shell has a number of commands that are built-in. This means that the shell itself executes the command, instead of running an external program in a separate process. Furthermore, POSIX distinguishes between “special” built-ins and “regular” built-ins. [Most regular built-ins] have to be built-in for the shell to function correctly (e.g., read). Others are typically built into the shell only for efficiency (e.g., true and false). The standard allows other commands to be built-in for efficiency as well, but all regular built-ins must be accessible as separate programs that can be executed directly by other binary programs.

The distinction between special and regular built-in commands comes into play when the shell searches for commands to execute. The command-search order is special built-ins first, then shell functions, then regular built-ins, and finally external commands found by searching the directories listed in $PATH.

This search order makes it possible to define shell functions that extend or override regular shell builtins. This feature is used most often in interactive shells. For example, suppose you would like the shell's prompt to contain the last component of the current directory's pathname. The easiest way to make this happen is to have the shell change PS1 each time you change directories. You could just write your own [cd] function [for this].

There is one small fly in the ointment here. How does the shell function access the functionality of the "real" cd command? ... What's needed is an "escape hatch" that tells the shell to bypass the search for functions and access the real command. This is the job of the command built-in command.

[However] the command command is not a special builtin command! Woe be to the shell programmer who defines a function named command! The POSIX standard provides the following two additional special qualities for the special built-in commands:

A syntax error in a special built-in utility may cause a shell executing that utility to abort, while a syntax error in a regular built-in utility shall not cause a shell executing that utility to abort. If a special built-in utility encountering a syntax error does not abort the shell, its exit value shall be nonzero.

Variable assignments specified with special built-in utilities remain in effect after the built-in completes; this shall not be the case with a regular built-in or other utility. [That is] you can specify variable assignment at the front of a command and the variable will have that value in the environment of the executed command only, without affecting the variable in the current shell or subsequent commands. (e.g. PATH=/bin:/usr/bin: awk '...') However, when such an assignment is used with a special built-in command, the assignment stays in effect from then on, even after the special built-in completes.

Arnold Robbins and Nelson H. F. Beebe. Classic Shell Scripting: Hidden Commands that Unlock the Power of Unix (p. 262-5). O'Reilly Media. Kindle Edition.
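The second quality can be observed with a short experiment. This is a sketch, and demo_var is a hypothetical variable name. The result for the special built-in depends on the shell: POSIX specifies that the assignment persists, but some shells (bash outside its POSIX mode, for example) discard it.

```shell
#!/bin/sh
# Sketch: temporary vs. persistent prefix assignments.
# demo_var is a hypothetical variable name used only for this illustration.
unset demo_var

demo_var=temporary true          # true is a regular utility
after_regular=${demo_var-unset}  # the assignment did not outlive the command

demo_var=kept :                  # ":" is a special built-in
after_special=${demo_var-unset}  # "kept" where the POSIX rule is followed

echo "after regular utility: $after_regular"
echo "after special built-in: $after_special"
```

The ${demo_var-unset} expansion yields the word unset only when the variable has no value at all, which makes the two cases easy to tell apart.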

Note that the command command causes the shell to treat the specified command and arguments as a simple command, suppressing shell function lookup. From the IBM Docs:

Normally, when a / (slash) does not precede a command (indicating a specific path), the shell locates a command by searching the following categories:

1. special shell built-ins
2. shell functions
3. regular shell built-ins
4. PATH environment variable

For example, if there is a function with the same name as a regular built-in, the system uses the function. The command command allows you to call a command that has the same name as a function and get the simple command.
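The cd wrapper described in the book excerpt shows this escape hatch in use. The following is a minimal sketch of that idea, not the book's exact listing; the prompt format is an assumption.

```shell
#!/bin/sh
# Sketch of a cd wrapper: the function shadows the regular built-in,
# and "command" reaches past the function to the real cd.
cd() {
  command cd "$@" || return   # without "command" this would call itself again
  PS1="${PWD##*/}\$ "          # last path component, then "$ "
}

cd /usr/bin
echo "$PS1"
```

Functions are searched before regular built-ins, so the wrapper takes effect for every later cd in the same shell.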

The command -v and command -V commands write to standard output what path name will be used by the shell and how the shell interprets the command type (built-in, function, alias, and so forth). Since the -v and -V flags produce output in relation to the current shell environment, the command command is provided as a Korn shell or POSIX shell regular built-in command. The /usr/bin/command command might not produce correct results, because it is called in a subshell or separate command execution environment. In the following example the shell is unable to identify aliases, subroutines, or special shell commands:

(PATH=foo command -v)
nohup command -v

Thus, in my previous example, I used the bash builtin instead of command because had I put it in a subshell, it would not have worked properly.
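To see what command -v reports in the current shell environment, a small helper is enough. This is a sketch, and describe is a hypothetical name.

```shell
#!/bin/sh
# describe: report how the current shell would resolve a command name.
# "describe" is a hypothetical helper name.
describe() {
  if location=$(command -v "$1" 2>/dev/null); then
    printf '%s -> %s\n' "$1" "$location"
  else
    printf '%s -> not found\n' "$1"
  fi
}

describe printf   # a path name, or just the name if it resolves to a built-in
describe cd
describe no-such-command-xyz
```

Running the same query through an external /usr/bin/command would lose the shell's knowledge of its own functions and aliases, which is the limitation the IBM text describes.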

I second @mosvy: it appears the standard's rationale and its normative text don't match (quite absurd indeed).
