Does the shell creating a subshell require the () Groups command?

Question

In my book (Sobell's A Practical Guide to Linux, 4e) it is written that

You can use the parentheses control operator to group commands. When you use this technique, the shell creates a copy of itself, called a subshell, for each group. It treats each group of commands as a list and creates a new process to execute each command...

I don't want to interpret this incorrectly so I figured I'd ask here. Does the creation of a subshell necessarily require the use of these () Groups commands, or is this just a way of ensuring that certain commands run in the same subshell?

Let me perhaps ask by example. Suppose I have commands (executables in PATH) a and b. Is there any difference between the following being entered at the command prompt?

a ; b
(a ; b)
(a) ; (b)

Understood! So, especially with regard to the difference between 1) and 3), I am correct to conclude that () is the (only?) way to create a new subshell/environment for the commands to run in. It is not enough to separate them in a list. @GillesQuénot — EE18, Feb 02 '24 at 20:32
You can also use $(command) to create a subshell from this command substitution. Another 'subshell' is if you do: sh -c command — Gilles Quénot, Feb 02 '24 at 20:35
I have not encountered these yet, so thank you for mentioning them! @GillesQuénot One last follow-up if possible: simply using a;b does not create a new shell as we've just discussed, though it does create a couple of new processes. So is the creation of a new shell/subshell relevant purely as it pertains to the environment in which a given process executes? I don't know too much about Linux/Unix internals, but presumably the virtual "space" for a given process also includes some data from the environment from which it was called, and that is where the shell-dependence comes from? – EE18, Feb 02 '24 at 20:40
@GillesQuénot, sh -c ... is not a subshell, but an entirely separate and independent shell. A subshell is a copy of the shell's execution environment, but launching another shell makes no copy. E.g. foo=abc; sh -c 'echo $foo' prints nothing since foo isn't set in the inner shell (assuming it wasn't exported earlier, which it usually wouldn't be). With a subshell, the value of foo would be visible within the subshell too. – ilkkachu, Feb 02 '24 at 20:48
@GillesQuénot you’re exporting the variable; ilkkachu’s example doesn’t. — Stephen Kitt, Feb 02 '24 at 22:24

ilkkachu · Accepted Answer · 2024-02-05T11:57:13.550

4

Let a equal foo=xyz, and b equal echo $foo. Or rather, let's just define those as functions:

a() { foo=xyz; }
b() { echo $foo; }

Then let's try each variant you show, in each case initializing foo to abc first and printing the value of foo at the end. The outputs are on the right-hand side:

foo=abc; a ; b; echo $foo => xyz, xyz
foo=abc; (a ; b); echo $foo => xyz, abc
foo=abc; (a) ; (b); echo $foo => abc, abc

So, in the first one, the assignment happens at the main level and so is visible to the rest of the script. (The functions use the { ..; } grouping construct, so they run in the main shell.) In the second, the assignment happens in the same subshell as the first printout, but doesn't affect the rest of the script. And in the third, the assignment happens in the first subshell, and is only visible there, not later in the script.
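For contrast with the { ..; } bodies used above, a function body can also be written with parentheses, in which case each call should run in its own subshell. A minimal sketch; the function names set_braces and set_parens are made up for illustration:

```shell
# Hypothetical function names, for illustration only.
set_braces() { foo=xyz; }   # brace-group body: runs in the calling shell
set_parens() ( foo=xyz )    # subshell body: the assignment is lost on return

foo=abc
set_parens
after_parens=$foo           # the caller's foo is untouched
set_braces
after_braces=$foo           # the caller's foo has been changed
echo "$after_parens $after_braces"
```

Here the first word printed is the value left behind by the subshell-bodied function, and the second the value left by the brace-bodied one.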

Then again, you asked about executables in PATH, and since those can't affect the shell's execution environment anyway, it doesn't matter if they're run in subshells or not. That is, ls is the same as (ls). But with shell builtins the difference matters. Consider e.g. read (which sets variables), or exit (which exits the (sub)shell).
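To make the point about builtins concrete, here is a small sketch using cd and exit. The variable names are only illustrative, and the || form is used just so the snippet also behaves under set -e:

```shell
start_dir=$PWD

# cd inside ( ... ) changes the working directory of the subshell only.
( cd / && pwd )
after_subshell_cd=$PWD          # still the directory we started in

# exit inside ( ... ) ends only the subshell; the outer shell carries on.
subshell_status=0
( exit 3 ) || subshell_status=$?
echo "still in $after_subshell_cd, subshell exited with status $subshell_status"
```

Without the parentheses, the same cd would move the main shell to /, and the same exit would end it.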

Command substitutions also run in subshells, so e.g.

foo=abc
echo $(foo=xyz; echo $foo)
echo $foo

prints xyz and abc.

But of course the command substitution syntax also uses parentheses, so there's some symmetry. (Then again, (( ... )) is something entirely different.)

Anything that runs commands asynchronously or concurrently with the shell also necessarily starts a subshell, since doing that requires spawning a new process which can't modify the main shell process.

A common case of that is the pipeline. In foo | bar | doo, both foo and bar run in subshells, and doo may run in a subshell or it may run in the main shell environment.

E.g.

foo=abc
{ foo=xyz; echo $foo; } | cat
echo $foo

prints xyz, abc.

See: Why is my variable local in one 'while read' loop, but not in another seemingly similar loop?
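The 'while read' case is probably the most common place people trip over this. A rough sketch, assuming a POSIX-style shell; the counter names piped and direct are made up for the example:

```shell
# Pipeline version: in shells that run every pipeline element in a subshell
# (Bash by default, dash) the increments are lost; in shells that run the
# last element in the main shell (zsh, ksh93) they survive.
piped=0
printf '%s\n' one two three | while read -r line; do piped=$((piped + 1)); done
echo "after pipeline: $piped"        # value depends on the shell

# Here-document version: no pipeline, so the loop runs in the main shell.
direct=0
while read -r line; do direct=$((direct + 1)); done <<EOF
one
two
three
EOF
echo "after here-document: $direct"
```

Feeding the loop through a redirection instead of a pipe is one way to keep the loop's variables; it is not the only one.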

Explicitly running something in the background with foo &, or using a process substitution such as <( foo ), likewise starts a subshell.
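The background case can be checked the same way as the pipeline one. A minimal sketch:

```shell
foo=abc
# Asynchronous brace group: despite the braces, it runs in a subshell.
{ foo=xyz; echo "background job sees: $foo"; } &
wait                                 # let the background job finish
echo "main shell still has: $foo"
```

The braces alone would not start a subshell; it is the & that forces the group into a separate process.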

Anyway, the ( .. ) is the one that explicitly starts a subshell for the sake of starting one. With the others, one could say it's a sort of a side-effect.

edited Feb 05 '24 at 11:57

answered Feb 02 '24 at 20:45


You also get subshells in process substitutions (<(...), >(...), =(...) ) and for asynchronous commands obviously (cmd & or {...;} &) or coprocs or more generally any time a command has to run in a separate process because it needs to run concurrently to something else. (...) is the only case where there's no such need and where one explicitly requests a subshell for the sake of having a separate execution environment (strictly speaking though, command substitution can sometimes be achieved and in some shells is achieved without forking a process) – Stéphane Chazelas Feb 02 '24 at 20:56
@StéphaneChazelas, yes indeed. Though I suppose one could argue that command substitutions don't need to run concurrently with anything (as the main shell has to wait for them anyway), so they don't need a separate environment either. Not sure if that would cause other issue, though. – ilkkachu Feb 02 '24 at 21:05
The output still needs to be retrieved in some way and traditionally, that's done through a pipe which is an inter-process communication mechanism. ksh93 can do without for the output of its builtins and doesn't fork but still simulates a subshell environment (not for the ${cmd; } variant). Some other shells such as FreeBSD sh can also do it for some simple invocations of some of its builtins such as $(printf ...). – Stéphane Chazelas Feb 02 '24 at 21:09
Thank you so much for this very nice answer. I have waited a few days to respond as I wanted to (and now have) read more from my book. I find myself still confused. At the bottom of page 333 (in the book in my OP) the text seems to say that every command (unless it's a built-in) causes the shell to fork a new process (with that process being, in particular, a subshell). The new subshell then executes the given command. Is this incorrect? Because if it is correct it would seem to suggest that whether I use a or (a), that command is being performed in a new subshell and, in that case,... – EE18 Feb 04 '24 at 21:34
... it's not clear to me why in a;b we have that b knows anything about the change in variable value of foo done by a. Including an @ to @StéphaneChazelas here as well. Thank you both for your help. – EE18 Feb 04 '24 at 21:34
@EE18 if both a and b are executables, then yes, b won't know anything about a change in value of a shell variable by a. And execution of both will involve a fork and exec. It's only if a isn't an external command, but like in the answer, a function, or a simple variable assignment, or a compound command that contains a variable assignment, etc., would it impact b – muru Feb 05 '24 at 00:43
Ah I see, I completely blanked that this was a function. Thanks for pointing that out @muru I guess in that case we have to use the () Group command in order to fork a new process/shell when doing the argument with functions as in this very nice answer from ilkkachu. – EE18 Feb 05 '24 at 03:02
@EE18, you could also define the function as foo() ( a=123; echo $a; ), i.e. with parentheses instead of braces around the function body; that would make all invocations of the function run in subshells. (The body of a function is just a compound command, so (...) works to start a subshell there, too, same as usual. Also, the function body can be any compound command, so you can do foo() if true; then echo hello; fi entirely without braces or parens. Many shells would also accept a simple command, like foo() echo foo, but for some reason Bash doesn't.) – ilkkachu Feb 05 '24 at 12:08
@EE18, also, hmm, you mentioned subshells in context with starting an external program. It's true that the fork+exec model means there's going to be a copy of the shell, but it seems to me that the phrase "subshell" isn't usually used in that situation. The POSIX text says "Utilities other than the special built-ins shall be invoked in a separate environment..." but doesn't use the word "subshell", instead saving it for later where it explicitly lists groups with parens, command substitutions etc. I'm not sure if there's a consensus for a distinction here, though, it might be just me. – ilkkachu Feb 05 '24 at 12:20
(And the page I'm referencing is of course this: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_12 ) – ilkkachu Feb 05 '24 at 12:20
@ilkkachu Beat me to it. Just to add that in general, computer-related books tend to be more outdated than online documentation (manpages, info pages, READMEs etc) and prone to the author's subjectivity and level of knowledge of the subject matter. When it comes to the shell and standard C library, most follow the POSIX standard, so it is a significant first source of information to check. Then, of course, comes reading the source code, as the only 100% reliable source of information about the behavior of a specific program. – Vilinkameni Feb 05 '24 at 13:15

score 1 · Answer 2 · answered Feb 02 '24 at 21:02

These three lines do different things.

[me@here foo]$ echo A-$BASHPID ; echo B-$BASHPID
A-534171
B-534171
[me@here foo]$ (echo A-$BASHPID ; echo B-$BASHPID)
A-534798
B-534798
[me@here foo]$ (echo A-$BASHPID) ; (echo B-$BASHPID)
A-534808
B-534809

The first executes a and then b in the context of the current shell (pid=534171 in this example). The second creates a new subshell (pid=534798) and then executes a and then b in that subshell. The third creates a new subshell (pid=534808) and executes a in it; after that subshell exits, the shell creates another subshell (pid=534809) and executes b in it. These subshells generally do not inherit their entire environment from the original shell (for example, unexported shell variables, file descriptors, and ERR, DEBUG and RETURN trap handling are special cases), and some parts of the subshell environment are explicitly changed (process IDs, shell history, ...). It is also not generally possible for commands executing in a subshell to modify the parent shell environment.

[me@here foo]$ A=foo
[me@here foo]$ A=bar
[me@here foo]$ (A=baz)
[me@here foo]$ echo $A
bar
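On the point about unexported variables, a quick check may help. POSIX describes a subshell as a duplicate of the shell environment, so a plain variable should be visible inside ( ... ), while a separately started shell only receives exported ones. A sketch, assuming sh is available on PATH; the seen_by_* names are made up for the example:

```shell
unset A                       # start clean, in case A was exported earlier
A=bar                         # plain shell variable, not exported

# ( ... ) subshell, with its output captured for comparison.
seen_by_subshell=$( ( echo "${A:-nothing}" ) )
# Entirely separate shell started with sh -c.
seen_by_new_shell=$(sh -c 'echo "${A:-nothing}"')

echo "subshell: $seen_by_subshell, sh -c: $seen_by_new_shell"
```

The subshell should report bar, while the shell started with sh -c should fall back to the word nothing.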

Even a binary executable started from these subshells could detect the differences and behave differently.

"These subshells generally do not inherit their entire environment..." -- hmm, I think they mostly do inherit everything (the POSIX spec says "A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action."). Something like the PID of the interpreter is also something that necessarily has to change, too (due to the way subshells are usually implemented), but then $BASHPID is a bit special, and the standard $$ always shows the main shell's PID. — ilkkachu, Feb 05 '24 at 12:13
