Is a temporary environment variable added before or after the shell creates a child process which will execute a command?

Question

In bash, when running

myvar=val mycommand myargs

myvar=val will be added to the environment for executing mycommand.
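A minimal sketch of the behaviour described above, assuming `sh` is available as a stand-in for mycommand; the helper name `show_scope` is hypothetical and used only for illustration:

```shell
# Hypothetical helper for illustration; not part of the original question.
show_scope() {
    unset myvar
    # The assignment prefix applies to this one command only.
    myvar=val sh -c 'echo "command sees: ${myvar-unset}"'
    # Back in the calling shell, the variable was never set.
    echo "shell sees: ${myvar-unset}"
}

show_scope
```

The command reports `val`, while the calling shell still reports the variable as unset.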

Suppose the bash process calls fork() to create a child process which will execute mycommand, i.e. mycommand is an external executable file or a script file.

When does adding myvar=val to the environment happen, before or after the bash shell calls fork()? In other words, which of the following two possibilities actually happens?

1. The bash process adds myvar=val to its own environment, then calls fork() to create a child process, which calls execve() to execute mycommand; myvar=val, being part of the environment of the bash process, is inherited into the environment of the child process. After mycommand finishes and the child process exits, the bash process drops myvar=val from its own environment.
2. The bash process calls fork() to create a child process which will execute mycommand; the child process adds myvar=val to its own environment and then calls execve() to execute mycommand.
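The two possibilities can be emulated by hand in shell syntax. This is only a sketch of what each one would look like from the outside, not a description of what bash does internally; the function names are hypothetical and `sh` stands in for mycommand:

```shell
# Possibility 1 by hand: the parent changes its own environment, runs the
# command, then undoes the change.
run_possibility_1() {
    export myvar=val
    sh -c 'echo "command sees: ${myvar-unset}"'
    unset myvar
}

# Possibility 2 by hand: a forked subshell changes only its own copy,
# then replaces itself with the command via exec.
run_possibility_2() {
    (
        export myvar=val
        exec sh -c 'echo "command sees: ${myvar-unset}"'
    )
}

run_possibility_1
run_possibility_2
echo "shell afterwards: ${myvar-unset}"
```

Both variants give the command the same environment and leave the calling shell without myvar. Note that the hand-written version of possibility 1 would also discard any pre-existing value of myvar, so it is not a faithful replacement for the assignment prefix.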

My question is motivated by Stephen's reply to my earlier post.

In Bash, _ is a special parameter which is set to the value of the last argument every time a command is parsed. It also has the special property of not being exportable, which is enforced every time a command is executed (see bind_lastarg in the Bash source code).

I am wondering: if bash doesn't add _ to its own environment when it executes a command, why does it need to drop it from its own environment?

Thanks.

Bash may have inherited _ as an environment variable. Every time a command is executed, the "exported" flag on the variable is cleared. You would think that, once the bit has been cleared, there is no need to clear it again, but it is cheaper and simpler to just clear it instead of doing a test and clear. — Johan Myréen, Apr 12 '18 at 21:57
I tend to think there is no "own" bash environment. There's just a bunch of variables, each with the "export" flag set or clear. When an external command is run, bash allocates a new array and copies the exported variables into this array, which is passed as the envp parameter to the execve system call. The kernel then copies the array onto the child process's stack, next to the argument vector that argv points to. — Johan Myréen, Apr 12 '18 at 22:08
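The "export flag" view in the comment above can be observed from the shell. A small sketch, assuming `sh` as the external command; the variable and function names are hypothetical:

```shell
# Hypothetical variable names, chosen for illustration.
child_view() {
    plain_var=1          # shell variable, export flag clear
    export marked_var=2  # shell variable, export flag set
    # Only the exported variable is copied into the command's environment.
    sh -c 'echo "plain_var=${plain_var-unset} marked_var=${marked_var-unset}"'
}

child_view
```

The external command sees marked_var but not plain_var, which is consistent with the shell passing only the variables marked for export.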

Stephen Kitt · Accepted Answer · 2018-04-12T21:07:21.947

5

The reality is somewhere between the two possibilities you describe. Bash doesn’t add myvar to its own environment, at least not the full shell environment as we usually think of it; it adds myvar to its temporary environment. It then builds the export environment, specifically for the new command, from the temporary environment, along with the current variable context, and the shell functions, before forking if necessary to run the child command. You can see this as calls to maybe_make_export_env in the Bash source code. The temporary environment is then cleaned up after the child is started; look for dispose_used_env_vars.

In practice this doesn’t make any difference. The child command gets the environment it’s supposed to receive, and the parent environment is as it should be too once you get control back; unless you’re going to make changes to Bash, that’s all that matters.

edited Apr 12 '18 at 21:07

answered Apr 12 '18 at 20:37


Thanks. "The reality is somewhere between the two possibilities you describe" is a little ambiguous. Is the reality the same as the first possibility I described, and not the second possibility? – Tim Apr 12 '18 at 20:40
No, like I said it’s somewhere in between. The assignment never makes it to the shell environment, it’s limited to the temporary environment (AFAICT from a quick read). Of course in practice the effect is the same as your first possibility. – Stephen Kitt Apr 12 '18 at 20:42
I’ll try to clarify... – Stephen Kitt Apr 12 '18 at 20:43
In "The environment is then cleaned up after the child is started", do you mean the temporary environment instead of the shell's environment, by "The environment"? – Tim Apr 12 '18 at 20:48
@Tim You are really looking into too much detail here, imho. Environment variables are really just a parameter passing mechanism, similar to the argument vector argv. There are rules set by Bash and other standards that the environment variables are inherited by child processes, but how this is done is an implementation detail. There is no rigid data structure "The environment" that is floating somewhere in the background that is magically passed from process to process. – Johan Myréen Apr 12 '18 at 21:02
See my previous answer: https://unix.stackexchange.com/questions/436603/is-there-a-bash-builtin-command-which-can-show-the-environment-variables-of-the/436631#436631 – Johan Myréen Apr 12 '18 at 21:03
@Johan: see my edit about why I asked the questions. – Tim Apr 12 '18 at 21:12
Stephen, Thanks. I really like the links you gave to the C implementation of bash. If I would like to read the C code, I was wondering how I can find out the definition of a function e.g. dispose_used_env_vars? Do I need to download the source code, and use grep, or can I search directly on the website? – Tim Apr 15 '18 at 13:34
I always prefer downloading source code to read it, either from the upstream repository, or from the Debian archives (apt source). That way I can use etags etc. to simplify navigation in Emacs. – Stephen Kitt Apr 15 '18 at 17:16
"The temporary environment is then cleaned up after the child is started", which seems a waste. I wonder why the child process isn't left to prepare its own environment, instead of having the original bash shell process prepare it. Normally, when one writes a C program that does something similar, is it the original process or the child process that prepares the environment of the child? – Tim Apr 15 '18 at 19:53
@Tim Like I said earlier, the process environment is really just a parameter passing mechanism. The parent process passes an environment to its child process. This environment can be equivalent to what has been passed to the parent by the parent's parent, or it can be different. The child cannot construct its own environment out of thin air, it has to come from somewhere. This "somewhere" is what is pointed to by the envp pointer in the execve system call. There is no other mechanism. – Johan Myréen Apr 16 '18 at 13:57
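The point that the environment is just a list handed over at exec time can be illustrated with the standard `env` utility, which is not mentioned in the thread and is used here only as an assumption-free way to build an environment from scratch. The function name is hypothetical:

```shell
# Run the env utility with an environment containing exactly one entry.
# "env -i" starts from an empty environment instead of the inherited one,
# and the inner env (given by full path) prints what it received.
minimal_env() {
    env -i myvar=val "$(command -v env)"
}

minimal_env
```

On a typical Linux system the inner `env` prints only `myvar=val`: nothing from the calling shell's environment is carried over unless it is explicitly placed in the list.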
@Johan: when a process calls execve() to run a program, it doesn't create a new process. Before calling execve(), the process can itself prepare a new environment to pass as the argument to execve(), instead of letting its parent process do so. Do you agree? – Tim Apr 16 '18 at 13:59
Ok, I see what you mean. It's the child process that does the exec, so it can construct the environment, although I am not very convinced the savings are very big. – Johan Myréen Apr 16 '18 at 14:22
@Tim the shell still has to be able to construct and clean up a temporary environment, because it might end up with no child to start (think of e.g. built-in commands). – Stephen Kitt Apr 16 '18 at 14:52
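The built-in case mentioned in the last comment can be sketched with the `read` builtin, which runs inside the shell itself. The helper name `split_pair` and the sample input are hypothetical:

```shell
# Hypothetical helper: split "key:value" using the read builtin.
split_pair() {
    saved_ifs=$IFS
    # IFS=: is a temporary assignment for read, which is a builtin,
    # so no child process is created to run it.
    IFS=: read -r key value <<EOF
$1
EOF
    # After read returns, the shell's own IFS should be unchanged.
    if [ "$IFS" = "$saved_ifs" ]; then
        echo "key=$key value=$value IFS restored"
    else
        echo "key=$key value=$value IFS changed"
    fi
}

split_pair "alpha:beta"
```

Here the temporary assignment affects how `read` splits its input, yet the shell's IFS is back to its previous value afterwards, even though there was no child process whose exit could have discarded the change.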
