
I am looking for a simple explanation of the term "process substitution", aimed at a general audience: non-professional sysadmins who maintain server environments for personal projects once in a while, rather than daily for customers or companies (if it's okay to ask for that here).

  • What problem(s) was this concept invented to solve?
  • What process is being substituted and with what (the name might mislead)?
  • Is it also named "preserving stdin" and if so why? If the stdin device holds standard input which can be piped to another command, what is there to "preserve" further than that?

This might help me to understand the following command:

bash <(wget -O - URL)

Note for newcomers: -O - tells wget to write the downloaded data to stdout.

  • see https://tldp.org/LDP/abs/html/process-sub.html – pLumo Mar 20 '21 at 12:17
  • @pLumo I understand that it is the piping of two or more commands (instead of just one, as in regular piping) to a command. – variable_expander Mar 20 '21 at 12:36
  • @ilkkachu these are the contexts: https://unix.stackexchange.com/questions/640180/executing-a-remote-script-from-a-code-repository-causes-endless-loop and https://unix.stackexchange.com/questions/17107/process-substitution-and-pipe/27346#27346 – variable_expander Mar 20 '21 at 12:38
  • @ilkkachu I found the GNU explanation to be unclear/inaccessible for the general public of users (as described in the question); the tag info doesn't mention why one would do what it describes (allowing the input or output of a command to appear as a file; shouldn't this be done with variable substitution and piping to a file?); I think I am still missing what's "being substituted". – variable_expander Mar 20 '21 at 12:42
  • @ilkkachu I don't have such directories with files and I prefer not to create them (if there was a dummy content creator in that context I would do that); is there an even simpler explanation of this operation with files existing in generally any Linux system? – variable_expander Mar 20 '21 at 13:26
  • @variable_expander You have posted few questions and gained some reputation. I encourage you to register your account. – Kamil Maciorowski Mar 20 '21 at 14:19
  • @ilkkachu what you wrote about me is wrong; I have made many software tests in my life, would probably do much more and often even enjoy making them, the test you have suggested is ambiguous: Two directories with how many files each by minimum? Special naming needed? Still, which "process" is being "substituted"? I just don't want to guess. I have even asked you about a test which doesn't include creating directories and files. I will gladly do testing if at least I know what approximately to expect. I want not to run commands when I don't have a minimal understanding of what to expect. – variable_expander Mar 20 '21 at 14:23
  • And BTW @ilkkachu I have done much more complicated software testing in my life than two mkdirs and about 2x2 touches, in all humbleness. – variable_expander Mar 20 '21 at 14:51
  • @ilkkachu with all niceness and (vast) appreciation to you it didn't help me, or at least not a lot :) Anyway, the chapter Important fact: there are two shells involved in Kamil Maciorowski's answer basically explained the concept for me; perhaps only Brian Fox knows what was the original problem it was meant to solve originally. – variable_expander Mar 21 '21 at 13:22

1 Answer


Important fact: there are two shells involved

First note there are two shells involved. There is the outer Bash you work in; you invoke (type) bash <(wget -O - URL) in it. And there is the inner Bash, which runs as a child process of the outer Bash when the command gets executed.

In other words the outer Bash accepts bash <(wget -O - URL) as a command and it spawns the inner Bash. I will distinguish the two where appropriate.


Important fact: there is more than one way to run code with Bash

There are a few ways to make (an inner) Bash run some code:

  1. bash can read code from its stdin:

    echo 'date; sleep 2; date' | bash
    

    The stdin can be the console. E.g. the interactive bash you normally work in appears to read from the console, but it actually reads from its stdin, which happens to be the console.

  2. bash can read code from a file:

    bash /path/to/some/file
    

    (The path may be relative.)

  3. bash can read code from a command-line argument:

    bash -c 'date; sleep 2; date'
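
The three ways listed above can be compared side by side. The following is a minimal sketch, not part of the original answer; the variable names (code, script_file, from_stdin, from_file, from_arg) and the temporary file are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# The same one-line program, delivered to an inner bash in three ways.
code='echo "hello"'

# 1. Code from stdin: the inner bash reads the program from a pipe.
from_stdin=$(printf '%s\n' "$code" | bash)

# 2. Code from a file: here a temporary file created only for this demo.
script_file=$(mktemp)
printf '%s\n' "$code" > "$script_file"
from_file=$(bash "$script_file")
rm -f "$script_file"

# 3. Code from a command-line argument.
from_arg=$(bash -c "$code")

printf '%s\n' "stdin: $from_stdin" "file: $from_file" "arg: $from_arg"
```

All three invocations run the same code; only the channel through which the inner bash receives it differs.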
    

The context

In a comment you said the context is this question. My answer there advised replacing:

wget -O - URL | bash

with

bash <(wget -O - URL)

The first command makes the inner bash read from its stdin. What you seem not to have realized is that the second command makes the inner bash read from a file that is not its stdin.


What happens to <(…)

When you run bash <(wget -O - URL) in Bash, the outer Bash replaces <(…) with the path to some file. After the replacement, the actual command being run may look like:

bash /dev/fd/63

This spawns the inner Bash. It opens /dev/fd/63 like it would open /path/to/some/file.

Thanks to the "magic" of process substitution /dev/fd/63 is a pipe already connected to wget -O - URL.

Side note: on a system where /dev/fd is unavailable the outer Bash would use a truly named pipe (e.g. /tmp/sh-np.pldKay) to set this up.
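
To see the replacement for yourself without downloading anything, you can let echo print whatever the outer Bash puts in place of <(…). This is a small sketch under assumptions: true and echo are placeholder commands standing in for wget -O - URL, and the variable names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# echo receives the substituted path as an ordinary argument and prints it.
path=$(echo <(true))
echo "substituted path: $path"

# The path can be opened like a file; reading it yields the command's output.
content=$(cat <(echo "written by the inner command"))
echo "content: $content"

# Check whether the substituted path refers to a pipe rather than a regular file.
if test -p <(true); then kind="pipe"; else kind="not a pipe"; fi
echo "kind: $kind"
```

On a system with /dev/fd the printed path is typically of the form /dev/fd/N; the exact number may differ from 63.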


The similarity

The connection from wget to (the inner) bash is similar in the two cases. When you run wget -O - URL | bash, bash reads code from its stdin which is a pipe wget writes to. When you run bash <(wget -O - URL), bash reads code from another file (i.e. not from its stdin) which is a pipe wget writes to. In both cases the outer Bash sets the pipe up.


The difference

The difference is that with process substitution the stdin of the inner Bash is not used to pass code to it, so the stdin can be used for another purpose. E.g. the stdin can be the console, and a read command in the code can read from it.
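
The difference can be demonstrated offline. In this sketch the function emit_script is a hypothetical stand-in for wget -O - URL, and the here-string 'yes' stands in for what a user would type at the console; neither comes from the original answer.

```bash
#!/usr/bin/env bash
# emit_script prints a three-line program, the way wget would print a download.
emit_script() {
  printf '%s\n' \
    'read -r answer' \
    'echo "second line ran"' \
    'echo "answer=$answer"'
}

# Code arrives on stdin: read takes its input from the same pipe as the code,
# so it swallows the second line of the program instead of user input.
piped=$(emit_script | bash)

# Code arrives via the substituted path: stdin stays free for read,
# and is fed "yes" here in place of console input.
substituted=$(bash <(emit_script) <<< 'yes')

printf 'piped:\n%s\n\nsubstituted:\n%s\n' "$piped" "$substituted"
```

In the piped case the second line never runs as code, because read consumed it as data. In the process-substitution case all three lines run and read receives "yes".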


Answers to your explicit questions

  • What problem(s) was this concept invented to solve?

    When a command expects a pathname (path to a file) and you want to pipe something in instead of providing a regular file, you can create a named pipe and achieve the goal. This requires actually creating a pipe (mkfifo), piping something to it (in the background or in another console), running the command and finally removing the named pipe (rm).

    With process substitution Bash handles the piping. It's convenient.

  • What process is being substituted and with what?

    No process gets replaced; it is the syntax that gets substituted. In bash <(wget -O - URL) the <(wget -O - URL) syntax is substituted with a path to some file; only then is the command executed (as bash /dev/fd/63 or similar). The outer Bash prepares the file, so reading from it means reading what wget -O - URL (the process) writes. The file is actually a pipe, not a regular file.

  • Is it also named "preserving stdin" and if so why?

    "Preserving stdin" happens, but it's not an alternative name for process substitution. It's not a formal name for what happens either.

    When you run (inner) bash … in (outer) Bash, the inner Bash gets its stdin inherited from the outer Bash (i.e. it's the same file). The stdin may be the console.

    But if you run wget … | bash … instead, the inner Bash will see data from wget on its own stdin. Now some part of the inner Bash (e.g. the read builtin) or some child of the inner Bash may want to read something from stdin. They expect input from a console or whatever, but not the code the inner Bash should execute. But because the stdin of the inner Bash is a pipe from wget, builtins will use it and child processes will inherit it as their stdins. They won't inherit the stdin of the outer Bash.

    By using process substitution instead of piping code via the stdin of (the inner) bash, you make the stdin of the outer Bash available for builtins and children of the inner Bash. You preserve the stdin of the outer Bash for builtins and children of the inner Bash. At the same time you protect the stream of code the inner Bash reads and executes from being read by its builtins or children.

    This works only because bash allows you to provide code in a file specified as a command-line argument. If (the inner) Bash supported reading code from stdin only, then you would have to provide code via its stdin. In such a case process substitution could not solve the issue you had.
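
The manual named-pipe procedure from the first answer above (mkfifo, write in the background, run the command, rm) can be sketched next to its process-substitution equivalent. This is an illustration only; the temporary directory, the pipe name code.fifo and the one-line program are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# Manual approach: create a named pipe, feed it in the background, clean up.
tmpdir=$(mktemp -d)
fifo="$tmpdir/code.fifo"
mkfifo "$fifo"
printf '%s\n' 'echo "hello from the pipe"' > "$fifo" &   # writer, in the background
manual=$(bash "$fifo")                                   # reader: bash gets a pathname
wait                                                     # let the writer finish
rm -r "$tmpdir"                                          # remove the named pipe

# Process substitution: the outer Bash sets up and tears down the pipe itself.
automatic=$(bash <(printf '%s\n' 'echo "hello from the pipe"'))

printf '%s\n' "manual: $manual" "automatic: $automatic"
```

Both variants hand the inner bash a pathname that is really a pipe; process substitution just removes the bookkeeping.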

  • If I'd known you were writing an answer, I wouldn't have closed it. Sorry. – ilkkachu Mar 20 '21 at 14:04
  • @ilkkachu No harm done. If I knew valuable posts on U&L better, I might have found a duplicate by myself and voted to close. I actually noticed the closure but tried my luck anyway. :) I did this because I think the information about how Bash can read code plus the distinction between the outer Bash and the inner Bash may help the OP better than examples where Bash runs some other commands. – Kamil Maciorowski Mar 20 '21 at 14:13
  • @KamilMaciorowski please consider shortening the answer a bit; I think I don't have the mental ability to handle such a long text in English (not my native language) about advanced subjects in *nix (advanced for me, at least), especially when it's not split into chapters. I tried to edit it to add a chapter and a "stop, wait and think about it" divider; that said, I am still having a hard time understanding what's going on, especially in the last three passages, where I felt mentally exhausted. Of course, the problem here is on my side, not yours; your attempt to help me is venerated. – variable_expander Mar 20 '21 at 14:47
  • Link to suggested edit in case community members reject it from whatever reason: https://unix.stackexchange.com/review/suggested-edits/371386 – variable_expander Mar 20 '21 at 14:48