
I am wondering why

ls -1 | 
while read file; do 
     echo $file; tail -n 100 $file > >(sleep 1 && cat > $file)
done  

is faster than

ls -1 | 
while read file; do 
    echo $file; tail -n 100 $file | (sleep 1 && cat > $file)
done   

?

If there are 100 files in a directory then:

  • the second command takes almost 100 seconds to process
  • the first command is processed almost immediately.
agc
haccks
  • That's because bash doesn't wait for the termination of that process substitution (which could be seen as a bug). See "The process substitution output is out of the order" for details. – Stéphane Chazelas Mar 29 '18 at 13:45
  • @StéphaneChazelas; This says different : https://unix.stackexchange.com/q/146263/186463 – haccks Mar 29 '18 at 14:10
  • @haccks, in that case grep < <(cmd), while the shell doesn't wait for cmd, grep does as it waits for eof on its stdin. Again, see details at the link I gave in my previous comment. – Stéphane Chazelas Mar 29 '18 at 14:22
  • @StéphaneChazelas; you said "bash doesn't wait for the termination of that process substitution". Isn't that true for pipes too? – haccks Mar 29 '18 at 14:54
  • @haccks, no. bash waits for all pipe components. All shells wait for the last (right-most) pipe components, some don't wait for the other ones. – Stéphane Chazelas Mar 29 '18 at 15:04
  • @StéphaneChazelas; Pipe components run in the background (in a subshell), just like process substitutions. So why the difference? Also, in cmd1 | cmd2, does bash wait for cmd1 too? – haccks Mar 29 '18 at 15:06
  • All pipe components run in subshells in bash, but not in background unless the whole pipeline is put in background with & (note that background/foreground only apply to interactive shells, that's terminology linked to job control in terminals). Process substitutions are put in background unless they're run from a subshell itself run in foreground. In any case, that's not what I'm talking about. I'm talking about the shell waiting for the termination of processes it starts before continuing with the next command (tbc) – Stéphane Chazelas Mar 29 '18 at 15:37
  • (continued). In cmd1 | cmd2; cmd3, bash waits for the termination of cmd1 and cmd2 (the processes it started to execute them) before running cmd3. In cmd1 > >(cmd2); cmd3, bash only waits for cmd1 before running cmd3. cmd2 may very well continue running while cmd3 is running. Try echo foo | (sleep 1; cat); ps vs echo foo > >(sleep 1; cat); ps – Stéphane Chazelas Mar 29 '18 at 15:39
  • @StéphaneChazelas can you please give me a reference where it says that process substitutions run in the background, or that the command in a process substitution is not waited for? – haccks Mar 29 '18 at 16:17
  • @haccks, it's easy to verify (like with the examples I gave above). For foreground vs background, use ps -j. See also the links I gave including the discussion on the bash mailing list. – Stéphane Chazelas Mar 29 '18 at 16:23
  • @StéphaneChazelas I already tested. Just curious to know where the docs say it. Language lawyer problem :) – haccks Mar 29 '18 at 16:25
  • See info bash "Process Substitution". The key word is asynchronously. – Stéphane Chazelas Mar 29 '18 at 16:28
  • @StéphaneChazelas that's the confusion point. Pipes components also run asynchronously. – haccks Mar 29 '18 at 16:33
  • @haccks From the Pipelines documentation: If the pipeline is not executed asynchronously (see Lists), the shell waits for all commands in the pipeline to complete. – Barmar Mar 30 '18 at 00:28
  • So in both cases the components run asynchronously, but in the case of pipelines it still waits for them all to finish (unless you run the whole pipeline in the background with &). – Barmar Mar 30 '18 at 00:29

1 Answer


When you use a pipeline, the shell runs each command in the pipeline concurrently, and waits for all of them to finish before going to the next command. This is explained in the documentation:

If the pipeline is not executed asynchronously (see Lists), the shell waits for all commands in the pipeline to complete.

When the above refers to executing the pipeline asynchronously, it's talking about running the whole pipeline in the background with &.
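
The waiting behaviour can be observed with a small timing sketch, adapted from the `echo foo | (sleep 1; cat)` test suggested in the comments. The variable name `pipeline_elapsed` is only illustrative, and `date +%s` gives whole-second resolution, so the figure is approximate.

```shell
# Sketch: time a pipeline whose last stage sleeps before reading its input.
start=$(date +%s)
echo foo | (sleep 1; cat)     # "foo" is printed after about one second
end=$(date +%s)

# The shell only reaches this point once every pipeline stage has exited,
# so at least one second has passed.
pipeline_elapsed=$((end - start))
echo "next command ran after ${pipeline_elapsed}s"
```

Repeated once per file in a loop, that one-second wait accounts for the roughly 100 seconds seen with 100 files.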

When you use process substitution, the shell doesn't wait for the substituted process to complete before moving on to the next command. The documentation simply says:

The process list is run asynchronously, and its input or output appears as a filename.
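
A matching sketch for process substitution, again adapted from the test in the comments and assuming it is run by bash (the `>(...)` syntax is not available in plain POSIX sh). The variable name `procsub_elapsed` and the final `sleep 3` are illustrative additions; the sleep is only there so the delayed output has time to appear before the script exits.

```shell
#!/usr/bin/env bash
# Sketch: same idea as a pipeline, but the sleeping reader is a process substitution.
start=$(date +%s)
echo foo > >(sleep 2; cat)    # echo returns at once; bash does not wait for the reader
end=$(date +%s)

# This line runs while "sleep 2; cat" is still running in the background.
procsub_elapsed=$((end - start))
echo "next command ran after ${procsub_elapsed}s"

# Give the background reader time to print "foo" before the script ends.
sleep 3
```

Here "foo" shows up after the "next command ran" line, which is why the loop in the question finishes almost immediately: each iteration starts its `sleep 1 && cat` and moves straight on to the next file.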

Barmar
  • I think I was confusing the terms "asynchronous" and "concurrent". In the case of a pipe, the elements run concurrently. But here I think "asynchronous" specifically means running the process in the background; correct me if I am wrong? – haccks Mar 31 '18 at 12:06
  • That seems to be how the manual is using the term. – Barmar Mar 31 '18 at 14:37