6

If you want to read the single line output of a system command into Bash shell variables, you have at least two options, as in the examples below:

    IFS=: read user x1 uid gid x2 home shell <<<$(grep :root: /etc/passwd | head -n1)

and

    IFS=: read user x1 uid gid x2 home shell < <(grep :root: /etc/passwd | head -n1)

Is there any difference between these two? What is more efficient or recommended?


Please note that reading the /etc/passwd file is just an example. The focus of my question is on here strings vs. process substitution.

FedKad
    Instead of grepping the password file, you might want to use `getent passwd root`. It works for multiple user sources and might use an internal cache and lookup index. – eckes Jun 13 '21 at 17:29

2 Answers

16

First, note that read without -r is meant for input where \ is used to escape the field or line delimiters, which is not the case for /etc/passwd. It's very rare that you would want to use read without -r.
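As a small sketch of what -r changes, here is a made-up input line containing backslashes (the variable names are only for illustration):

```shell
# bash: compare read with and without -r on input that contains backslashes.
line='C:\temp\new'

# Without -r, each backslash is treated as an escape character and dropped.
IFS= read without_r < <(printf '%s\n' "$line")

# With -r, backslashes are kept verbatim.
IFS= read -r with_r < <(printf '%s\n' "$line")

printf 'without -r: %s\n' "$without_r"   # C:tempnew
printf 'with -r:    %s\n' "$with_r"      # C:\temp\new
```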

Now, as to those two forms, note that neither is standard sh syntax. <<< is from zsh in 1991. <(...) is from ksh circa 1985, though ksh initially didn't support redirecting from/to it.

$(...) is also from ksh, but has been standardised by POSIX (as it replaces the ill-designed `...` from the Bourne shell), so is portable across sh implementations these days.

$(code) interprets the code in a subshell with the output redirected to a pipe while the parent, at the same time, reads that output from the other end of the pipe and stores it in memory. Once that command finishes, that output, stripped of its trailing newline characters (and with the NUL characters removed in bash), makes up the expansion of $(...).
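The newline stripping is easy to observe with a throwaway printf (a minimal sketch; the sample text is arbitrary):

```shell
# bash: trailing newlines vanish from a command substitution, interior ones stay.
out=$(printf 'one\ntwo\n\n\n')

# Brackets make the exact boundaries of the stored value visible.
printf '[%s]\n' "$out"

# "one", one newline, "two": 7 characters are left out of the 10 written.
len=${#out}
printf 'length: %s\n' "$len"
```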

If that $(...) is not quoted and is in list context, it is subject to split+glob (split only in zsh). After <<<, it's not a list context, but older versions of bash would still do the split part (not glob) and then join the parts with spaces. So if using bash, you'd likely want to also quote $(...) when used as the target of <<<.
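A sketch of the quoted form, which should behave the same regardless of bash version (the input with repeated spaces is invented for the demonstration):

```shell
# bash: quoting $(...) after <<< keeps the text intact in every bash version.
IFS= read -r line <<< "$(printf 'a   b   c\n')"

# The runs of three spaces survive because no splitting and rejoining happens.
printf '[%s]\n' "$line"
```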

cmd <<< word in zsh and older versions of bash causes the shell to store word followed by a newline character into a temporary file, which is then made the stdin of the process that will execute cmd, and that tempfile is deleted before cmd is executed. That's the same as what happens with << EOF from the Bourne shell from the 70s. Effectively, it is exactly the same as:

cmd << EOF
word
EOF
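One way to check that equivalence, as a rough sketch, is to count the bytes each form delivers; both include the added newline:

```shell
# bash: a here-string and the matching here-document deliver the same bytes.
herestring_bytes=$(wc -c <<< 'word')
heredoc_bytes=$(wc -c << EOF
word
EOF
)

# Some wc implementations pad their output with spaces; arithmetic
# expansion normalises the values to plain integers.
herestring_bytes=$(( herestring_bytes ))
heredoc_bytes=$(( heredoc_bytes ))

# Four letters plus the trailing newline in both cases.
printf 'here-string: %s bytes, here-document: %s bytes\n' \
  "$herestring_bytes" "$heredoc_bytes"
```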

In 5.1, bash switched from using a temporary file to using a pipe as long as the word can fit whole in the pipe buffer (falling back to a tempfile if not, to avoid deadlocks), and makes cmd's stdin the reading end of the pipe, which the shell has seeded beforehand with the word.
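On Linux, you can look at what the here-string consumer actually gets as stdin. This is only a sketch and assumes a /proc file system with /proc/self/fd; elsewhere it just reports that it could not check:

```shell
# bash on Linux: show what the stdin of a here-string consumer points to.
stdin_target='(no /proc/self/fd on this system)'

if [ -d /proc/self/fd ]; then
  # Inside readlink, "self" is the readlink process, so this names its stdin.
  stdin_target=$(readlink /proc/self/fd/0 <<< 'short word')
fi

# Expect something like pipe:[...] with a pipe, or a deleted temporary
# file name when a tempfile was used.
printf 'bash %s: stdin is %s\n' "$BASH_VERSION" "$stdin_target"
```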

So cmd1 <<< "$(cmd2)" involves one or two pipes, storing the whole output of cmd2 in memory, storing it again in either another pipe or a tempfile, and mangling the NULs and newlines.

cmd1 < <(cmd2) is functionally equivalent to cmd2 | cmd1. cmd2's output is connected to the writing end of a pipe. Then <(...) expands to a path that identifies the other end, and < that-path gets you a file descriptor to that other end. So cmd2 talks directly to cmd1 without the shell doing anything with the data.
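To see that <(...) really is just a path, you can print the expansion instead of redirecting from it. A minimal sketch (the exact path, often /dev/fd/63, depends on the system):

```shell
# bash: <(...) is replaced by a file name before the command runs.
procsub_path=$(echo <(true))
printf 'process substitution expanded to: %s\n' "$procsub_path"

# Redirecting from that name hands the consumer the reading end directly.
IFS=: read -r first second < <(printf 'left:right\n')
printf '%s / %s\n' "$first" "$second"
```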

You see this kind of construct in the bash shell specifically because in bash, contrary to AT&T ksh or zsh, in:

cmd2 | cmd1

cmd1 is run in a subshell¹, so if cmd1 is read for instance, read will only populate variables of that subshell.
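The difference can be demonstrated like this (a sketch that explicitly turns lastpipe off, which is also the default; the variable names are illustrative):

```shell
# bash: where does read's variable end up?
shopt -u lastpipe   # make the default explicit for this demonstration

# In a pipeline, read runs in a subshell, so the assignment is lost.
via_pipe=
printf 'from-pipe\n' | read -r via_pipe

# With process substitution, read runs in the current shell.
via_procsub=
read -r via_procsub < <(printf 'from-procsub\n')

printf 'pipe: [%s]\n' "$via_pipe"
printf 'process substitution: [%s]\n' "$via_procsub"
```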

So here, you would want:

IFS=: read -r user x1 uid gid x2 home shell rest_if_any_ignored < <(
  grep :root: /etc/passwd)

The head is superfluous as with -r, read will only read one line anyway². I've added a rest_if_any_ignored for future proofing in case in the future a new field is added to /etc/passwd, causing $shell to contain /bin/sh:that-field otherwise.
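To illustrate the catch-all variable, here is a sketch with a hypothetical passwd line that has one extra trailing field:

```shell
# bash: the last variable given to read receives everything that is left.
line='root:x:0:0:root:/root:/bin/sh:that-field'   # hypothetical 8-field line

# Without a catch-all variable, the extra field ends up in the shell field.
IFS=: read -r user x1 uid gid x2 home shell_plain < <(printf '%s\n' "$line")

# With a catch-all variable, the shell field stays clean.
IFS=: read -r user x1 uid gid x2 home shell_clean rest_if_any_ignored < <(
  printf '%s\n' "$line")

printf 'without catch-all: %s\n' "$shell_plain"   # /bin/sh:that-field
printf 'with catch-all:    %s\n' "$shell_clean"   # /bin/sh
```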

Portably (in sh), you can't do:

grep :root: /etc/passwd |
  IFS=: read -r user x1 uid gid x2 home shell rest_if_any_ignored 

as POSIX leaves it unspecified whether read runs in a subshell (like in bash/dash...) or not (like zsh/ksh).

You can however do:

IFS=: read -r user x1 uid gid x2 home shell rest_if_any_ignored << EOF
$(grep :root: /etc/passwd | head -n1)
EOF

(here restoring the head to avoid the whole of grep's output being stored in memory and in the tempfile/pipe).

Which is standard even if not as efficient (though as indicated by @muru, the difference for such a small input is likely negligible compared to the cost of running an external utility in a forked process).

Performance, if that mattered here, could be improved by using builtin features of the shell to do grep's job. However, especially in bash, you'd only do that for very small input as a shell is not designed for this kind of task and is going to be a lot worse at it than grep.

while
  IFS=: read <&3 -r user x1 uid gid name home shell rest_if_any_ignored
do
  if [ "$name" = root ]; then
    do-something-with "$user" "$home"...
    break
  fi
done 3< /etc/passwd
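A self-contained variant of that loop, run against a temporary sample file rather than the real /etc/passwd (the sample entries and the found_home variable are made up for the sketch):

```shell
# Build a small passwd-like sample file to run the loop against.
sample=$(mktemp)
printf '%s\n' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'root:x:0:0:root:/root:/bin/bash' > "$sample"

found_home=
while
  # Reading from fd 3 leaves stdin free for commands inside the loop.
  IFS=: read <&3 -r user x1 uid gid name home shell rest_if_any_ignored
do
  if [ "$name" = root ]; then
    found_home=$home
    break
  fi
done 3< "$sample"

rm -f -- "$sample"
printf 'home of the matching entry: %s\n' "$found_home"
```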

¹ except when the lastpipe option in bash is set and the shell is non-interactive like in scripts

² see also the -m1 or --max-count=1 option of the GNU implementation of grep which would tell grep itself to stop searching after the first match. Or the portable equivalent: sed '/:root:/!d;q'
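A quick sketch of both variants on made-up input with two matching lines (the grep line assumes an implementation that supports -m, such as GNU grep):

```shell
# Three passwd-like lines, two of which contain :root:
sample=$(printf '%s\n' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'root:x:0:0:root:/root:/bin/bash' \
  'toor:x:0:0:root:/root:/bin/sh')

# sed deletes non-matching lines, then prints the first match and quits.
first_by_sed=$(printf '%s\n' "$sample" | sed '/:root:/!d;q')

# grep -m1 stops searching after the first match.
first_by_grep=$(printf '%s\n' "$sample" | grep -m1 ':root:')

printf 'sed:  %s\n' "$first_by_sed"
printf 'grep: %s\n' "$first_by_grep"
```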

  • Thanks for the "historical" explanation. I am interested only in Bash 5.1 or above though. In my case, Bash seems to behave the same in cmd1 <<<$(cmd2) and cmd1 <<<"$(cmd2)". Also the rest_if_any_ignored tip was something that I already know, but ignored for simplicity. The fact that read will read only one line and the -r switch of it which is almost always necessary was a useful extra tip. I will accept this answer instead of @muru's (sorry @muru) because, *although it comes after @muru's*, it is somehow more detailed. – FedKad Jun 13 '21 at 09:09
  • Can you comment on the usage of file descriptor 3 in the last example? – FedKad Jun 13 '21 at 09:52
  • @FedonKadifeli, that's to avoid clobbering 0, 1, 2, which are stdin/stdout/stderr and which commands in the loop may want to access. See Why is using a shell loop to process text considered bad practice? (thanks for the edit btw). – Stéphane Chazelas Jun 13 '21 at 09:54
6

In herestrings, bash reads in the entire output of the command substitution ($(grep :root: /etc/passwd | head -n1)) to create the contents of the herestring, and then you tell it to read it again with read.

On the other hand, with process substitution, bash sets up a pipe, and then you read in the output once.
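For a single line, both forms end up with the same variables. A sketch using a hypothetical passwd_line function in place of the grep pipeline:

```shell
# bash: both forms populate the same variables from a one-line producer.
passwd_line() { printf 'root:x:0:0:root:/root:/bin/bash\n'; }   # stand-in producer

# Here-string: the output is collected first, then handed to read.
IFS=: read -r user1 x1 uid1 gid1 x2 home1 shell1 rest <<< "$(passwd_line)"

# Process substitution: read consumes the output straight from the pipe.
IFS=: read -r user2 x1 uid2 gid2 x2 home2 shell2 rest < <(passwd_line)

printf '%s %s %s\n' "$user1" "$home1" "$shell1"
printf '%s %s %s\n' "$user2" "$home2" "$shell2"
```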


You're running bash (and two other external commands) to read in one line. At that point, efficiency has long since been defenestrated.


While we're at it, GNU grep has a -m option:

-m num
--max-count=num

Stop after the first num selected lines.

muru
  • So, "process substitution" may be the (slightly) preferred method, if I understand correctly. Thanks also for the extra grep tip! :) – FedKad Jun 13 '21 at 08:51