4

In reading through the source to fff to learn more about Bash programming, I saw a timeout option passed to read as an array here:

read "${read_flags[@]}" -srn 1 && key "$REPLY"

The value of read_flags is set like this:

read_flags=(-t 0.05)

(The intended read invocation is therefore read -t 0.05 -srn 1.)

I can't quite figure out why a string could not have been used, i.e.:

read_flags="-t 0.05"
read "$read_flags" -srn 1 && key "$REPLY"

This string-based approach results in an "invalid timeout specification" error.

Investigating, I came up with a test script parmtest:

show() {
  for i in "$@"; do printf '[%s]' "$i"; done
  printf '\n'
}

opt_string="-t 1"
opt_array=(-t 1)

echo 'Using string-based option...'
show string "$opt_string" x y z
read "$opt_string"
echo
echo 'Using array-based option...'
show array "${opt_array[@]}" x y z
read "${opt_array[@]}"

Running this, with bash parmtest ($BASH_VERSION is 5.1.4(1)-release), gives:

Using string-based option...
[string][-t 1][x][y][z]
parmtest: line 11: read:  1: invalid timeout specification

Using array-based option...
[array][-t][1][x][y][z]
(1 second delay...)

I can see from the debug output that, in the array-based approach, the value 1 is a separate argument with no surrounding whitespace. I can also see from the error message that there's an extra space before the 1: "read:  1: invalid timeout specification". My suspicions lie in that area.

The strange thing is that if I use this approach with another command, e.g. date, the problem doesn't exist:

show() {
  for i in "$@"; do printf '[%s]' "$i"; done
  printf '\n'
}

opt_string="-d 1"
opt_array=(-d 1)

echo 'Using string-based option...'
show string "$opt_string" x y z
date "$opt_string"
echo
echo 'Using array-based option...'
show array "${opt_array[@]}" x y z
date "${opt_array[@]}"

(The only differences are that opt_string and opt_array now specify -d, not -t, and that I'm calling date, not read, in each case.)

When run with bash parmtest this produces:

Using string-based option...
[string][-d 1][x][y][z]
Wed Sep  1 01:00:00 UTC 2021

Using array-based option...
[array][-d][1][x][y][z]
Wed Sep  1 01:00:00 UTC 2021

No error.

I've searched, but in vain, to find an answer to this. Moreover, the author wrote this bit directly in one go and used an array immediately, which makes me wonder.

Thank you in advance.

Update 03 Sep : Here's the blog post where I've written up what I've learned so far from reading through fff, and I've referenced this question and the great answers in it too: Exploring fff part 1 - main.

qmacro
  • btw, for i in "$@"; do printf '[%s]' "$i"; done is the same as printf '[%s]' "$@". The builtin printf repeats its format until there are no more arguments. – jthill Sep 01 '21 at 16:32
  • Brill, thanks @jthill. I'd originally had it with echo, but that's a weak excuse I guess :) – qmacro Sep 01 '21 at 19:31
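The suggestion in the comment above can be sketched as follows. This is only an illustration of the shorter show function, reusing the names from the question; one difference worth knowing is that with no arguments at all this version still prints a single empty pair of brackets, whereas the loop prints none.

```shell
#!/usr/bin/env bash
# show: print every argument wrapped in brackets, then a newline.
# printf reuses its format string until all arguments are consumed,
# so no explicit loop is needed.
show() {
  printf '[%s]' "$@"
  printf '\n'
}

opt_array=(-t 1)
show array "${opt_array[@]}" x y z   # prints [array][-t][1][x][y][z]
```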

2 Answers

10

The reason is a difference in how the read builtin function and the date command interpret their command-line arguments.

But first things first. In both of your examples you place quotes, as is recommended, around the expansion of your shell variables, be it "${read_flags[@]}" in the array case or "$read_flags" in the scalar case. The main reason why it is recommended to always quote your shell variables is to prevent unwanted word splitting. Consider the following:

  • You have a file called My favorite songs.txt with spaces in it, and want to move it to the directory playlists/.
  • If you store the filename in a variable $fname and call
    mv $fname playlists/
    
    the mv command will see four arguments: My, favorite, songs.txt and playlists/, and will try to move the three nonexistent files My, favorite and songs.txt to the directory playlists/. Obviously not what you want.
  • Instead, if you place the $fname reference in double-quotes, as in
    mv "$fname" playlists/
    
    it makes sure the shell passes this entire string including the spaces as one word to mv, so that it recognizes it is just one file (albeit with spaces in its name) that needs to be moved.
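A small sketch of the difference, assuming the default IFS. The helper count_args is made up for this illustration and simply reports how many arguments it received, standing in for mv:

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper): print the number of arguments received.
count_args() { echo "$#"; }

fname='My favorite songs.txt'

# Unquoted: the shell splits the value at each space before calling the command.
unquoted=$(count_args $fname playlists/)

# Quoted: the whole filename is passed as a single word.
quoted=$(count_args "$fname" playlists/)

echo "unquoted: $unquoted arguments"   # unquoted: 4 arguments
echo "quoted:   $quoted arguments"     # quoted:   2 arguments
```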

Now you have a situation in which you want to store option arguments in a shell variable. These are tricky, because sometimes they are long, sometimes short, and sometimes they take a value. There are numerous ways to specify options that take arguments, and how they are parsed is usually left entirely to the discretion of the programmer (see this Q&A for a discussion). The reason why Bash's read builtin and the date command react differently therefore most likely lies in how each of them parses its command-line arguments internally. However, we may speculate a little.

  • When storing -t 0.05 in a scalar shell variable and passing it as "$opt_string", the recipient will see this as one string containing a space (see above).
  • When storing -t and 0.05 in an array variable and passing it as "${opt_array[@]}" the recipient will see this as two separate items, the -t and the 0.05.(1)(2)
  • Many programs will use the getopt() function from the GNU C library for parsing command-line arguments, as is recommended by the POSIX guidelines.
  • getopt() and its GNU long-option variant getopt_long() distinguish "short" and "long" option formats, e.g. date -u or date --utc in the case of the date command. Option values for an option (say, -o / --option) are usually written as -ovalue or -o value for short options and as --option=value or --option value for long options.
  • When passing -t 0.05 as two words to a tool that uses getopt(), it will take the first character after the - as being the option name and the next word as the option value (the -o value syntax). So, read would take t as option name and 0.05 as option value.
  • When passing -t 0.05 as one word, it will be interpreted as the -ovalue syntax: getopt() will take (again) the first character after the - as the option name and the remainder of the string as option value, so the value would be 0.05 with a leading space.
  • The read command apparently doesn't accept timeout specifications with a leading space. And indeed, if you call
    read -t " 0.05" -srn 1
    
    where the value is explicitly a string with leading space, read also complains about this.
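The parsing described in the list above can be observed from the shell itself. The sketch below uses the getopts builtin as a stand-in for a getopt()-style parser; that is an assumption made for illustration and says nothing about the code read actually uses internally. The helper parse_t is hypothetical.

```shell
#!/usr/bin/env bash
# parse_t (hypothetical helper): parse a single -t option with the getopts
# builtin and print its value in brackets so that any space is visible.
parse_t() {
  local opt OPTARG OPTIND=1
  while getopts 't:' opt "$@"; do
    case $opt in
      t) printf '[%s]\n' "$OPTARG" ;;
    esac
  done
}

two_words=$(parse_t -t 0.05)     # -o value syntax
one_word=$(parse_t "-t 0.05")    # -ovalue syntax, value starts with a space
attached=$(parse_t -t0.05)       # -ovalue syntax, no space

echo "two words: $two_words"     # two words: [0.05]
echo "one word:  $one_word"      # one word:  [ 0.05]
echo "attached:  $attached"      # attached:  [0.05]
```

Whether the leading space in the middle case is then tolerated is up to whatever code converts the value, which is where read and date differ.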

In conclusion, the date command is evidently written in a more lenient way when it comes to the option value for -d and doesn't care if the value string starts with a space. This is perhaps not unexpected, as the values that date specifications can take are very diverse, as opposed to a timeout specification, which (clearly) needs to be a number.


(1) Note that using the @ (as opposed to *) makes a great difference here, because when the array reference is quoted, all array elements will then appear as if they were individually quoted and thus could contain spaces themselves without being split further.
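To illustrate footnote (1), here is a minimal sketch. The value 'next Friday' and the helper count_args are made up for the example; the point is only how many words the callee receives.

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper): print the number of arguments received.
count_args() { echo "$#"; }

# The second element deliberately contains a space.
opt_array=(-d 'next Friday')

with_at=$(count_args "${opt_array[@]}")     # each element stays one word
with_star=$(count_args "${opt_array[*]}")   # all elements joined into one word

echo "@ gives $with_at arguments"      # @ gives 2 arguments
echo "* gives $with_star argument"     # * gives 1 argument
```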

(2) In principle, there is a third option: store -t 0.05 in a scalar variable $opt_string, but pass it as $opt_string without the quotes. In this case, word splitting would occur at the space, and again two items, -t and 0.05, would be passed separately to the program. However, this is not the recommended way, because sometimes your argument value will contain whitespace that needs preserving.

AdminBee
  • Such a lovely explanation, thank you. Insight into the getopt() level of parsing is especially helpful. In my head scratchings I'd completely forgotten about the possibility of -t0.05 (no space) and what that implies, too. Great stuff! – qmacro Sep 01 '21 at 11:57
  • Yah, it boils down to date scanning integer arguments with " %d", optional-leading-whitespace-then-a-decimal-integer, and read scanning them with "%d", just decimal-integer, no leading whitespace accepted. read -t ' 1' gets rejected too. – jthill Sep 01 '21 at 19:37
  • @jthill, except that scanf("%d") accepts and discards leading whitespace. The POSIX description says "Input white-space characters (as specified by isspace) shall be skipped, unless the conversion specification includes a [, c, C, or n conversion specifier." Also, %d wouldn't do, since it can take a fractional number. Bash uses its own code for parsing that. – ilkkachu Sep 02 '21 at 14:45
4
read_flags="-t 0.05"
read "$read_flags" -srn 1

Here, "$read_flags" is in double quotes, so it's not wordsplit. As you saw, the result is the same as running

read "-t 0.05" -srn 1

which means the specified timeout indeed has a leading space. Now, apparently whatever Bash does to parse the number doesn't like that.

What the extra space does depends entirely on the program. When parsing a number, it should be easy enough to ignore any leading white space, and the standard strtod() function does just that. With date -d, the program has to parse a more complex string, so it's not surprising that it isn't strict about whitespace. (It could be something like 12:00 Jun 4 2019 UTC + 5 days and not just a single number.) Hard to say why Bash is so picky here.

Now, if you were passing a filename, a string with a leading space would be a different filename than the one without, and it'd be hard for any program to know to ignore it.


With such simple values (without glob characters and where you want to split on each run of whitespace, assuming the default IFS), you could indeed use a string instead of an array; you just need to leave it unquoted, so that it is split into two distinct arguments. So, read $read_flags .... Or just set timeoutflag=-t0.05 and then read "$timeoutflag" .... Though note that read "$timeoutflag" isn't optimal in that, if the variable is empty, it will get passed as a distinct empty argument, giving an error.
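The string-based variants can be compared by counting what the command would receive. This is a sketch assuming the default IFS; count_args is a hypothetical stand-in for read, and the ${var:+...} guard in the last case is one common workaround that I'm adding for illustration, not something taken from fff.

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper): print the number of arguments received.
count_args() { echo "$#"; }

read_flags='-t 0.05'
split=$(count_args $read_flags -srn 1)           # unquoted: -t and 0.05 are split

timeoutflag=-t0.05
attached=$(count_args "$timeoutflag" -srn 1)     # option and value in one word

timeoutflag=
empty_quoted=$(count_args "$timeoutflag" -srn 1)                   # empty first argument
empty_guarded=$(count_args ${timeoutflag:+"$timeoutflag"} -srn 1)  # nothing passed

echo "split: $split"                   # split: 4
echo "attached: $attached"             # attached: 3
echo "empty, quoted: $empty_quoted"    # empty, quoted: 3
echo "empty, guarded: $empty_guarded"  # empty, guarded: 2
```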

In general, arrays are the correct way to store and use arbitrary lists of arguments without issues.
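As a rough sketch of that approach, an argument list can be built up step by step. The variable timeout is hypothetical; an empty array simply expands to no arguments at all, so no guard is needed when the option is left out.

```shell
#!/usr/bin/env bash
timeout=0.05      # hypothetical setting; leave empty for no timeout

read_flags=()
if [[ -n $timeout ]]; then
  read_flags+=(-t "$timeout")   # option and value as two separate elements
fi
read_flags+=(-s -r -n 1)

# Show exactly what a command would receive from "${read_flags[@]}".
printf '[%s]' "${read_flags[@]}"   # prints [-t][0.05][-s][-r][-n][1]
printf '\n'
```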

Somewhat related: How can we run a command stored in a variable?

ilkkachu
  • Thank you, this was also very useful in helping me with more understanding and insight. It seems that this revolves around wordsplitting - or not (as @AdminBee stresses). – qmacro Sep 01 '21 at 12:00