2

I have seen several times the use of "list context" and "string context".

I know and understand the use of such descriptions in perl. They apply to $ and @.

However, when used in shell descriptions, they seem vague: the terms have not been defined anywhere or are, at best, poorly documented.

There is no definition in POSIX for that, according to Google.

Is this (from this) the gist of it?

In a nutshell, double quotes are necessary wherever a list of words or a pattern is expected. They are optional in contexts where a raw string is expected by the parser.

But it seems like a difficult term to use. How could we find "what the result should be" when "the result is needed" to know whether it is a string or a list context?

Or could it be precisely and correctly defined?

3 Answers

8

There is no such concept in the standard shell language. There are no "contexts", only expansion steps.

Quotes are first identified in the tokenization which produces words. They glue words together so that abc"spaces here"xyz is one "word".

The important thing to understand is that quotes are preserved through the subsequent expansion steps, and the original quotes are distinguished from quotes that might arise out of expansions.

Parameters are expanded without regard for double quotes. Later, though, a field splitting process takes place which harkens back to the first tokenization. Once again, quotes prevent splitting and, once again, are preserved.

Pathname expansion ("globbing") takes place after this splitting. The preserved quotes prevent it: globbing operators are not recognized inside quotes.

Finally the quotes are removed by a late stage called "quote removal". Of course, only the original quotes!
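A minimal sketch of these steps, assuming a POSIX-like sh such as dash or bash (zsh in its default mode does not split unquoted expansions). The scratch directory, file names and variable names are made up for illustration.

```sh
# Work in a scratch directory so the glob has known matches.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a.txt b.txt

var='x  *.txt'   # two spaces, then a glob pattern

# Unquoted: parameter expansion, then field splitting, then pathname expansion.
set -- $var
unquoted_count=$#    # fields: x, a.txt, b.txt

# Quoted: the original quotes suppress splitting and globbing,
# and quote removal strips them at the end.
set -- "$var"
quoted_count=$#
quoted_value=$1

# Quotes glue adjacent text into a single word.
set -- abc"spaces here"xyz
glued_count=$#
glued_value=$1

# Quote characters that arise from an expansion are ordinary data:
# they do not prevent splitting and are not removed.
v='"a b"'
set -- $v
literal_count=$#
literal_first=$1

printf '%s %s %s %s\n' "$unquoted_count" "$quoted_count" "$glued_count" "$literal_count"
```

Here `set --` is only used as a convenient way to count the resulting fields through `$#`.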

POSIX does a good job of presenting the process in a way that is understandable; attempts to demystify it with extraneous concepts (that may be misleading) are only going to muddle the understanding.

People throwing around ad hoc concepts like "list context" should formalize their thinking to the point that it can provide a complete alternative specification for all of the processing, which is equivalent (produces the same results). And then, avoid mixing concepts between the parallel designs: use one explanation or the other. A "list context" or "string context" makes sense in a theory of shell expansion in which these are well defined, and the processing steps are organized around these concepts.

If I were to guess, then "list context" refers to the idea that the shell is working with a list of tokenized words such as the two-word list {foo} {abc" x "def}. The quotes are not part of the second word: its content is actually abc x def; they are semantic quotes which prevent the splitting on whitespace. Inside these quotes, we have "string context".

However, a possible implementation of these expansion steps is not to actually have quotes which are identified as the original quotes, but some sort of list data structure, so that {foo} {abc" x "def} is, say, a list of lists in which the quoted parts are identified as different kinds of nodes (and the quotes are gone). Using Lisp notation it could be:

(("foo") ;; one-element word
 ("abc" (:dq-str " x ") "def")) ;; three-element word

The nodes without a label are literal text, :dq-str is a double-quote region. Another type could be :sq-str for a single quoted item.

The expansion can walk this structure, and then do different things based on whether it's looking at a string object, a :dq-str expression or whatever. File expansion and field splitting would be suppressed within both :dq-str and :sq-str. But parameter expansion does take place within :dq-str. "Quote removal" would then correspond to a final pass which takes the pieces and catenates the strings, flattening the interior list structure and losing the type-indicating symbols, resulting in:

("foo"
 "abc x def") ;; plain string list, usable as command arguments

Now here, note how in the second item we have ("abc" (:dq-str " x ") "def"). The first and last items are unwrapped: they are direct elements of the list and so we can say these are in the "list context". Whereas, the middle " x " is wrapped in a :dq-str expression, so that is "(double quoted) string context".

What "list" refers to in "list context" is anyone's guess without a clearly defined model such as this. Is it the master word list? Or a list of chunks representing one word?

Kaz
  • Paragraph 3, subsequent expansion steps? – cat Jul 16 '17 at 07:40
  • 1
    All you've proved is that POSIX does not use the phrasing “list context” and “string context”. It does in fact use the concepts: there are places where field splitting and pathname expansion are performed, and places where they aren't. “Context” is the technical word for a “place”. The formulation in POSIX does a terrible job of presenting the process in a didactic way (which isn't its job, it's a standard, not a tutorial). – Gilles 'SO- stop being evil' Jul 19 '17 at 00:57
  • This answer is beside the point and doesn't even try to explain why for instance globs are expanded in cmd *.txt or for i in *.txt or array=(*.txt) ("list contexts") and not in var=*.txt, case *.txt in (non-list, scalar contexts; I wouldn't use string context here which makes little sense here). – Stéphane Chazelas Jul 30 '17 at 05:49
  • "doesn't even try to explain why for instance globs are expanded in ..." I don't see that in the question! More to the point, this business of "list context" and "string context" doesn't explain it either, and that is a flaw. – Kaz Jul 30 '17 at 05:54
  • @StéphaneChazelas The requirement is given explicitly in "2.9.1 Simple Commands": The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command. Nothing about any list or string contexts. – Kaz Jul 30 '17 at 06:00
  • @Kaz, the question was not about explaining the use of list context in POSIX since POSIX doesn't use that term. – Stéphane Chazelas Jul 30 '17 at 06:03
  • @StéphaneChazelas The contexts-based specification could capture those nuances in straightforward ways. Just like in POSIX, it could stipulate the recognition of assignments or the controlling expressions in case statements and such which are exempt from certain expansions. Those could be identified as additional contexts. I suspect that this "list context" and "string context" are not well developed in the wild where OP has seen the terms being used; any refinement we do here will be "invented here" work, so it seems pointless. – Kaz Jul 30 '17 at 06:27
  • @Kaz, see my answer. even POSIX' context where field splitting will be performed wording for my list context is not enough to describe the behaviour of any shell (remember POSIX is just a specification, there are a lot of areas that are left unspecified allowing different behaviour by different shells). – Stéphane Chazelas Jul 31 '17 at 14:40
5

Since most of the occurrences you reference are by me, I feel I have to give an answer here, though I will mostly paraphrase @Gilles.

I've been using list context vs scalar/non-list context (better than string context, which can be confusing if not understood as non-list context) dozens of times since at least 2004 on Usenet or unix.SE, most of the time in articles discussing the impact of leaving expansions unquoted in Bourne-like shells. I don't remember anyone requesting clarification as to what I meant by that before (I do often try to give some examples of such contexts for illustration).

I've not used it in formal specification of any shell languages, that's just English text to help explain the shell behaviour to other persons.

That's not official terminology, though it's obviously inspired by Perl's official terminology (as used in its documentation). I can't tell if other persons have used those in the context of Unix shells before me (though it's very likely they did), but certainly people have since. I don't claim ownership of that.

list context (at least in the contexts where I've used it) simply means a context where the shell is expecting any number of elements, while a scalar/non-list/string context is one where only one element (a single string/scalar if you want) is expected, like in perl. In most Bourne-like shells, those list contexts are:

  • simple command arguments (as in echo elements)
  • for i in elements
  • array=(elements) (and variant with +=)

Some shells have more like:

  • cmd < elements in zsh which does something similar to cat -- elements | cmd (as in nl < *.txt, nl < {foo,bar}.txt but nl < foo.txt < bar.txt).
  • cmd > elements (and variants with >| >>...) in zsh which does something similar to cmd | tee -- elements
  • elements() { code; } in zsh to define one or more functions at once (or nothing if elements resolves to an empty list (though a literal () { echo x; } is an anonymous function)).
  • compound=(foo=(elements) elements) or matrix=((elements) (elements)) and so on in ksh93.
  • etc.

In those contexts, typically, globs are expanded and you need to quote your expansions if you don't want split+glob (or just empty-removal with zsh unless you enable the shwordsplit/globsubst sh-compatibility options) to be applied to them.

For instance, if you replace elements with *.txt in the examples above, *.txt will be expanded to the list of txt files in the current directory.
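As a rough illustration of the first two list contexts, here is a sketch restricted to POSIX sh syntax; array=(*.txt) behaves the same way in shells that have arrays but is left out so the snippet also runs in dash. The scratch directory and file names are assumptions made for the demonstration.

```sh
# Scratch directory with two known matches for the glob.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch one.txt two.txt

# Simple command arguments: the glob yields one argument per file.
set -- *.txt
arg_count=$#

# for ... in: the loop body runs once per matching file.
loop_count=0
for f in *.txt; do
    loop_count=$((loop_count + 1))
done

# An unquoted expansion in a list context undergoes split+glob...
pattern='*.txt'
set -- $pattern
unquoted_count=$#

# ...while quoting it yields exactly one element, the literal pattern.
set -- "$pattern"
quoted_count=$#
quoted_value=$1

printf '%s %s %s %s\n' "$arg_count" "$loop_count" "$unquoted_count" "$quoted_count"
```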

If you're looking for an equivalent in the POSIX specification, look for the contexts where globs are expanded. POSIX, in at least one instance refers to that as a context where field splitting will be performed (wording which was actually changed after I raised issues with the previous wording to the Austin group). Of course, that wording is not very useful to answer the questions about where field splitting is performed.

The scalar contexts would be the other contexts.

In

scalar=*.txt
case *.txt in...
[[ -f *.txt ]]

*.txt cannot be expanded because the shell is expecting just one string.
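A small sketch of the first two cases, again in plain POSIX sh (the [[ ... ]] form needs bash, ksh or zsh and is omitted). The file and variable names are only illustrative.

```sh
# Scratch directory in which *.txt would have matches if it were globbed.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch one.txt two.txt

# Assignment: the right-hand side is a single string, so no globbing occurs.
scalar=*.txt

# No field splitting either: the spaces survive an unquoted assignment.
spaced='x y'
copy=$spaced

# case word: not globbed; it is matched as the literal string "*.txt".
case *.txt in
    one.txt|two.txt) case_word='expanded' ;;
    '*.txt')         case_word='literal' ;;
esac

printf '%s|%s|%s\n' "$scalar" "$copy" "$case_word"
```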

As a caveat/limitation, those terms don't cleanly capture what happens in cmd > * or cmd > ~(N)pattern or a=(); b=; c=(a b); d=*; IFS=:; e=a:b; cmd 1> "${a[@]}" 2> $b 3> "${c[@]}" 4> $d 5> $e in shells like bash/yash (when not in POSIX mode), ksh88 (using set -A instead of var=(...) syntax) or ksh93 (only when interactive with some), where it could be seen as another list context except that only a list with one element is expected (with splitting and globbing working differently for some).

  • Thanks for your input. That nobody has asked before for a clarification simply means that a clarification was needed. I'll assume that your description means: list context is where glob+split will be applied. Is that close to what you mean? As that answers the question: +1 from me. –  Jul 31 '17 at 18:45
  • @Arrow, as I said, glob+split occurs in list contexts (where the shell expects a list of elements). But in some shells, split and glob occur in a context that can hardly be seen as a list one in those shells: the target of redirections in shells where those redirections can be from/to only one file, though that could be considered a design bug in those shells. – Stéphane Chazelas Jul 31 '17 at 21:39
  • If you are going to keep using such descriptions you need to think and really define it. If not, that will only be a fuzzy description of an idea, and, I am sorry to say, a personal idea. That can not be used to convey ideas in a clear way, it will remain un-defined, un-clear, fuzzy. That is a sure source of mis-interpretations. Note that a definition needs not to cover all possibilities in one sentence, but should be short (very short if possible). cont... –  Jul 31 '17 at 22:15
  • Think of [ (test), defined as evaluate the expression and indicate the result of the evaluation by its exit status. simple, short, clear. It fails to describe all the corner cases but could convey a clear idea. I invite you to find one such definition so the concept is clear for everyone involved. –  Jul 31 '17 at 22:15
2

The wording “list context” and “string context” comes from Perl, but similar concepts apply to the shell language. Note that these are similar concepts: the kinds of contexts and the consequences of the context type are different.

The word context is a technical term in programming language semantics. Its exact meaning is tied to a particular semantic formalization, which is beyond the scope of this answer. The cognitive meaning is the nature of the surroundings of a code snippet. For example, saying that the code snippet $foo has a different meaning in different contexts means that the behavior of a program containing $foo depends on the nature of what is around that occurrence of $foo in the program.

The semantics of the shell is rather complex. It doesn't fall neatly into traditional categories that you'll find in introductory textbooks on programming languages. The execution of a shell program can be broken into two phases (note that this is a way to present the semantics, it doesn't mean that a shell interpreter has to be broken up in this way):

  1. A parsing stage turns a string (the content of the source file or of the argument to -c) into an abstract syntax tree. In the POSIX specification, this corresponds to steps 2 and 3 (token recognition and parsing). The POSIX specification defines grammar rules that describe the shape of the tree. Note that this isn't a context-free grammar — the presentation is based on the usual presentation of context-free grammars, but the annotations “apply rule N” make it a more complex mathematical object.

  2. An execution stage performs some evaluation on the nodes of the tree, and calls external commands. In the POSIX specification, this corresponds to steps 4–7 (expansion, redirection, command execution and waiting).

Expansion is a process that applies to a particular type of node in the abstract syntax tree, which POSIX calls WORD and which is commonly called “word”. It can be divided into two groups.

  1. The first group of expansions consists, in POSIX terminology, of tilde expansion (e.g. ~foo → /home/foo), parameter expansion (e.g. $foo → bar if the value of foo is bar), command substitution (e.g. $(foo) → bar if the output of the command foo is bar) and arithmetic expansion (e.g. $((2+2)) → 4). This first group of expansions is performed on every word, excluding characters that are “quoted” by virtue of being inside single quotes or preceded by a backslash. The output of this group of expansions is approximately a string with annotations (I'll explain the approximation below).

  2. The second group of expansion consists of field splitting and pathname expansion (commonly known as “filename generation” or “globbing”). This group of expansions turns an annotated string into a list of strings. This group of expansions is performed on a subset of the places where the first group is performed: it is not performed on parts of words that are quoted with double quotes, and it is not performed at all if the word is in certain positions in the abstract syntax tree. This is where list and string contexts come in: in certain contexts, i.e. for certain classes of positions in the abstract syntax tree, the second group of expansions is performed. These are list contexts, so named because the outcome of the expansion process is a list (of strings). In the other contexts, called string contexts, the second group of expansions is not performed, and the outcome of the expansion process is a single string.

POSIX describes quote removal as happening as the last expansion stage. This is one way to explain quoting, with all expansions before field splitting being defined as a transformation from a string to a string. For example, given the word '$foo'$bar\$qux, assuming that the value of the variable bar is value, parameter expansion turns this into '$foo'value\$qux and the other first-group expansions leave the string unchanged. Quote removal finally strips the quotes to get $foovalue$qux.
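The example word can be checked directly in any POSIX-like shell. In this sketch, set -- and the variable names result and count are only a convenient way to capture the expanded word.

```sh
bar=value

# Single quotes and the backslash protect their dollar signs from parameter
# expansion; only the unquoted $bar is expanded, then quote removal runs.
set -- '$foo'$bar\$qux
result=$1
count=$#

printf '%s\n' "$result"
```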

The presentation with quote removal requires performing quote matching at each stage. A presentation that's simpler to follow and implement, and gives the same end result, is to perform a dequoting stage that results in a list of parts. Each part is annotated to remember whether it was quoted. For example, '$foo'$bar\$qux dequotes to the following parts: quoted $foo, unquoted variable expansion of bar, quoted $, bare q, bare u, bare x. (Distinguishing between “quoted” and “bare” is necessary for things like identifying assignments and deciding whether to expand aliases.) Second-stage expansions only happen to unquoted parts in list context.

POSIX specifies whether the second group of expansions happens by explicitly listing the expansion stages. For example, “Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value”. A simpler way to phrase this is that only the first-group expansions happen, i.e. that assignment is a string context. There are only two contexts because there are only two sets of rules for performing expansions: either only the first group is performed (string context), or both groups are performed in order (list context).

(Actually, to be complete, there is a third kind of context: case pattern context. In a case pattern, only first-group expansions are performed (like in string context), but a part of the second group of expansions is relevant — unquoted globbing characters are wildcards for the string matching.)
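A short sketch of that third context, assuming a POSIX-like sh. The variable names pat and name are hypothetical.

```sh
pat='*.txt'
name='notes.txt'

# Unquoted expansion in a case pattern: the * coming from $pat acts as a
# wildcard for string matching (no file names are looked up).
case $name in
    $pat) unquoted='match' ;;
    *)    unquoted='no match' ;;
esac

# Quoted expansion: the * is literal, so only the string "*.txt" would match.
case $name in
    "$pat") quoted='match' ;;
    *)      quoted='no match' ;;
esac

printf '%s / %s\n' "$unquoted" "$quoted"
```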

The definition of the language specifies which contexts are list contexts and which are string contexts. In principle, this could be arbitrary. However there is an intuition behind it: in places where the grammar expects a list of WORD tokens, second-group expansion is performed on these tokens, whereas in places where the grammar wants a single WORD, second-group expansion is not performed. A simple way to explain this is that where the grammar expects a list, it's a list context, and where the grammar expects a single string, it's a string context.

  • I don't really know where to start. So, I'll start by saying: Thanks for your answer. It seems very clear that you have some "idea" in your mind of what "list context" should mean. –  Jul 20 '17 at 01:40
  • But you supply no practical means to "test" whether some context is "list" or "string", no specific practical rules for a user to know where and when one or the other should be applied. With those limitations it is impossible to use such an idea for something practical. It remains as an "abstract tree", one which only the parser could unravel. –  Jul 20 '17 at 01:41
  • And then you also add a new context: "case context". Perl only defines two contexts, no more. This is no longer similar to Perl's list context. It is the description of a personal point of view of some shell aspect. –  Jul 20 '17 at 01:41
  • If you were to simply define a "list context" as one place in which split or (and) pathname expansion will be performed, we will reduce this new definition (which is useful and well defined in Perl) to simpler definitions already existing on the shell. –  Jul 20 '17 at 01:42
  • To clarify: In addition to case patterns, [[ STRING = PATTERN ]] is also an example of what you refer to as case pattern context. – jrw32982 Jul 27 '20 at 16:02