In POSIX 7, the shell grammar (Section 2.10 in XCU) mentions several token identifiers. I am confused about two: WORD and NAME. What are their differences?
For example, is a command's option WORD or NAME? How about a command's non-optional argument?
All of the uppercase names in this section refer to (possibly machine-compilable) lex descriptions of the grammar (starting in 2.10. Shell Grammar). The feature asked about is clarified in item 5:
[NAME in for]
When the TOKEN meets the requirements for a name (see XBD Name), the token identifier NAME shall result. Otherwise, the token WORD shall be returned.
That is (referring to 3.231 Name), a NAME is a certain type of WORD:
In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.
Not all words are names: a decimal integer is a word, but not a name.
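To make the "a NAME is a certain type of WORD" point concrete, here is a minimal sketch in Python (not shell, and not taken from the standard) that checks only the lexical requirement quoted above. The function name is_name and the sample words are my own choices for illustration. Keep in mind that the real grammar only yields NAME in specific contexts, such as the for-loop rule quoted earlier, so passing this check means a word could be a NAME, not that the shell will treat it as one.

```python
import re

# Pattern for the XBD "Name" definition quoted above: underscores, digits,
# and ASCII letters, with a first character that is not a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_name(word):
    """Return True if `word` meets the lexical requirements for a name."""
    return _NAME_RE.fullmatch(word) is not None


# Hypothetical sample words: variable-like names, a number, an option,
# and a file name containing a dot.
for w in ["HOME", "_tmp1", "42", "-l", "1abc", "file.txt"]:
    print(w, "->", "could be a NAME" if is_name(w) else "WORD only")
```

Under this check an option such as -l can only ever be a WORD, because the hyphen is not allowed in a name.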
Regarding the grammar, these lines tell yacc what symbolic constants (via #define) lex might return:
%token WORD
%token ASSIGNMENT_WORD
%token NAME
%token NEWLINE
%token IO_NUMBER
while the yacc grammar (rules) begins with
%start complete_command
You may notice occurrences of WORD and NAME in the grammar. yacc expects lex to return those symbolic constants at those points. Conventionally, uppercase names are used for this purpose, with other names being just the rules within the yacc grammar.
When interpreting a simple command, the shell interpreter only cares about the first WORD, which it takes as the command name (the grammar's cmd_name is a WORD rather than a NAME, so something like /bin/ls qualifies). It passes the other WORDs to the command as parameters, and the command has to decide what those mean. The yacc grammar is vague in this area, but note the reference to "7a". There is no labeled item for that in the written standard, but it points to 2.9.1 Simple Commands, corresponding to this clump in the grammar:
simple_command : cmd_prefix cmd_word cmd_suffix
| cmd_prefix cmd_word
| cmd_prefix
| cmd_name cmd_suffix
| cmd_name
(As an exercise, someone might try completing the grammar and making it actually match the standard with respect to terminology.)
In command line processing, a word is what is usually understood as a word in many languages: a group of characters delimited (mostly) by spaces.
Technically, it is a group of characters delimited by metacharacters (characters that, when unquoted, separate words).
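As a rough illustration of splitting on metacharacters rather than only on spaces, here is a small Python sketch using the standard shlex module. This is an approximation of my own, not the shell's actual tokenizer: shlex with punctuation_chars=True treats runs of the characters ( ) ; < > | & as separate tokens, which is close to, but not the same as, the shell's rules. The helper name split_words is hypothetical.

```python
import shlex


def split_words(line):
    """Split a command line into tokens, approximating shell word splitting."""
    # punctuation_chars=True makes ( ) ; < > | & delimit words even when
    # no whitespace surrounds them; posix=True enables shell-like quoting.
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    return list(lexer)


# The pipe separates words even though it has no spaces around it.
print(split_words("ls -l|wc -l"))  # ['ls', '-l', '|', 'wc', '-l']
```

Here the | ends the word -l and starts a new token by itself, although there is no whitespace on either side of it.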
The first step from a very clear description of command line processing:
After the line has been divided into words (or tokens), tokens are identified using syntax rules and labeled accordingly.
Name is just a label used to identify something: a command name, a variable name, a parameter name, a builtin name, etc.