
In POSIX 7, the shell grammar (Section 2.10 in XCU) mentions several token identifiers. I am confused about two: WORD and NAME. What are their differences?

For example, is a command's option WORD or NAME? How about a command's non-optional argument?

AdminBee
Tim

2 Answers


All of the uppercase names in this section are token identifiers: the symbols that a (possibly machine-compilable) lex description would return to the grammar (starting in 2.10 Shell Grammar). The distinction you ask about is clarified in item 5:

[NAME in for]

When the TOKEN meets the requirements for a name (see XBD Name), the token identifier NAME shall result. Otherwise, the token WORD shall be returned.

That is (referring to 3.231 Name), a NAME is a certain type of WORD:

In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.

Not all words are names: a decimal integer is a word, but not a name.
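A minimal Python sketch of the check that item 5 and the Name definition describe, assuming the lexer has already isolated the token. The helper names `is_name` and `classify_for_name` are hypothetical and not taken from any real shell's source.

```python
import re

# XBD "Name": underscores, digits and alphabetics from the portable
# character set (ASCII only), and the first character is not a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_name(word: str) -> bool:
    """Return True if word satisfies the XBD Name rule."""
    # fullmatch anchors both ends, so "1abc", "-l" or "a.b" do not match.
    return _NAME_RE.fullmatch(word) is not None


def classify_for_name(token: str) -> str:
    """Item 5 [NAME in for]: NAME if the token is a valid name, else WORD."""
    return "NAME" if is_name(token) else "WORD"
```

For example, `classify_for_name("i")` gives `"NAME"` while `classify_for_name("42")` gives `"WORD"`. The character classes are spelled out in ASCII on purpose: a Unicode-aware test such as `str.isidentifier()` would also accept letters outside the portable character set.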

Regarding the grammar, these lines tell yacc what symbolic constants (via #define) lex might return:

%token  WORD
%token  ASSIGNMENT_WORD
%token  NAME
%token  NEWLINE
%token  IO_NUMBER

while the yacc grammar (rules) begins with

%start  complete_command

You may notice occurrences of WORD and NAME in the grammar: yacc expects lex to return those symbolic constants at those points. Conventionally, uppercase names are used for these tokens, while the lowercase names are the rules (nonterminals) of the yacc grammar itself.
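The point that NAME is returned only where the grammar asks for it can be illustrated with a small, hypothetical Python sketch. It assumes the input is already split into words and models only the header of a `for` loop; everywhere else the same characters come back as WORD. `label_tokens` is an illustrative name, and `For`/`In` stand in for the reserved-word token identifiers.

```python
import re

# XBD Name: portable-character-set letters, digits, underscore; no leading digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_name(word):
    return _NAME_RE.fullmatch(word) is not None


def label_tokens(words):
    """Attach token identifiers to already-split words (toy model).

    Only the header of a 'for' loop is modelled; every other word is WORD.
    """
    in_for = bool(words) and words[0] == "for"
    labels = []
    for pos, w in enumerate(words):
        if in_for and pos == 0:
            labels.append(("For", w))          # reserved word
        elif in_for and pos == 1:
            # Item 5: NAME if it is a valid name, otherwise WORD
            # (which the parser would then reject as a syntax error).
            labels.append(("NAME" if is_name(w) else "WORD", w))
        elif in_for and pos == 2 and w == "in":
            labels.append(("In", w))           # third word of a for loop
        else:
            # A name-shaped word elsewhere (e.g. the "i" in "echo i")
            # is still just a WORD.
            labels.append(("WORD", w))
    return labels
```

With `["for", "i", "in", "a", "b"]` the `i` is labeled NAME, but with `["echo", "i"]` the identical `i` is a WORD, because nothing in that position asks the lexer to try NAME.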

When interpreting a command, the shell treats the first WORD specially, as the command name. It passes the other WORDs to the command as arguments, and the command has to decide what those mean; so a command's options and its non-optional arguments are both just WORDs as far as the grammar is concerned. The yacc grammar is vague in this area, but note the reference to "7a". There is no labeled item for that in the written standard, but it defers to 2.9.1 Simple Commands, which corresponds to this clump in the grammar:

simple_command   : cmd_prefix cmd_word cmd_suffix
                 | cmd_prefix cmd_word
                 | cmd_prefix
                 | cmd_name cmd_suffix
                 | cmd_name

(As an exercise, someone might try completing the grammar and making it actually match the standard's terminology.)
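To make the answer to the options question concrete, here is a hypothetical Python sketch of how the words of a simple command fall into the grammar's prefix, command-name and suffix parts. It assumes the line is already split into words and ignores redirections and quoting; `split_simple_command` and `is_assignment` are illustrative names. Leading `name=value` words are treated as assignments, as 2.9.1 describes.

```python
import re

_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_assignment(word):
    # name=value where the text before the first '=' is a valid name.
    name, sep, _value = word.partition("=")
    return sep == "=" and _NAME_RE.fullmatch(name) is not None


def split_simple_command(words):
    """Split pre-tokenized words into (assignments, command_name, args)."""
    i = 0
    # cmd_prefix: assignments may only appear before the command name.
    while i < len(words) and is_assignment(words[i]):
        i += 1
    assignments = words[:i]
    if i == len(words):
        # e.g. "FOO=bar" on its own: only a cmd_prefix, no command name.
        return assignments, None, []
    # cmd_name / cmd_word, then cmd_suffix: every remaining word is a
    # plain WORD, whether it looks like an option or not.
    return assignments, words[i], words[i + 1:]
```

For `LC_ALL=C sort -n data.txt` this yields the prefix `["LC_ALL=C"]`, the command name `sort`, and the suffix `["-n", "data.txt"]`: the option and the file name are treated identically. A word like `A=b` after the command name (as in `env A=b`) stays an ordinary argument.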

Thomas Dickey
  • Thanks. I am more curious about what the tokens with the "WORD" and "NAME" identifiers are used for, that is, their meaning rather than their syntax. – Tim Mar 16 '16 at 20:02
  • For example, is a command's option WORD or NAME? How about a command's non-optional argument? – Tim Mar 16 '16 at 20:35

In command-line processing, a word is what is usually understood as a word in many languages: a group of characters delimited (mostly) by spaces.

Technically, it is a group of characters separated by metacharacters (characters that, when unquoted, separate words).

The first step, from a very clear description of command-line processing:

  1. Splits the command into tokens that are separated by the fixed set of metacharacters: SPACE, TAB, NEWLINE, ;, (, ), <, >, |, and &. Types of tokens include words, keywords, I/O redirectors, and semicolons.

After the line has been divided into words (or tokens), the tokens are identified using syntax rules and labeled accordingly.
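The splitting step quoted above can be sketched in Python. This is a simplified, hypothetical tokenizer: it ignores quoting, escapes, expansions and comments, discards spaces and tabs, and emits a newline as a token of its own (the POSIX grammar has a NEWLINE token). The multi-character operators are the ones declared as tokens in XCU 2.10 (`&&`, `||`, `;;`, `<<`, `>>`, `<&`, `>&`, `<>`, `<<-`, `>|`); `split_tokens` is an illustrative name.

```python
BLANKS = " \t"
OPERATOR_CHARS = ";()<>|&\n"
# Multi-character operators, checked longest first ("<<-" before "<<").
MULTI_OPS = ["<<-", "&&", "||", ";;", "<<", ">>", "<&", ">&", "<>", ">|"]


def split_tokens(line):
    """Split a command line at unquoted metacharacters (no quoting support)."""
    tokens = []
    word = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in BLANKS or ch in OPERATOR_CHARS:
            # Any metacharacter ends the word being collected.
            if word:
                tokens.append("".join(word))
                word = []
            if ch in OPERATOR_CHARS:
                # Longest matching operator wins; otherwise the single char.
                op = next((m for m in MULTI_OPS if line.startswith(m, i)), ch)
                tokens.append(op)
                i += len(op)
                continue
            i += 1  # blanks separate words but are not tokens
        else:
            word.append(ch)
            i += 1
    if word:
        tokens.append("".join(word))
    return tokens
```

Run on `ls -l /tmp|wc -l`, it returns `["ls", "-l", "/tmp", "|", "wc", "-l"]`. At this stage `-l` is just another word; labeling it (as WORD, in this case) happens afterwards.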


A name is just a label used to identify something: a command name, a variable name, a parameter name, a builtin name, and so on.