3

From the bash manual, about tilde expansion:

If a word begins with an unquoted tilde character (‘~’), all of the characters up to the first unquoted slash (or all characters, if there is no unquoted slash) are considered a tilde-prefix.

I was wondering why ~ is recognized as a tilde-prefix in

$ mypath=/program_files:~/home/t
$ echo $mypath
/program_files:/home/t/home/t

What words is mypath=/program_files:~/home/t split into by the lexer of bash? Is ~/home/t recognized as a word of its own?

What word separators does the lexer of bash use to break a command into words? Are : and = word separators? Are they also words?

Thanks.

This question arose because I can't understand https://unix.stackexchange.com/a/448469/674

The tilde inside a PATH string is not understood. This is why the POSIX standard requires to expand tilde sequences after a colon in the command line when a shell macro is assigned.

Tim
  • 101,790

2 Answers

5

This isn’t the result of word splitting (more accurately, token splitting), it’s the result of tilde expansion in variable assignments:

Each variable assignment is checked for unquoted tilde-prefixes immediately following a ‘:’ or the first ‘=’. In these cases, tilde expansion is also performed.
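A minimal sketch of that rule, runnable in bash or any POSIX shell. HOME is set to /home/t here only as an assumption, to make the result reproducible and match the question; the variable name both is made up for illustration.

```sh
# Assumption: pin HOME so that ~ expands to a known value.
HOME=/home/t

# Tilde right after an unquoted ':' in an assignment is expanded.
mypath=/program_files:~/home/t
echo "$mypath"    # /program_files:/home/t/home/t

# Tilde right after the '=' and after a ':' are both expanded.
both=~/bin:~/sbin
echo "$both"      # /home/t/bin:/home/t/sbin
```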

When it splits a command into tokens, the word separators bash uses are its metacharacters:

A character that, when unquoted, separates words. A metacharacter is a space, tab, newline, or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.

mypath=/program_files:~/home/t

is a single token from bash’s perspective.
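One rough way to see this from the command line is to count the arguments a function receives. This is only a proxy, since "$#" counts words after all expansions rather than raw lexer tokens, and count_words is a hypothetical helper, not something bash provides.

```sh
# Hypothetical helper: print how many words it was called with.
count_words() { echo "$#"; }

# '=' and ':' are not metacharacters, so this stays one word.
one=$(count_words mypath=/program_files:/home/t)

# Spaces are metacharacters, so this is three words.
three=$(count_words a b c)

# '|' separates words even with no surrounding spaces.
piped=$(echo hello|tr a-z A-Z)

echo "$one $three $piped"    # 1 3 HELLO
```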

Stephen Kitt
  • 434,908
  • Thanks. What words i.e. tokens is mypath=/program_files:~/home/t split into? – Tim Jun 07 '18 at 21:09
  • Are = and : not metacharacters? What are they? – Tim Jun 07 '18 at 21:36
  • Also are = and : "words" in bash's definition? – Tim Jun 07 '18 at 21:40
  • @Tim The whole mypath=/program_files:~/home/t is a single token. It is processed as a simple command. – Kusalananda Jun 07 '18 at 21:40
  • ‘=’ and ‘:’ are regular characters. ‘=’ acquires special meaning in the context of shell parameters; as Kusalananda says its presence causes the token to be treated as an assignment. ‘:’ only has special meaning for certain operations, including tilde expansion. – Stephen Kitt Jun 07 '18 at 21:42
  • Thanks. (1) After token splitting, are metacharacters themselves also tokens? (2) Some metacharacters can have different meanings, e.g. < can be a redirection operator, or a comparison operator between two numbers. &, ( and ) can also have different meanings. So does a metacharacter with any of these meanings serve as a token separator? – Tim Jun 07 '18 at 22:04
  • I suspect they are indeed tokens themselves, because as you say they carry meaning. ps aux|less and ps aux | less are equivalent: | doesn’t lose its piping effect when it’s the only separator between words. – Stephen Kitt Jun 08 '18 at 07:44
3

The tilde is expanded in the assignment to PATH because it is a variable assignment and the tilde comes just after an unquoted colon (in your example).

A "tilde-prefix" consists of an unquoted <tilde> character at the beginning of a word, followed by all of the characters preceding the first unquoted <slash> in the word, or all the characters in the word if there is no <slash>. In an assignment (see XBD Variable Assignment), multiple tilde-prefixes can be used: at the beginning of the word (that is, following the <equals-sign> of the assignment), following any unquoted <colon>, or both.

(from the POSIX text on tilde expansion)

The bash manual puts it like this:

Each variable assignment is checked for unquoted tilde-prefixes immediately following a : or the first =. In these cases, tilde expansion is also performed. Consequently, one may use filenames with tildes in assignments to PATH, MAILPATH, and CDPATH, and the shell assigns the expanded value.

This means that assigning the unquoted string /program_files:~/home/t to PATH will expand the tilde within, and that $PATH will be the string with the tilde expanded.

Placing a literal tilde in PATH, for example by quoting the string, will cause pathname resolution of commands to fail for that directory (unless there is a directory in the current working directory with that literal name).

bash, when not in POSIX mode, will still expand a literal tilde in PATH when looking up commands.
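As a sketch of how a literal tilde ends up in such a string: quoting the tilde in any form suppresses the expansion at assignment time. HOME is pinned to /home/t as an assumption, and the variable names are made up for illustration.

```sh
# Assumption: pin HOME so an expansion would be easy to spot.
HOME=/home/t

# Single quotes, double quotes, or quoting just the tilde
# all leave the tilde in the value as a literal character.
single='/program_files:~/home/t'
double="/program_files:~/home/t"
partial=/program_files:"~"/home/t

echo "$single"     # /program_files:~/home/t
echo "$double"     # /program_files:~/home/t
echo "$partial"    # /program_files:~/home/t
```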


When the shell scans the line

mypath=/program_files:~/home/t

it is returned to the parser as a single token. It will be processed as a simple command.

A simple command, when recognised as an assignment, undergoes, among other things, tilde expansion. While doing the tilde expansion on the right hand side of the =, the shell will expand the tilde in the string to the home directory of the current user, because it occurs just after a colon.
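The position of the tilde is what matters. The sketch below contrasts a tilde directly after a colon in an assignment with a tilde elsewhere in the word, and with the same string passed as an ordinary command argument. HOME is pinned to /home/t as an assumption, and the variable names are hypothetical.

```sh
# Assumption: pin HOME so that ~ expands to a known value.
HOME=/home/t

# Directly after an unquoted ':' in an assignment: expanded.
after_colon=/a:~/b
echo "$after_colon"    # /a:/home/t/b

# Not directly after ':' or '=': no tilde-prefix, so left alone.
mid_word=/a:b~/c
echo "$mid_word"       # /a:b~/c

# An ordinary argument is not an assignment, and the tilde is not
# at the start of the word, so echo receives it unchanged.
as_argument=$(echo /a:~/b)
echo "$as_argument"    # /a:~/b
```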

See also the POSIX text on simple commands.

Kusalananda
  • 333,661
  • I think the comment about tilde not "being understood" in PATH refers to a literal tilde in the PATH (i.e. PATH='~/bin'). Though Bash does seem to interpret the tilde from inside PATH, too, but that's not directly the same as being able to expand it on assignment. – ilkkachu Jun 07 '18 at 22:11
  • @ilkkachu I believe that you are correct, I will patch it up by removing that quote from my answer and then write it up properly in the morning, after sleep. Thanks! – Kusalananda Jun 07 '18 at 22:17