Bash's elegant simplicity seems to get lost in its huge man page.
In addition to the excellent solutions above, I thought I'd try to give you a cheat sheet on how bash parses and interprets statements. Then using this roadmap I'll parse the examples presented by the questioner to help you better understand why they don't work as intended.
Note: Shell script lines are used directly. Typed input-lines are first history-expanded.
Each bash line is first tokenized, or in other words chopped into what are called tokens. (Tokenizing occurs before all other expansions, including brace, tilde, parameter, command, arithmetic, process, word splitting, & filename expansion.)
A token here means a portion of the input line separated (delimited) by one of these special meta-characters:
space, - White space...
tab,
newline,
‘<’, - Redirection & piping...
‘|’,
‘>’
‘&’, - Background (cmd &), AND (&&), or redirection involving both streams or a file descriptor (&>, >&n)
‘;’, - Command termination
‘(’, - Subshell, closed by - ‘)’
Bash uses many other special characters but only these 10 produce the initial tokens.
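To see that these characters delimit tokens even when no white space surrounds them, here is a minimal sketch. The function name demo_delimiters and the strings it echoes are placeholders chosen for illustration, not part of the question.

```shell
#!/usr/bin/env bash
# demo_delimiters is a hypothetical name; it only groups the three demos.
demo_delimiters() {
    # ';' terminates the first command, so two separate echo commands run.
    echo one;echo two

    # '|' splits the line into two commands joined by a pipe.
    echo abc|tr a-c A-C

    # '>' starts a redirection, so "hidden" goes to /dev/null, not the terminal.
    echo hidden>/dev/null
}

demo_delimiters
```

This should print one, two, and ABC on separate lines, even though none of the meta-characters has a space on either side.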
However, because these meta-characters sometimes must also be used within a token, there needs to be a way to take away their special meaning. This is called escaping. Escaping is done either by quoting a string of one or more characters (i.e. 'xx..', "xx.."), or by prefixing an individual character with a backslash (i.e. \x). (It's a little more complicated than this because the quotes also need to be quoted, and because double quotes don't quote everything, but this simplification will do for now.)
Don't confuse bash quoting with the idea of quoting a string of text, as in other languages. What sits between quotes in bash is not a string, but rather a section of the input line whose meta-characters are escaped so they don't delimit tokens.
Note, there is an important difference between ' and ", but that's for another day.
The remaining unescaped meta-characters then become token separators.
For example,
$ echo "x"'y'\g
xyg
$ echo "<"'|'\>
<|>
$ echo x\; echo y
x; echo y
In the first example there are two tokens produced by a space delimiter: echo and xyg.
Likewise in the 2nd example.
In the third example the semicolon is escaped, so there are 4 tokens produced by a space delimiter: echo, x;, echo, and y. The first token is then run as the command, and takes the next three tokens as arguments. Note the 2nd echo is not executed.
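Because echo joins its arguments with spaces, the token boundaries in the three examples are easier to see with a small helper. The sketch below assumes a hypothetical function, show_args, which is not part of bash or of the question; it simply prints each argument it receives in brackets.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints every argument it receives
# on its own line, wrapped in brackets, so token boundaries become visible.
show_args() {
    printf '[%s]\n' "$@"
}

show_args "x"'y'\g      # one argument:    [xyg]
show_args "<"'|'\>      # one argument:    [<|>]
show_args x\; echo y    # three arguments: [x;] [echo] [y]

# With the ';' left unescaped, show_args receives only x,
# and "echo y" then runs as a second, separate command.
show_args x; echo y
```

The last line is the contrast case: once the semicolon is no longer escaped, it goes back to terminating the command.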
The important thing to remember is that bash first looks for escaping characters (', ", and \), and then looks for unescaped meta-character delimiters, in that order.
If not escaped, these 10 special characters serve as token delimiters. Some of them also have additional meaning, but first and foremost, they are token delimiters.
What grep expects
In the example above grep needs these tokens: grep, string, filename.
The question's first try was:
$ grep (then|there) x.x
In this case (, ) and | are unescaped meta-characters and so serve to split the input into these tokens: grep, (, then, |, there, ), and x.x. grep wants to see grep, then|there, and x.x.
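One way to confirm that the unquoted form never reaches grep is to hand the line to a child bash and check its exit status. This is only a sketch: the function name try_unquoted is made up for illustration, and it assumes a bash executable is on the PATH. The command line is passed in single quotes so that the parent shell does not trip over it.

```shell
#!/usr/bin/env bash
# try_unquoted is a hypothetical name. It feeds the question's first attempt
# to a child bash; the single quotes keep the parent shell from parsing it.
try_unquoted() {
    if bash -c 'grep (then|there) x.x' 2>/dev/null; then
        echo "parsed"
    else
        # The child bash stops with a syntax error while parsing the line,
        # so grep is never started.
        echo "rejected by the shell"
    fi
}

try_unquoted
```

Dropping the 2>/dev/null shows the syntax error message that bash itself prints.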
The question's second try was:
$ grep "(then|there)" x.x
This tokenizes into grep, (then|there), x.x. You can see this if you swap out grep for echo:
$ echo "(then|there)" x.x
(then|there) x.x
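The echo output alone cannot tell you whether the command received two arguments or a single argument containing a space. A minimal sketch that counts the arguments instead, using a hypothetical helper named count_args (not a bash builtin):

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# The quoted pattern and the file name arrive as two separate arguments.
count_args "(then|there)" x.x

# Quoting the space as well merges everything into a single argument.
count_args "(then|there) x.x"
```

The first call should print 2 and the second 1. Together with the command name, the first form gives the three tokens listed above.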