68

I have a file that has "then"'s and "there"'s.

I can

$ grep "then " x.x
x and then some
x and then some
x and then some
x and then some

and I can

$ grep "there " x.x
If there is no blob none some will be created

How can I search for both in one operation? I tried

$ grep (then|there) x.x

-bash: syntax error near unexpected token `('

and

grep "(then|there)" x.x
durrantm.../code
# (Nothing)
  • linked, https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns-with-pattern-having-a-pipe-character – TT-- Aug 28 '18 at 20:01

4 Answers

89

You need to put the expression in quotation marks. The error you are receiving is the result of bash interpreting the ( as a special character.

Also, you need to tell grep to use extended regular expressions.

$ grep -E '(then|there)' x.x

Without extended regular expressions, you have to escape the |, (, and ). Note that we use single quotation marks here. Bash treats backslashes within double quotation marks specially.

$ grep '\(then\|there\)' x.x

The grouping isn't necessary in this case.

$ grep 'then\|there' x.x

It would be necessary for something like this:

$ grep 'the\(n\|re\)' x.x
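As a quick check of the variants above, the following sketch builds a small throwaway file and runs each form against it. The file contents and the temporary path are assumptions made for illustration. Note that \| in basic regular expressions is a GNU extension, so the last two commands may behave differently with other grep implementations.

```shell
# Build a throwaway sample file (contents are illustrative, not the asker's real file).
sample=$(mktemp)
printf '%s\n' 'x and then some' 'If there is no blob none some will be created' 'unrelated line' > "$sample"

# Extended regular expressions: |, ( and ) are special without backslashes.
ere=$(grep -E '(then|there)' "$sample")

# Basic regular expressions (GNU grep): the same operators need backslashes.
bre=$(grep '\(then\|there\)' "$sample")

# Grouping only matters when the alternation covers part of the word.
grouped=$(grep 'the\(n\|re\)' "$sample")

# All three variables should hold the same two matching lines.
printf '%s\n' "$ere"
rm -f "$sample"
```

All three commands should select the same lines; only the amount of backslashing differs.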
9

Just a quick addendum: most flavours have a command called egrep, which is just grep with -E. I personally much prefer typing

egrep "i(Pod|Pad|Phone)" access.log

to using grep -E.

Trausti Thor
3

The stuff documented under REGULAR EXPRESSIONS in the (or at least, my) man page is actually for extended regexps;

grep understands three different versions of regular expression syntax: “basic,” “extended” and “perl.” In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards.

But grep does not use them by default -- you need the -E switch:

grep -E "(then|there)" x.x

Because (from the man page again):

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So you can also use:

grep "then\|there" x.x

Since the parentheses are superfluous in this case.
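The difference the man page describes can be seen with a small sketch that runs the same pattern under both syntaxes. The two input lines are made up for illustration, and the backslashed \+ form relies on GNU grep.

```shell
# Two illustrative input lines: one with repeated letters, one with a literal plus sign.
input=$(printf '%s\n' 'aa' 'a+')

# Basic syntax: + is an ordinary character, so only the line containing "a+" matches.
basic=$(printf '%s\n' "$input" | grep 'a+')

# Extended syntax: + means "one or more", so both lines match.
extended=$(printf '%s\n' "$input" | grep -E 'a+')

# Basic syntax with a backslash (GNU grep): \+ regains the special meaning.
escaped=$(printf '%s\n' "$input" | grep 'a\+')

printf 'basic:\n%s\nextended:\n%s\nescaped:\n%s\n' "$basic" "$extended" "$escaped"
```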

goldilocks
1

Bash's elegant simplicity seems to get lost in its huge man page.

In addition to the excellent solutions above, I thought I'd try to give you a cheat sheet on how bash parses and interprets statements. Then using this roadmap I'll parse the examples presented by the questioner to help you better understand why they don't work as intended.


Note: Shell script lines are used directly. Typed input-lines are first history-expanded.

Each bash line is first tokenized, or in other words chopped into what are called tokens. (Tokenizing occurs before all other expansions, including brace, tilde, parameter, command, arithmetic, process, word splitting, & filename expansion.)

A token here means a portion of the input line separated (delimited) by one of these special meta-characters:

space,    - White space
tab,
newline,

‘<’,      - Redirection and piping
‘|’,
‘>’,
‘&’,      - And/Both with < | > | >>, or as &<file descriptor>

‘;’,      - Command termination

‘(’,      - Subshell, closed by
‘)’

Bash uses many other special characters but only these 10 produce the initial tokens.

However, because these meta-characters sometimes must also be used within a token, there needs to be a way to take away their special meaning. This is called escaping. Escaping is done either by quoting a string of one or more characters (i.e. 'xx..', "xx..") or by prefixing an individual character with a backslash (i.e. \x). (It's a little more complicated than this because the quotes also need to be quoted, and because double quotes don't quote everything, but this simplification will do for now.)

Don't confuse bash quoting with the idea of quoting a string of text, as in other languages. What sits between quotes in bash is not a string, but rather a section of the input line whose meta-characters are escaped so that they don't delimit tokens.

Note, there is an important difference between ', and ", but that's for another day.

The remaining unescaped meta-characters then become token separators.

For example,

$ echo "x"'y'\g
xyg

$ echo "<"'|'\>
<|>

$ echo x\; echo y
x; echo y

In the first example there are two tokens produced by a space delimiter: echo and xyg.

Likewise in the 2nd example.

In the third example the semicolon is escaped, so there are 4 tokens produced by a space delimiter, echo, x;, echo, and y. The first token is then run as the command, and takes the next three tokens as input. Note the 2nd echo is not executed.
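One way to see the resulting tokens is to hand them to a small function that prints each argument on its own line. The show_tokens helper below is a hypothetical name introduced only for this sketch; it shows the arguments a command receives after bash has removed the quoting.

```shell
# Hypothetical helper: print every argument it receives inside brackets, one per line.
show_tokens() {
  printf '[%s]\n' "$@"
}

# Quotes and the backslash are removed, and the three pieces join into one token.
first=$(show_tokens "x"'y'\g)

# Escaped meta-characters stay inside the token instead of splitting the line.
second=$(show_tokens "<"'|'\>)

# The escaped semicolon no longer ends the command, so three arguments arrive.
third=$(show_tokens x\; echo y)

printf '%s\n' "$first" "$second" "$third"
```

The first two calls each pass a single argument, while the third passes x;, echo and y as three separate arguments.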


The important thing to remember is that bash first looks for escaping characters (', ", and \), and then looks for unescaped meta-character delimiters, in that order.

If not escaped then these 10 special characters serve as token delimiters. Some of them also have additional meaning, but first and foremost, they are token delimiters.


What grep expects

In the example above grep needs these tokens: grep, string, filename.

The question's first try was:

$ grep (then|there) x.x

In this case (, ) and | are unescaped meta characters and so serve to split the input into these tokens: grep, (, then, |, there, ), and x.x. grep wants to see grep, then|there, and x.x.

The question's second try was:

grep "(then|there)" x.x

This tokenizes into grep, (then|there), x.x. You can see this if you swap out grep for echo:

echo "(then|there)" x.x
(then|there) x.x
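So the quoted pattern does reach grep as a single token. As the other answers note, grep then reads it as a basic regular expression, where (, | and ) are literal characters. A minimal sketch, using a made-up input line in place of the asker's file, runs the same quoted pattern with and without -E:

```shell
# Illustrative one-line input standing in for the asker's file.
line='x and then some'

# Without -E the parentheses and pipe are literal characters, so nothing matches.
without_e=$(printf '%s\n' "$line" | grep "(then|there)" || true)

# With -E the same single token is treated as an alternation.
with_e=$(printf '%s\n' "$line" | grep -E "(then|there)")

printf 'without -E: [%s]\nwith -E: [%s]\n' "$without_e" "$with_e"
```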

Elliptical view