I have certain questions regarding grep.

  1. Why does the following command match '<Hello'?

    $ grep -E "\<H" test
    Hello World
    <Hello
    H<ello
    
  2. What needs to be done to match '<Hello' only?

user3539

2 Answers

To make grep treat the pattern as a literal string rather than as a regular expression, use -F (or --fixed-strings):

$ cat test
one < two
Hello World
X<H
A <H A
I said: <Hello>
$ grep -F '<H' test
X<H
A <H A
I said: <Hello>

Remember to quote the search pattern properly; otherwise your shell may interpret it. For example, if you ran grep -F <H test instead, the shell would open a file named "H" and connect it to grep's standard input, and grep would search that stream for the string "test". The following commands are roughly equivalent to each other, but not to the quoted command above:

 grep -F <H test
 grep -F test <H         # location of `<H` does not matter
 <H grep -F test
 cat H | grep -F test    # useless cat award
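
To see the difference in practice, here is a minimal sketch that runs the quoted and unquoted forms side by side. The scratch directory and the contents of the files H and test are made up for illustration:

```shell
# Work in a scratch directory so the demo files do not clobber anything.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'this is a test\nunrelated line\n' > H   # file the shell opens for <H
printf 'X<H\n' > test                           # file we actually meant to search

# Unquoted: <H is a redirection, so grep reads file H and searches for "test".
unquoted=$(grep -F <H test)

# Quoted: '<H' is the pattern and "test" is the file to search.
quoted=$(grep -F '<H' test)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The unquoted command prints the line from H that contains "test", while the quoted one prints the line from test that contains <H.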

As for matching words only, have a look at the manual page grep(1):

   -w, --word-regexp
          Select  only those lines containing matches that form whole words.  The
          test is that the matching substring must either be at the beginning  of
          the  line, or preceded by a non-word constituent character.  Similarly,
          it must be either at the end of the line  or  followed  by  a  non-word
          constituent   character.    Word-constituent  characters  are  letters,
          digits, and the underscore.

Example usage (using the above test file):

$ grep -F -w '<H' test
A <H A

(-F is optional here, as <H does not have a special meaning, but it may be useful if you intend to extend this literal pattern.)

To match the beginning of a word, you do need regular expressions though:

$ grep -w '<H.*' test    # match words starting with `<H` followed by anything
A <H A
I said: <Hello>
dhag
Lekensteyn
  • The last one was what I was looking for. I thought I could use grep '\<<' test to match < at the beginning of each word, but it didn't work. Any idea why? – user3539 Mar 06 '13 at 00:19
  • @user3539 Possibly because < is not considered a word character. See the manual page under "The Backslash Character and Special Expressions". – Lekensteyn Mar 06 '13 at 09:57

< is not a special character in any grep. However, in GNU grep, \< is special and means the beginning of a word (that is, the zero-width boundary before the H in each of your input lines).
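
As a quick check of that difference, the following sketch feeds the three lines from the question to grep with and without the backslash. It assumes GNU grep, since \< is an extension:

```shell
# Recreate the three lines from the question.
input=$(printf 'Hello World\n<Hello\nH<ello\n')

# '\<H': an H at the start of a word; every line has one.
word_start=$(printf '%s\n' "$input" | grep -E '\<H')

# '<H': a literal < followed by H; only the second line has that.
literal=$(printf '%s\n' "$input" | grep -E '<H')

printf '%s\n' "--- with backslash ---" "$word_start"
printf '%s\n' "--- without backslash ---" "$literal"
```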

In all greps, \ is special. It can either escape a special character to remove its special meaning (so that it is matched literally), or add a special meaning to an otherwise ordinary character. The latter is typically used to introduce new operators without breaking existing scripts (another way is to use sequences that would otherwise be invalid, like *? or (?), or for ANSI C escape sequences like \n, \t...

To remove the special meaning of \ itself, as with the other special characters, you need to precede it with another \.

So to match <Hello, you need:

grep -E '<Hello'

And to match \<Hello, you need:

grep -E '\\<Hello'

Note that both < and \ are special to the shell as well, so they also need quoting for the shell, hence the single quotes above. (\ is also special to the shell inside double quotes, though only in front of characters that remain special there, namely newline, double quote, backslash, dollar and backtick, so you'd need grep -E "\\\<Hello" or grep -E "\\\\<Hello" to match \<Hello.)
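
One way to check what the shell actually hands to grep is to print the quoted pattern with printf '%s\n', which does not interpret backslashes in its arguments. A small sketch, where the sample line is made up for illustration:

```shell
# Show each pattern as grep would receive it after the shell removes quotes.
# All three lines of output are identical: two backslashes, then <Hello.
printf '%s\n' '\\<Hello' "\\\<Hello" "\\\\<Hello"

# A sample line that contains a literal backslash before <Hello.
line='\<Hello'

# Both quoting styles therefore match the literal text \<Hello.
single=$(printf '%s\n' "$line" | grep -E '\\<Hello')
double=$(printf '%s\n' "$line" | grep -E "\\\\<Hello")
printf 'single quotes matched: %s\n' "$single"
printf 'double quotes matched: %s\n' "$double"
```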

To make the pattern match the full line, add the -x option to grep:

grep -xE '<Hello'

would match only lines whose content is exactly "<Hello".

To match at the beginning of the line:

grep -E '^<Hello'

(This would match "<Hello" and "<Hello world>", but not "World <Hello".)
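
The whole-line and start-of-line forms can be compared on the sample strings mentioned above. This is only a sketch; the three input lines are the examples from the text:

```shell
input=$(printf '<Hello\n<Hello world>\nWorld <Hello\n')

# -x: the pattern must match the entire line.
whole_line=$(printf '%s\n' "$input" | grep -xE '<Hello')

# ^: the pattern must match at the start of the line; anything may follow.
line_start=$(printf '%s\n' "$input" | grep -E '^<Hello')

printf '%s\n' "--- -x ---" "$whole_line"
printf '%s\n' "--- ^ ---" "$line_start"
```

Neither form selects "World <Hello", because there <Hello is not at the start of the line.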

To match <Hello not preceded by a non-blank character (my interpretation of your "at the beginning of a word"):

grep -E '(^|[[:blank:]])<Hello'

or with BRE:

grep '^\(.*[[:blank:]]\)\{0,1\}<Hello'
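
As a sanity check, the ERE and BRE forms can be run on the same input to confirm that they select the same lines. The input lines below are invented for illustration:

```shell
# Four sample lines: <Hello at line start, after a space, glued to a
# letter, and after a tab.
input=$(printf '<Hello\nsay <Hello\nX<Hello\n\t<Hello tab\n')

ere=$(printf '%s\n' "$input" | grep -E '(^|[[:blank:]])<Hello')
bre=$(printf '%s\n' "$input" | grep '^\(.*[[:blank:]]\)\{0,1\}<Hello')

printf '%s\n' "--- ERE ---" "$ere"
printf '%s\n' "--- BRE ---" "$bre"
```

Both commands skip "X<Hello", where <Hello directly follows a non-blank character.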