[root@localhost opt]# cat cfg
key = value
[root@localhost opt]# grep 'key\s*=\s*.+' cfg
[root@localhost opt]# 

My intent is: the = sign may be followed by zero or more spaces, but must be followed by one or more non-space characters.

Why doesn't it output the line key = value?

xmllmx

2 Answers


Observe:

$ grep 'key\s*=\s*.+' cfg
$ grep 'key\s*=\s*.\+' cfg
key = value
$ grep -E 'key\s*=\s*.+' cfg
key = value

In Basic Regular Expressions (BRE, the default), + means a literal plus sign. As a GNU extension, one can signal one-or-more-of-the-previous-character using \+. This is also true of ?, {, |, and (. Unless escaped with a backslash, these are all treated as ordinary characters under BRE.

The rules change if you use Extended Regular Expressions, -E. For ERE, the backslash isn't needed and a plain + means one-or-more-of-the-previous-character. Under ERE, \+ means a literal plus sign.
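A minimal sketch putting the spellings side by side, assuming a POSIX shell and a grep that supports -E (GNU and BSD grep both do). It uses the standard [[:space:]] instead of GNU's \s, shows the standard BRE interval \{1,\} rather than the GNU-only \+, and writes the question's cfg line to a temporary file:

```shell
#!/bin/sh
# Recreate the question's cfg file in a temporary location.
cfg=$(mktemp)
trap 'rm -f "$cfg"' EXIT
printf 'key = value\n' > "$cfg"

# BRE (the default): a bare + is a literal plus sign, so nothing matches.
grep 'key[[:space:]]*=[[:space:]]*.+' "$cfg" || echo 'BRE .+ : no match'

# Standard BRE: \{1,\} means "one or more of the previous atom".
grep 'key[[:space:]]*=[[:space:]]*.\{1,\}' "$cfg"

# ERE (-E): a plain + means "one or more of the previous atom".
grep -E 'key[[:space:]]*=[[:space:]]*.+' "$cfg"
```

The first grep reports no match; the other two print key = value.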

John1024
  • Technically, \+ is a feature of enhanced BREs (like we need even more flavors of REs in POSIX). grep appears to pass the REG_ENHANCED flag to regcomp(); otherwise, you would have to use {1,} like expr does. – chepner Aug 12 '16 at 11:59
  • But grep -E 'key\s*=\s*.+' does match key = (with one or more trailing space) as that's a = followed by 0 space (which matches \s*) followed by one space character (which matches .). Also note that \s/\S are not standard/portable. You'd want grep -Ex 'key\s*=\s*\S.*' or grep -E '^key\s*=\s*\S' to force at least one non-space after the =. [[:space:]] is the standard equivalent of \s, though here, [[:blank:]] may make more sense. – Stéphane Chazelas Aug 12 '16 at 12:27
  • Note that \+ in BRE is a GNU extension. In standard BREs, + is written \{1,\}. – Stéphane Chazelas Aug 12 '16 at 12:58
  • @chepner Yes. Answer updated to note that \+ under BRE is a GNU extension. (When originally posted, this question was tagged Linux, I see that that tag has since been removed.) – John1024 Aug 15 '16 at 20:08
  • @StéphaneChazelas I see that you have posted (+1) a thorough exploration of this regex. – John1024 Aug 15 '16 at 20:09
key\s*=\s*.+

is GNU ERE syntax (assuming you want \s to match any spacing character, and + to match one or more of the preceding atom), so you'd need the GNU implementation of grep and pass the -E option.

However, even then, that wouldn't make much sense.

First,

grep 'key\s*=\s*.+'

is functionally equivalent to

grep 'key\s*=\s*.'

That's because if a string matches anything.+, it also matches anything., and vice versa.

Also, a spacing character is still a character. Since \s* matches zero or more spacing characters, key\s*=\s*. is functionally equivalent to key\s*=. (lines that contain key<optional-spaces>=<any-one-character-space-or-not>).
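One way to check these equivalences is to run all three patterns over the same sample lines and compare what they select. This sketch assumes a grep with -E, uses [[:space:]] as the portable spelling of \s, and the sample lines are made up for illustration (the second one has a single trailing space):

```shell
#!/bin/sh
# Sample lines: nothing after =, only a space after =, and two real values.
samples() { printf '%s\n' 'key=' 'key= ' 'key= a' 'key = value'; }

# Three patterns that should select exactly the same lines.
a=$(samples | grep -E 'key[[:space:]]*=[[:space:]]*.+')
b=$(samples | grep -E 'key[[:space:]]*=[[:space:]]*.')
c=$(samples | grep -E 'key[[:space:]]*=.')

# They agree, and all of them accept "key= " (a lone trailing space).
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo 'same lines matched'
# Show spaces as _ so the accepted "key= " line is visible.
printf '%s\n' "$c" | sed 's/ /_/g'
```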

Here you want:

grep 'key\s*=\s*\S'

to ask for at least one non-spacing character to be found on the right of the =, which is functionally equivalent to:

grep 'key\s*=.*\S'
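The same kind of check can confirm that the two \S patterns agree and that both reject a line with only spaces after the =. This sketch swaps GNU's \S for the standard [^[:space:]] (and \s for [[:space:]]), so plain BRE is enough; the sample lines are again made up:

```shell
#!/bin/sh
samples() { printf '%s\n' 'key=' 'key= ' 'key= a' 'key = value'; }

# Optional spaces after the =, then at least one non-space character...
p=$(samples | grep 'key[[:space:]]*=[[:space:]]*[^[:space:]]')
# ...selects the same lines as "a non-space somewhere after the =".
q=$(samples | grep 'key[[:space:]]*=.*[^[:space:]]')

# Both keep only "key= a" and "key = value".
[ "$p" = "$q" ] && printf '%s\n' "$p"
```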

Note that these patterns match key = foo but also nonkey = foo. If you want the key to be only found at the beginning of the line, you need to ask for it with the ^ anchor:

grep '^key\s*=.*\S'

Or use -x for the regexp to match the whole line:

grep -x 'key\s*=.*\S.*'
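A quick sketch of what the anchor changes, using a made-up nonkey line; ^ and -x are both standard grep features, and [^[:space:]] again stands in for \S:

```shell
#!/bin/sh
samples() { printf '%s\n' 'key = foo' 'nonkey = foo' 'key =   '; }

# Unanchored: "key" may appear anywhere, so "nonkey = foo" matches too.
samples | grep 'key[[:space:]]*=.*[^[:space:]]'

# Anchored with ^: only lines that start with "key".
samples | grep '^key[[:space:]]*=.*[^[:space:]]'

# With -x the pattern must cover the whole line, hence the trailing .*
samples | grep -x 'key[[:space:]]*=.*[^[:space:]].*'
```

The last line of the sample, key followed by only spaces after the =, is rejected by all three.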

Note that the standard equivalent of \s is [[:space:]] ([^[:space:]] for \S).

Another way to address the requirement would be to use the extended operators found in some regexp flavours, such as PCRE, that prevent backtracking.

key=\s*. matches "key= " (key= followed by a single space) because the regexp engine lets \s* greedily consume the space characters after the =, finds one, and then realises it can't match the . since it has reached the end of the line. It then backtracks and tries fewer repetitions of \s (zero in that case) so that the following . can match (here, the space character).
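The outcome can be made visible with -o, which prints only the matched part (GNU and BSD grep support it, though it isn't in POSIX). GNU grep's engine doesn't necessarily backtrack internally, but the match it reports is the one described above. In this sketch, tr turns spaces into _ only so they show up:

```shell
#!/bin/sh
# One space after =: \s* ends up matching nothing and . takes the space.
printf 'key= \n' | grep -o 'key=[[:space:]]*.' | tr ' ' '_'

# Two spaces: \s* covers only the first so that . can match the second.
printf 'key=  \n' | grep -o 'key=[[:space:]]*.' | tr ' ' '_'
```

This prints key=_ and then key=__.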

With PCRE, like when using the -P option with GNU grep, you can write:

grep -P '^key\s*=(?>\s*).'

That (?>...) syntax (an atomic group) prevents backtracking: \s* eats as many spacing characters as possible and cannot give any of them back, so the pattern only matches if there's at least one non-spacing character after the spaces.

$ printf 'key=%s\n' '' ' ' ' a' | grep '^key\s*=\s*.'
key=
key= a
$ printf 'key=%s\n' '' ' ' ' a' | grep -P '^key\s*=(?>\s*).'
key= a
$ printf 'key=%s\n' '' ' ' ' a' | grep '^key\s*=.*\S'
key= a