Given a file, "test.log", with the following contents:

line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5

I am trying to select line 2 and line 4 with this pattern:

grep "Patient\s\d+\s" test.log
# but this works testing at https://rubular.com/

This doesn't work, and neither does this:

grep "Patient\s\d+\n" test.log
# but this works testing at https://regexr.com/47qd5

What am I doing wrong?

Jeff Schaller

4 Answers

1. Use either named classes or PCRE

By default, GNU grep uses Basic Regular Expressions (BRE), but it also lets you use Extended Regular Expressions (ERE) and Perl-compatible Regular Expressions (PCRE).

Note that POSIX BRE and ERE define neither \s nor \d (GNU grep accepts \s as an extension in both, but not \d). They offer similar features through named classes, though. From man grep:

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

Example:

$ grep -E '^[[:digit:]]+$' << 'EOF'
> foo
> 123
> bar
> EOF
123

You may also use PCRE, as it supports \s and \d:

$ grep -P '^\d+$' << 'EOF'
> foo
> 123
> bar
> EOF
123
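
Applied to the data from the question, the named-class approach might look like the sketch below. It assumes only a POSIX grep; the `sample` function is a hypothetical helper that stands in for test.log.

```shell
# Hypothetical helper that prints the sample data from the question.
sample() {
  printf '%s\n' \
    'line1 Patient 123 45566' \
    'line2 Patient 432' \
    'line3 Patient 234 456' \
    'line4 Patient 321' \
    'line5'
}

# BRE (the default): "+" is not an operator in POSIX BRE, so use \{1,\}.
bre_result=$(sample | grep 'Patient[[:space:]][[:digit:]]\{1,\}$')

# ERE (-E): "+" means "one or more".
ere_result=$(sample | grep -E 'Patient[[:space:]][[:digit:]]+$')

printf 'BRE:\n%s\n' "$bre_result"
printf 'ERE:\n%s\n' "$ere_result"
```

Both variants should select line 2 and line 4, since those are the only lines where the number is immediately followed by the end of the line.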

2. \n doesn't work

In Unix every \n delimits a line. grep reads its input line by line, strips the trailing newline, and prints the lines that match a given pattern, so the pattern never sees a \n to match.

You could either use $ to match the end of the line:

$ grep -E 'foo bar$' << 'EOF'
> foo
> foo bar
> foo bar baz
> EOF
foo bar

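or pass the -z/--null-data option, which makes grep treat NUL rather than newline as the record separator, so \n becomes an ordinary character that the pattern can match (you'll need some extra workarounds to match exactly what you want):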

$ grep -Poz '(?<=\n)?foo bar\n' << 'EOF'
> foo
> foo bar
> foo bar baz
> EOF
foo bar

3. Your first example doesn't do what you think

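That last \s requires a whitespace character after the digits. Lines 2 and 4 end right after the number (the newline has already been stripped), so the pattern matches line 1 and line 3 instead of line 2 and line 4: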

$ grep -P 'Patient\s\d+\s' << 'EOF'
> line1 Patient 123 45566
> line2 Patient 432
> line3 Patient 234 456
> line4 Patient 321
> line5
> EOF
line1 Patient 123 45566
line3 Patient 234 456
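
To get lines 2 and 4, one option is to anchor the digits to the end of the line instead of requiring trailing whitespace. The sketch below contrasts the two patterns; it uses POSIX classes with -E so that it does not depend on -P, and it keeps the sample data in a shell variable (an assumption made for self-containment) rather than in test.log.

```shell
# Sample data from the question, stored in a variable instead of test.log.
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5'

# Trailing whitespace required: only lines with more text after the number match.
with_space=$(printf '%s\n' "$data" |
  grep -E 'Patient[[:space:]][[:digit:]]+[[:space:]]')

# End-of-line anchor: only lines that end right after the number match.
with_anchor=$(printf '%s\n' "$data" |
  grep -E 'Patient[[:space:]][[:digit:]]+$')

printf 'trailing whitespace:\n%s\n' "$with_space"
printf 'end anchor:\n%s\n' "$with_anchor"
```

With GNU grep, the equivalent -P pattern would be 'Patient\s\d+$'.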
nxnev

Use the -P switch with GNU grep for Perl regular expressions and your syntax will work as you have it.

$ grep -V | head -n1
grep (GNU grep) 2.25

$ grep --help | grep "\-P"
  -P, --perl-regexp       PATTERN is a Perl regular expression

Also, see this answer for more information.

clownbaby

Not all regular expression flavors use the same syntax, as others have already pointed out. If you are on a system where the default grep implementation is not GNU grep, then you have POSIX regular expressions, and these don't support Perl-like patterns such as \s.

You seem to want to grep for lines ending with a single positive integer (as opposed to zero or more than one integer). Seeing your data, another way to formulate this is that you'd like to extract all lines with exactly three whitespace-delimited fields.

This is easy with awk:

$ awk 'NF == 3' test.log
line2 Patient 432
line4 Patient 321

NF is the number of fields (columns) in the current record (line), and with only a lone condition like this, the default action is to print all lines that fulfil the condition.
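
If the field count alone is too loose, the condition can also inspect the fields themselves. The following is a sketch under two assumptions that the question does not state: the second field must be the literal word Patient, and the third must consist only of digits. The line6 record is hypothetical, added to show a three-field line that the stricter condition rejects.

```shell
# Sample data from the question plus one hypothetical extra record (line6).
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5
line6 Nurse 99'

# Field count only: also prints the hypothetical line6.
loose=$(printf '%s\n' "$data" | awk 'NF == 3')

# Field count plus checks on the second and third fields.
strict=$(printf '%s\n' "$data" |
  awk 'NF == 3 && $2 == "Patient" && $3 ~ /^[0-9]+$/')

printf 'NF only:\n%s\n' "$loose"
printf 'NF plus field checks:\n%s\n' "$strict"
```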

With grep, and with a more complete pattern that exactly specifies what we're expecting:

$ grep -Ex '[[:alnum:]]+ [[:alpha:]]+ [[:digit:]]+' test.log
line2 Patient 432
line4 Patient 321

The -E enables extended regular expressions (because we use the extended + modifier), and -x causes grep to match across a complete line.

[[:alnum:]]+ matches a string of letters and digits (according to your locale), while [[:alpha:]]+ and [[:digit:]]+ match strings of letters and strings of digits respectively.

Another way of writing the same thing that uses ASCII ranges (disregards your locale setting):

grep -Ex '[A-Za-z0-9]+ [A-Za-z]+ [0-9]+' test.log
Kusalananda

The version of grep I was running, grep (BSD grep) 2.5.1-FreeBSD on my outdated macOS, does not support -P, so I installed GNU grep 3.3 with brew install grep --with-default-names. After that I was able to get this to work with:

grep -P 'Patient\s\d+$' test.log
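
For a script that has to run on systems with and without -P, one possible approach is to probe for PCRE support and fall back to POSIX classes. This is only a sketch; the function name `match_patients` is hypothetical.

```shell
# Hypothetical helper: print lines that end in "Patient <number>".
# Reads the files given as arguments, or standard input if there are none.
match_patients() {
  # Probe: does this grep accept -P and understand \d?
  if printf '1\n' | grep -P '\d' >/dev/null 2>&1; then
    grep -P 'Patient\s\d+$' "$@"
  else
    # POSIX fallback for greps built without PCRE support.
    grep -E 'Patient[[:space:]][[:digit:]]+$' "$@"
  fi
}

result=$(printf '%s\n' 'line1 Patient 123 45566' 'line2 Patient 432' |
  match_patients)
printf '%s\n' "$result"
```

With the file from the question, the call would be `match_patients test.log`.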