grep not working as expected

Question

Given a file, "test.log", with the following contents:

line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5

I am trying to select line 2 and line 4 with this pattern:

grep "Patient\s\d+\s" test.log
# but this works testing at https://rubular.com/

This doesn't work, and neither does this:

grep "Patient\s\d+\n" test.log
# but this works testing at https://regexr.com/47qd5

What am I doing wrong?

Rubular does Ruby regular expressions. Regexr does JS REs or PCRE. grep does POSIX basic or extended regular expressions, or PCRE if you're using GNU grep and the -P option. What grep are you using (are you on Linux)? — Kusalananda, Feb 05 '19 at 23:40
Also, are there spaces after the numbers on line 2 and 4? Your first expression seems to want to match these... — Kusalananda, Feb 05 '19 at 23:49
grep -V; grep (BSD grep) 2.5.1-FreeBSD also tried with grep (GNU grep) 2.27 — lacostenycoder, Feb 05 '19 at 23:49
@Kusalananda it's a log file so there shouldn't be. my example is truncated — lacostenycoder, Feb 05 '19 at 23:51

nxnev · Answer 1 · 2019-02-06T02:54:23.880

1. Use either named classes or PCRE

GNU grep uses Basic Regular Expressions (BRE) by default, but it also lets you use Extended Regular Expressions (ERE) and Perl-compatible Regular Expressions (PCRE).

Please note that POSIX BRE and ERE support neither \s nor \d (GNU grep accepts \s in BRE and ERE as a non-POSIX extension, but not \d). They do have similar features, though. From man grep:

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

Example:

$ grep -E '^[[:digit:]]+$' << 'EOF'
> foo
> 123
> bar
> EOF
123

You may also use PCRE, as it supports \s and \d:

$ grep -P '^\d+$' << 'EOF'
> foo
> 123
> bar
> EOF
123
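Applied to the data from the question, the named-class approach might look like the sketch below. It assumes the sample lines are held in a shell variable (the name data is only for illustration) and that the goal is to keep lines ending in "Patient" followed by a single number:

```shell
# Sample data from the question, kept in a variable so the example is self-contained.
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5'

# [[:space:]] stands in for \s, [[:digit:]] for \d, and $ anchors the end of the line.
matches=$(printf '%s\n' "$data" | grep -E 'Patient[[:space:]][[:digit:]]+$')
printf '%s\n' "$matches"
```

Because it only uses POSIX ERE syntax, this form should also work with a grep that has no -P option.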

2. `\n` doesn't work

In Unix, every \n terminates a line. grep matches its pattern against one line at a time, without the trailing newline, and prints the lines that match. Matching \n itself wouldn't make sense in this context.

You could either use $ to match the end of the line:

$ grep -E 'foo bar$' << 'EOF'
> foo
> foo bar
> foo bar baz
> EOF
foo bar

or pass the -z/--null-data option, which makes grep treat the input as NUL-delimited records instead of newline-delimited lines, so that a pattern can see \n (you'll need some extra workarounds to match exactly what you want):

$ grep -Poz '(?<=\n)?foo bar\n' << 'EOF'
> foo
> foo bar
> foo bar baz
> EOF
foo bar

3. Your first example doesn't do what you think

The trailing \s requires a whitespace character after the digits, so the pattern matches line 1 and line 3 instead of line 2 and line 4:

$ grep -P 'Patient\s\d+\s' << 'EOF'
> line1 Patient 123 45566
> line2 Patient 432
> line3 Patient 234 456
> line4 Patient 321
> line5
> EOF
line1 Patient 123 45566
line3 Patient 234 456
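To see what the trailing whitespace class actually consumes, one option is to print only the matched text. The sketch below uses the POSIX classes rather than \s and \d so that it does not depend on -P; the variable names and the bracket wrapping done by sed are only there for illustration:

```shell
# Sample data from the question.
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5'

# -o prints only the matched part of each line.
# sed wraps each match in [ ] so the trailing space becomes visible.
matched=$(printf '%s\n' "$data" |
    grep -oE 'Patient[[:space:]][[:digit:]]+[[:space:]]' |
    sed 's/.*/[&]/')
printf '%s\n' "$matched"
```

Each match ends in a space, which lines 2 and 4 cannot supply because their number is the last thing on the line.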

GNU grep's BREs and EREs do support \s (not \d yet AFAIK). Neither are POSIX BRE/ERE. — Stéphane Chazelas, Feb 06 '19 at 17:32
-z is to work on NUL-delimited records instead of NL-delimited records. It's a bit misleading to call it the multiline mode. pcregrep has a -M option for a real multiline mode. — Stéphane Chazelas, Feb 06 '19 at 17:36

score 3 · Answer 2 · answered Feb 06 '19 at 00:44

Use the -P switch with GNU grep for Perl regular expressions and your syntax will work as you have it.

$ grep -V | head -n1
grep (GNU grep) 2.25

$ grep --help | grep "\-P"
  -P, --perl-regexp       PATTERN is a Perl regular expression

Also, see this answer for more information.
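Since not every grep supports -P, a script could probe for it first and fall back to POSIX classes. This is only a sketch: patient_lines is a hypothetical helper name, and the probe assumes that a grep without PCRE support exits with a non-zero status when given -P:

```shell
# Hypothetical helper: prints lines that end in "Patient <digits>".
# Reads the files given as arguments, or standard input if there are none.
patient_lines() {
    # Probe: does this grep accept -P and understand \d?
    if printf '1\n' | grep -P '\d' >/dev/null 2>&1; then
        grep -P 'Patient\s\d+$' "$@"
    else
        # Fallback using POSIX ERE named classes.
        grep -E 'Patient[[:space:]][[:digit:]]+$' "$@"
    fi
}

result=$(printf 'line1 Patient 123 45566\nline2 Patient 432\n' | patient_lines)
printf '%s\n' "$result"
```

With a file on disk, the call would simply be patient_lines test.log.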

clownbaby

this doesn't work in my case – lacostenycoder Feb 06 '19 at 15:35
it works after I installed a newer version of grep which supports -P thanks – lacostenycoder Feb 06 '19 at 16:06

score 1 · Answer 3 · answered Feb 06 '19 at 06:52

Not all regular expression flavours use the same syntax, as others have already pointed out. If you are on a system where the default grep implementation is not GNU grep, then you have POSIX regular expressions, and these don't support Perl-like patterns such as \s.

You seem to want to grep for lines ending with a single positive integer (as opposed to zero or more than one integer). Seeing your data, another way to formulate this is that you'd like to extract all lines with exactly three whitespace-delimited fields.

This is easy with awk:

$ awk 'NF == 3' test.log
line2 Patient 432
line4 Patient 321

NF is the number of fields (columns) in the current record (line), and with only a lone condition like this, the default action is to print all lines that fulfil the condition.
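The field count alone does not check what the fields contain. If that matters, the condition could be tightened as in the sketch below, which spells out the print action and also requires the third field to consist only of digits. The extra line6 record is made up for illustration and is not part of the original data:

```shell
# Question data plus one hypothetical record whose third field is not a number.
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5
line6 Patient abc'

# Print records with exactly three fields whose third field is all digits.
numeric=$(printf '%s\n' "$data" | awk 'NF == 3 && $3 ~ /^[0-9]+$/ { print }')
printf '%s\n' "$numeric"
```

Plain NF == 3 would also print the line6 record, so the extra test is only worth adding if such lines can occur in the log.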

With grep, and with a more complete pattern that exactly specifies what we're expecting:

$ grep -Ex '[[:alnum:]]+ [[:alpha:]]+ [[:digit:]]+' test.log
line2 Patient 432
line4 Patient 321

The -E option enables extended regular expressions (because we use the extended + modifier), and -x requires the pattern to match the complete line.

[[:alnum:]]+ matches a string of letters and digits (according to your locale), while [[:alpha:]]+ and [[:digit:]]+ match strings of letters and strings of digits respectively.

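Another way of writing the same thing uses explicit ASCII ranges instead of the locale-aware named classes (prefix the command with LC_ALL=C if you want to be sure the ranges are interpreted as plain ASCII):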

grep -Ex '[A-Za-z0-9]+ [A-Za-z]+ [0-9]+' test.log
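The effect of -x in the commands above can be checked by comparing it with explicit ^ and $ anchors and with no anchoring at all. The following is a small sketch with the question's data held in a variable; the variable names are assumptions made for the example:

```shell
# Sample data from the question.
data='line1 Patient 123 45566
line2 Patient 432
line3 Patient 234 456
line4 Patient 321
line5'

pattern='[[:alnum:]]+ [[:alpha:]]+ [[:digit:]]+'

# -x: the pattern must match the whole line.
with_x=$(printf '%s\n' "$data" | grep -Ex "$pattern")

# The same idea with explicit anchors around a group.
with_anchors=$(printf '%s\n' "$data" | grep -E "^($pattern)\$")

# Without any anchoring, a match anywhere in the line is enough.
unanchored_count=$(printf '%s\n' "$data" | grep -cE "$pattern")

printf '%s\n' "$with_x"
printf 'unanchored matches: %s\n' "$unanchored_count"
```

Both anchored forms select lines 2 and 4, while the unanchored pattern also matches the first three fields of lines 1 and 3.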

score 1 · Answer 4 · answered Feb 06 '19 at 16:03

The version of grep I was running, grep (BSD grep) 2.5.1-FreeBSD on my outdated macOS, does not support -P, so I installed GNU grep 3.3 with brew install grep --with-default-names. After that, I was able to get this to work with:

grep -P 'Patient\s\d+$' test.log

lacostenycoder
