How to get group results using grep?

Question

How would I get this output:

Found value: This order was placed for QT3000! OK?

or

Found value: This order was placed for QT300

or

Found value: 0

using line.txt and pattern.txt as below:

[nsaunders@rolly regex]$ 
[nsaunders@rolly regex]$ grep -e -f pattern.txt line.txt 
[nsaunders@rolly regex]$ 
[nsaunders@rolly regex]$ cat pattern.txt 
(.*)(\\d+)(.*)
[nsaunders@rolly regex]$ 
[nsaunders@rolly regex]$ cat line.txt 
This order was placed for QT3000! OK?
[nsaunders@rolly regex]$

I'd like to do this with something similar to m.group(0) from a regex tutorial.

Perhaps grep has no such notion as the following (from that tutorial):

Groups and capturing
Group number
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1       ((A)(B(C)))
2       (A)
3       (B(C))
4       (C)


Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
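Here is a small sketch that makes the quoted numbering rule concrete. It uses Java's java.util.regex, which is the API the tutorial describes. The sketch matches the example expression ((A)(B(C))) against the hypothetical input ABC and prints every group. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns group(0) through group(groupCount()) when the regex matches the whole input.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            // groupCount() does not include group 0, hence the <= bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.size(); i++) {
            System.out.println(i + "\t" + g.get(i));
        }
    }
}
```

This prints the following values:
- group 0 is ABC (the whole match).
- group 1 is ABC (the outermost parentheses).
- group 2 is A.
- group 3 is BC.
- group 4 is C.

This is exactly the left-to-right numbering by opening parenthesis described above.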

I'm using the -e switch, @Sundeep, but is that not sufficient? Could you elaborate a bit? And thanks for the link. — Nicholas Saunders, Jun 14 '20 at 06:51
Could you explain how one line, This order was placed for QT3000! OK?, translates to three lines of output? You need the -E switch for () to act as capture groups. \d is not supported by grep (unless you have GNU grep, which has PCRE support via -P). — Sundeep, Jun 14 '20 at 06:52
Oh, pardon, what I mean is to generate each of those three lines using some notion of group(x) with grep. Thanks, I updated the question. I'm not sure how \d factors in here, but yes, I'm asking about capture groups. Hmm, I'm looking into -e versus -E now, thanks... — Nicholas Saunders, Jun 14 '20 at 06:54
Do you want to print all lines containing a digit character? grep '[0-9]' line.txt? — Sundeep, Jun 14 '20 at 06:57
If you want only the digits, grep -oE '[0-9]+' line.txt (provided your grep supports the -o option). — Sundeep, Jun 14 '20 at 06:58

Kusalananda · Accepted Answer · 2020-06-14T09:13:56.653

Assuming that the pattern in pattern.txt is

(.*)(\d+)(.*)

then, using it with GNU grep would be a matter of

grep -E -f pattern.txt line.txt

i.e., search in line.txt for lines matching any of the extended regular expressions listed in pattern.txt, which, given the data in the question, produces

This order was placed for QT3000! OK?
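The rule that grep keeps a line if any pattern in the file matches part of it can be sketched in Java, the language of the question's tutorial. This is only an approximation, and it rests on a few assumptions:
- The patterns are held in a list rather than read from pattern.txt.
- The class name is hypothetical.
- java.util.regex syntax is not identical to POSIX extended regular expressions, so the sketch sticks to [0-9], which both dialects read the same way.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class GrepAnyPattern {
    // Keep each line in which at least one of the patterns matches a substring,
    // roughly what grep -E -f PATTERNFILE does with one pattern per line.
    static List<String> grep(List<String> patterns, List<String> lines) {
        List<Pattern> compiled = patterns.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
        return lines.stream()
                .filter(line -> compiled.stream().anyMatch(p -> p.matcher(line).find()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of pattern.txt and line.txt.
        List<String> patterns = List.of("(.*)([0-9]+)(.*)");
        List<String> lines = List.of("This order was placed for QT3000! OK?");
        grep(patterns, lines).forEach(System.out::println);
    }
}
```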

The issue with your command was that you used -e -f. The -e option explicitly says "the next argument is the expression", so -e -f is interpreted as "the regular expression to use is -f". That expression was then searched for in both files mentioned on the command line, pattern.txt and line.txt. Neither file contains the string -f, so nothing was printed.

A secondary issue was the \\d in the pattern.txt file, which matches a backslash followed by the character d, i.e. the literal string \d.
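The doubled backslash may have come from Java source code, though that is only a guess. In a Java string literal, \\d is how the regex \d is written, because the compiler turns \\ into a single backslash. A pattern file read by grep gets no such unescaping. The hypothetical Java sketch below shows the two readings side by side.

```java
import java.util.regex.Pattern;

public class EscapedDigit {
    // True if the regex matches anywhere in the input (unanchored, like grep).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // The literal "\\d" is the regex \d: one digit.
        System.out.println(found("\\d", line));          // true
        // The literal "\\\\d" is the regex \\d: a backslash followed by 'd'.
        System.out.println(found("\\\\d", line));        // false
        // The literal "a \\d b" is the text a \d b, which does contain \d.
        System.out.println(found("\\\\d", "a \\d b"));   // true
    }
}
```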

The pattern has a few other "issues". First of all, it uses a non-standard expression to match a digit, \d, which GNU grep only understands as a digit in its Perl-compatible mode (grep -P). This is better written as [[:digit:]] or as the range [0-9] (in the POSIX standard locale). Regular expressions match substrings, whereas filename globbing patterns are always automatically anchored, so neither of the .* bits of the pattern is needed. Likewise, the parentheses serve no function in the pattern and are not needed at all. The + isn't needed either. Any line containing one or more digits also contains a single digit, so [[:digit:]] on its own selects exactly the same lines.

This means that to extract all lines that contain (at least) one digit, you may instead use the pattern [[:digit:]] or [0-9], with no other decorations. With GNU grep -P, you can also use \d if you want to keep using Perl-like expressions. For the difference between these, please see Difference between [0-9], [[:digit:]] and \d.
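Java's Matcher makes the difference between substring and anchored matching explicit:
- find() behaves like grep and succeeds if any substring matches.
- matches() requires the whole string to match, much like an anchored pattern.

The hypothetical sketch below shows that the .* padding only makes a difference in the anchored case.

```java
import java.util.regex.Pattern;

public class FindVersusMatches {
    // find(): succeeds if the regex matches any substring (grep-like, unanchored).
    static boolean found(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // matches(): succeeds only if the regex matches the entire string (anchored, glob-like).
    static boolean matchesWhole(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println(found("[0-9]", line));                    // true
        System.out.println(found("(.*)([0-9]+)(.*)", line));         // true
        System.out.println(matchesWhole("[0-9]", line));             // false
        System.out.println(matchesWhole("(.*)([0-9]+)(.*)", line));  // true
    }
}
```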

To get the three different outputs that you show in the question, use sed rather than grep. You want sed because grep can only print matching lines (or, with -o, the matching parts of lines), but cannot modify the matched data.

Insert Found value: in front of any line containing a digit, and print those lines:

$ sed -n '/[[:digit:]]/s/^/Found value: /p' line.txt
Found value: This order was placed for QT3000! OK?

Insert Found value: in front of any line containing a digit. Print each such line up to and including at most the first three digits of its first run of digits. Fewer digits are printed if that run is shorter than three:
```
$ sed -n '/[[:digit:]]/s/\([^[:digit:]]*[[:digit:]]\{1,3\}\).*/Found value: \1/p' line.txt
Found value: This order was placed for QT300
```
For any line containing a digit, replace the line with Found value: followed by the last digit on that line, and print the result:
```
$ sed -n '/[[:digit:]]/s/.*\([[:digit:]]\).*/Found value: \1/p' line.txt
Found value: 0
```
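Since the question comes from a Java regex tutorial, here is a rough Java counterpart of the three sed commands. It is a sketch, not a drop-in replacement, and it makes these assumptions:
- Digits are ASCII, so [0-9] stands in for [[:digit:]].
- String.replaceFirst stands in for sed's s command, and $1 stands in for \1.
- The first sed command has no group, so the sketch captures the whole line as group 1 to reuse the same helper.
- The class and method names are made up.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class FoundValue {
    private static final Pattern HAS_DIGIT = Pattern.compile("[0-9]");

    // Mirrors the /[[:digit:]]/ address: only lines containing a digit produce output.
    static Optional<String> apply(String line, String regex) {
        if (!HAS_DIGIT.matcher(line).find()) {
            return Optional.empty();
        }
        // replaceFirst plays the role of sed's s///; $1 corresponds to sed's \1.
        return Optional.of(line.replaceFirst(regex, "Found value: $1"));
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Like s/^/Found value: /
        System.out.println(apply(line, "^(.*)$").orElse(""));
        // Like s/\([^[:digit:]]*[[:digit:]]\{1,3\}\).*/Found value: \1/
        System.out.println(apply(line, "([^0-9]*[0-9]{1,3}).*").orElse(""));
        // Like s/.*\([[:digit:]]\).*/Found value: \1/
        System.out.println(apply(line, ".*([0-9]).*").orElse(""));
    }
}
```

This prints the same three lines as the sed commands: the full line, the line up to QT300, and the single digit 0, each prefixed with Found value:.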

Using a regular expression equivalent to yours, we can see which parts of the text each group matches:

$ sed 's/\(.*\)\([[:digit:]]\{1,\}\)\(.*\)/(\1)(\2)(\3)/' line.txt
(This order was placed for QT300)(0)(! OK?)

Note that \2 only matches the last digit on the line as the preceding .* is greedy.
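The greedy behaviour is easy to check in Java with the same pattern. Java also offers a reluctant quantifier, .*?, which stops at the first digit instead. POSIX sed has no reluctant quantifiers. The [^[:digit:]]* in the second sed command above gets a similar effect by refusing to consume digits. Here is a minimal sketch, with a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyGroups {
    // Formats groups 1 to 3 the way the sed command above does: (g1)(g2)(g3).
    static String show(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "(" + m.group(1) + ")(" + m.group(2) + ")(" + m.group(3) + ")";
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println(show("(.*)(\\d+)(.*)", line));   // greedy first group
        System.out.println(show("(.*?)(\\d+)(.*)", line));  // reluctant first group
    }
}
```

The sketch prints two lines. The first is (This order was placed for QT300)(0)(! OK?), matching the sed output. The second is (This order was placed for QT)(3000)(! OK?).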

score 1 · Answer 2 · answered Jun 14 '20 at 09:33

I don't think you can access the capture groups in grep, but you can in Perl.

$ echo 'foo123bar' | re='(.*)(\d+)(.*)' \
     perl -lne 'if (m/$ENV{re}/) { printf "Found value: %s\n", $_ for @{^CAPTURE} }'
Found value: foo12
Found value: 3
Found value: bar

It basically just says to match the input lines (read implicitly by -n) against the contents of the environment variable re, which is set on the command line to pass the pattern. If there was a match, it prints all the captured texts with the prefix. The captured texts come from the array @{^CAPTURE}, which is available since Perl 5.26.
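For comparison with the question's Java background, the same "print every captured group" loop can be sketched with java.util.regex. There, groups 1 through groupCount() play the role of @{^CAPTURE}. The class name is hypothetical. Unlike perl -n, which handles every line of input, the sketch handles a single input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureAll {
    // Like Perl's @{^CAPTURE}: groups 1..n of the first match, or an empty list if none.
    static List<String> captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : captures("(.*)(\\d+)(.*)", "foo123bar")) {
            System.out.printf("Found value: %s%n", s);
        }
    }
}
```

As in the Perl version, this prints three values: foo12, 3 and bar.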

score 0 · Answer 3 · answered Jun 15 '20 at 02:18

From Java, using jshell with a script file grep.jsh containing the following. jshell imports java.util.regex.* by default, so Pattern and Matcher need no explicit import:

import static java.lang.System.out;
String text = "This order was placed for QT3000! OK?";
String patternString1 = "(.*)(\\d+)(.*)";  // "\\d" in Java source is the regex \d
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
matcher.find()
out.println(matcher.group(0))
out.println(matcher.group(1))
out.println(matcher.group(2))

Running it:

jshell> /reset
|  Resetting state.
jshell> /open grep.jsh
This order was placed for QT3000! OK?
This order was placed for QT300
0
jshell>

I was mainly interested in using group(), but didn't realize that grep doesn't really do that.
