Return only the portion of a line after a matching pattern

Question

So pulling open a file with cat and then using grep to get matching lines only gets me so far when I am working with the particular log set that I am dealing with. I need a way to match lines to a pattern, but only to return the portion of the line after the match. The portion before and after the match will consistently vary. I have played with using sed or awk, but have not been able to figure out how to filter the line to either delete the part before the match or just return the part after the match; either will work. This is an example of a line that I need to filter:

2011-11-07T05:37:43-08:00 <0.4> isi-udb5-ash4-1(id1) /boot/kernel.amd64/kernel: [gmp_info.c:1758](pid 40370="kt: gmp-drive-updat")(tid=100872) new group: <15,1773>: { 1:0-25,27-34,37-38, 2:0-33,35-36, 3:0-35, 4:0-9,11-14,16-32,34-38, 5:0-35, 6:0-15,17-36, 7:0-16,18-36, 8:0-14,16-32,34-36, 9:0-10,12-36, 10-11:0-35, 12:0-5,7-30,32-35, 13-19:0-35, 20:0,2-35, down: 8:15, soft_failed: 1:27, 8:15, stalled: 12:6,31, 20:1 }

The portion I need is everything after "stalled".

The background behind this is that I can find out how often something stalls:

cat messages | grep stalled | wc -l

What I need to do is find out how many times a certain node has stalled (indicated by the portion before each colon after "stalled"). If I just grep for that (i.e. 20:) it may return lines that have soft fails, but no stalls, which doesn't help me. I need to filter only the stalled portion so I can then grep for a specific node out of those that have stalled.

For all intents and purposes, this is a FreeBSD system with standard GNU core utils, but I cannot install anything extra to assist.

@Gilles, Odd how that didn't pop up when I searched, though I didn't use the title I eventually went with...but it didn't show up in the screen below my title. Anyway, that aside, that might get me where I want, though I need the entire line after the match, not the first word - but might not take much of a change. — MaQleod, Nov 07 '11 at 23:52
Its title sucked. I stole yours which is very nice. Take the sed solution and don't treat whitespace specially. — Gilles 'SO- stop being evil', Nov 07 '11 at 23:55
@Gilles, that is something I'm not entirely sure how to do. I am still learning sed. — MaQleod, Nov 08 '11 at 00:06
similar to http://unix.stackexchange.com/questions/24089/returning-only-the-portion-of-a-line-after-a-matching-pattern/24091#24091 as well. — Tim Kennedy, Nov 08 '11 at 00:43
@Gilles, You mentioned- "Note that if stalled: occurs several times on the line, this will match the last occurrence." Now what if it occurred multiple times?? I like to print stalled: 0 stalled: 9 stalled: 12 from following line. 2011-11-07T05:37:43-08:00 <0.4> stalled: 0 isi-udb5-ash4-1(id1) /boot/kernel.amd64/kernel: [gmp_info.c:1758](pid 40370="kt: gmp-drive-updat")(tid=100872) stalled: 9 new group: <15,1773>: { 1:0-25,27-34,37-38, 2:0-33,35-36, 3:0-35, 4:0-9,11-14,16-32,34-38, 5:0-35, 6:0-15,17-36, 7:0-16,18-36, 8:0-14,16-32,34-36, 9:0-10,12-36, 10-11:0-35, 12:0-5,7-30,32-35, 13-19:0-35 — shaa0601, Aug 28 '14 at 15:05
@shaa0601 I don't understand your question, it's especially difficult to follow in a comment with no formatting. Ask a new, self-contained question. — Gilles 'SO- stop being evil', Aug 28 '14 at 15:37

score 258 · Accepted Answer · edited Sep 14 '21 at 05:50

The canonical tool for that would be sed.

sed -n -e 's/^.*stalled: //p'

Detailed explanation:

-n means not to print anything by default.
-e is followed by a sed command.
s is the pattern replacement command.
The regular expression ^.*stalled: matches the pattern you're looking for, plus any preceding text (.* meaning any text, with an initial ^ to say that the match begins at the beginning of the line). Note that if stalled: occurs several times on the line, this will match the last occurrence.
The match, i.e. everything on the line up to stalled: , is replaced by the empty string (i.e. deleted).
The final p means to print the transformed line.
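
As a quick check of the command above, here is a minimal sketch that runs it on two invented, shortened log lines (stand-ins for the real messages file, not actual log data) and then counts the lines on which node 20 stalled. The variable names and the grep pattern are illustrative assumptions, not part of the original answer.

```shell
# Two made-up lines: the first has stalls on nodes 12 and 20,
# the second only has a soft failure on node 20 and no "stalled: ".
sample='... down: 8:15, soft_failed: 1:27, 8:15, stalled: 12:6,31, 20:1 }
... down: 8:15, soft_failed: 20:3 }'

# Keep only what follows "stalled: "; with -n, non-matching lines are dropped.
stalled_part=$(printf '%s\n' "$sample" | sed -n -e 's/^.*stalled: //p')
printf '%s\n' "$stalled_part"

# Count lines whose stalled part mentions node 20. In this sample format a
# node number sits at the start of the text or right after a space.
node20_count=$(printf '%s\n' "$stalled_part" | grep -c -E '(^| )20:')
printf 'node 20 stalled on %s line(s)\n' "$node20_count"
```

On the real data you would replace the printf with an input redirection such as `<messages`. The soft-failure line does not inflate the count because sed has already discarded it.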

If you want to retain the matching portion, use a backreference: \1 in the replacement part designates what is inside a group \(…\) in the pattern. Here, you could write stalled: again in the replacement part; this feature is useful when the pattern you're looking for is more general than a simple string.

sed -n -e 's/^.*\(stalled: \)/\1/p'

Sometimes you'll want to remove the portion of the line after the match. You can include it in the match by including .*$ at the end of the pattern (any text .* followed by the end of the line $). Unless you put that part in a group that you reference in the replacement text, the end of the line will not be in the output.
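
A small sketch of that variation, run on an invented one-line sample (the variable names are only for illustration): the first command drops the match and everything after it, the second keeps the match by placing it in a group and referencing it with \1.

```shell
line='down: 8:15, stalled: 12:6,31, 20:1 }'

# Delete "stalled: " and everything after it; only the text before remains.
before=$(printf '%s\n' "$line" | sed -n -e 's/stalled: .*$//p')
printf '[%s]\n' "$before"

# Keep the match itself but drop what follows it, via a group and \1.
upto=$(printf '%s\n' "$line" | sed -n -e 's/\(stalled: \).*$/\1/p')
printf '[%s]\n' "$upto"
```

The square brackets in the printf format are there only to make trailing spaces visible.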

As a further illustration of groups and backreferences, this command swaps the part before the match and the part after the match.

sed -n -e 's/^\(.*\)\(stalled: \)\(.*\)$/\3\2\1/p'

To get the part after the first occurrence of the string instead of last (for those lines where the string can occur several times), a common trick is to replace that string once with a newline character (which is the one character that won't occur inside a line), and then remove everything up to that newline:

sed -n '
  /stalled: / {
    s//\
/
    s/.*\n//p
  }'

With some sed implementations, the first s command can be written s//\n/ though that's not standard/portable.
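
To see the difference between the last and the first occurrence, here is a sketch on an invented line that contains stalled: twice. It compares the greedy command from the top of this answer with the newline trick; the sample text and variable names are assumptions for illustration.

```shell
# Made-up line in which "stalled: " appears twice.
line='x stalled: 0 y stalled: 9 z'

# Greedy .* reaches the last occurrence.
last=$(printf '%s\n' "$line" | sed -n -e 's/^.*stalled: //p')

# Newline trick: replace the first occurrence with a newline,
# then delete everything up to and including that newline.
first=$(printf '%s\n' "$line" | sed -n '
  /stalled: / {
    s//\
/
    s/.*\n//p
  }')

printf 'after last:  %s\n' "$last"
printf 'after first: %s\n' "$first"
```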

edited Sep 14 '21 at 05:50 · Stéphane Chazelas
answered Nov 08 '11 at 00:22 · Gilles 'SO- stop being evil'

I've tried the first two examples and it just seems to hang. I don't get an error message, nor do I get a new prompt, just nothing. – MaQleod Nov 08 '11 at 01:00
@MaQleod Oh, it's waiting for input on standard input, which here is the terminal because you haven't redirected it. Here you'd do an input redirection sed … <messages, since you want to process data from a file. To act on data produced by another command, you'd use a pipe: somecommand | sed …. – Gilles 'SO- stop being evil' Nov 08 '11 at 01:02
right, end of day blackout there. command works perfectly, thanks. – MaQleod Nov 08 '11 at 16:37
Best sed explanation I've seen so far -- thanks! – Jon Wadsworth Sep 16 '16 at 17:47
shorter version: sed -r 's/(^.*stalled)//' – ungalcrys Aug 09 '17 at 10:01
@ungalcrys Shorter version of what? This isn't equivalent to any of the commands in my answer. I'd recommend writing it as sed 's/^.*stalled//' since -r is specific to Linux and doesn't work on other systems such as macOS and here you aren't getting any benefit from it. – Gilles 'SO- stop being evil' Aug 09 '17 at 10:19
@Gilles seemed to be the same thing like your first answer but on Linux. Thanks for the clarifications. – ungalcrys Aug 09 '17 at 10:31
@ungalcrys The difference is their behavior on non-matching lines. sed 's/^.*stalled//' prints them unchanged, my first command skips them. This may not matter for this particular question. – Gilles 'SO- stop being evil' Aug 09 '17 at 10:37
seems to be skipping first line – the_prole Jun 20 '18 at 22:23
@the_prole None of the code snippets in my answer treat the first line differently from the others. – Gilles 'SO- stop being evil' Jun 21 '18 at 06:02
Alternative with a printing sed (i.e: no -n option used): sed '/^.*stalled: /!d;s///' – Sep 20 '21 at 01:16

score 136 · Answer 2 · edited Aug 26 '16 at 01:50

Another canonical tool, one you already use, is grep.

For example:

grep -o 'stalled.*'

This gives the same result as the second command in Gilles's answer:

sed -n -e 's/^.*\(stalled: \)/\1/p'

The -o flag (--only-matching) makes grep print only the part of the line that matches the expression, rather than the entire line as it normally does.

To remove the "stalled:" prefix from the output, we can use a third canonical tool, cut:

grep -o 'stalled.*' | cut -f2- -d:

The cut command uses : as the delimiter and prints everything from field 2 to the end of the line (the space that follows the colon is kept). It's a matter of preference of course, but I find the cut syntax very easy to remember.
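
For reference, a sketch of what each stage of that pipeline produces on an invented sample line. The final sed step that trims the leading blank is an optional addition of this example, not part of the answer, and -o is assumed to be available (it is in GNU and FreeBSD grep, but POSIX does not require it).

```shell
line='down: 8:15, soft_failed: 1:27, stalled: 12:6,31, 20:1 }'

# grep -o prints only the matched text, from "stalled" to the end of the line.
matched=$(printf '%s\n' "$line" | grep -o 'stalled.*')

# cut splits on ":" and keeps field 2 onward; the space after the first
# colon belongs to field 2, so the result starts with a blank.
fields=$(printf '%s\n' "$matched" | cut -f2- -d:)

# Optional clean-up of that leading blank.
trimmed=$(printf '%s\n' "$fields" | sed 's/^ //')

printf '[%s]\n[%s]\n[%s]\n' "$matched" "$fields" "$trimmed"
```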

edited Aug 26 '16 at 01:50 · poige
answered Aug 08 '14 at 11:46 · Anne van Rossum

Thanks for mentioning the -o option! I wanted to point out that grep doesn't recognize the \n as a newline, so your first example only matches to the first n character. For example, echo "Hello Anne" | grep -o 'A[^\n]*' returns the string A. However, echo "Hello Anne" | grep -o 'A.*' returns the expected Anne, since . matches any character except the newline. – adamlamar Mar 16 '15 at 21:52
Note that the quotes around the cut delimiter -d':' are removed by @poige. I find it easier to remember with quotes, e.g. with -d' ' or -d';'. – Anne van Rossum Jul 10 '17 at 20:44
According to your finding it should be easier to remember to use quotes with -f 2 too. Seriously, why not? – poige Aug 26 '17 at 10:26
Because a delimiter like a semi-colon ; rather than a colon : will be interpreted differently if not quoted. Of course that's logical behavior, but still I like to rely on muscle memory. I don't like to quote the delimiter one time but not the other time. Just personal preference, like I said before: easier to remember. – Anne van Rossum Oct 07 '17 at 18:09
the period that is part of the .* is needed, worked well for me: cat filename | grep 'Return only this line xyz text' | grep -o 'xyz.*' returns xyz text – ron Dec 12 '17 at 19:01
Upvote, upvote, a million times upvote! This worked when sed didn't. – StatsSorceress Jun 03 '19 at 15:34
@AnnevanRossum hi from 2020 ) all those not really needed quotes are not only extra characters but additional headache when quoting also ;) – poige Jun 18 '20 at 13:50

score 14 · Answer 3 · edited Apr 05 '19 at 23:13

Yet another canonical tool you considered, awk, could be used with the following line:

awk -F"stalled" '/stalled/{print $2}' messages

Detailed explanation:

-F defines the field separator for the line, here "stalled". Everything before the first separator is addressed with $1 and the text after it with $2.
/reg-ex/ selects the lines that match the regular expression, in this case "stalled".
{print $<n>} prints the n-th field. Since the separator is defined as stalled, everything after stalled (up to any further occurrence of it) is the second field.
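
A short sketch of the field splitting on an invented sample line. The second command, which uses "stalled: " as the separator, is a variation assumed here for illustration; it is not in the answer above.

```shell
line='down: 8:15, stalled: 12:6,31, 20:1 }'

# Separator "stalled", as in the answer: $2 still begins with ": ".
with_colon=$(printf '%s\n' "$line" | awk -F'stalled' '/stalled/{print $2}')

# Separator "stalled: " (variation): $2 starts right at the node list.
clean=$(printf '%s\n' "$line" | awk -F'stalled: ' '/stalled: /{print $2}')

printf '[%s]\n[%s]\n' "$with_colon" "$clean"
```

A separator longer than one character is treated by awk as a regular expression, which is harmless here because the string contains no special characters.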

edited Apr 05 '19 at 23:13 · Rui F Ribeiro
answered Apr 03 '19 at 09:23 · robertm.tum

Thanks for the awk solution - this solution works the best for me – B.Kocis Nov 19 '21 at 09:42

score 4 · Answer 4 · edited Apr 05 '19 at 22:24

I used ifconfig | grep eth0 | cut -f3- -d: to take this

    [root@MyPC ~]# ifconfig
    eth0  Link encap:Ethernet  HWaddr AC:B4:CA:DD:E6:F8
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:78998810244 errors:1 dropped:0 overruns:0 frame:1
          TX packets:20113430261 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:110947036025418 (100.9 TiB)  TX bytes:15010653222322 (13.6 TiB)

and make it look like this

    [root@MyPC ~]# ifconfig | grep eth0 | cut -f3- -d:
    C4:7A:4D:F6:B8

edited Apr 05 '19 at 22:24 · Rui F Ribeiro
answered Mar 31 '17 at 04:52 · Luis Perez

Does this answer the question? – Stephen Rauch Mar 31 '17 at 04:56
You can use cat /sys/class/net/*/address, no parsing required. – Anne van Rossum Dec 13 '17 at 16:58
if only C4:7A:4D:F6:B8 appeared in your initial code block – Zodzie Sep 02 '20 at 21:24

score 2 · Answer 5 · edited Sep 15 '21 at 05:05

Using Perl (i.e. Perl5) and Raku (previously known as Perl6):

Perl:

perl -pe 's/^.*stalled: //;'  #leaves non-matching and/or blank lines intact

Or:

perl -nE '/^.*stalled: (.*)/ and say $1;'  #removes non-matching lines

Raku:

raku -pe 's/^.*stalled\:\s//;' #leaves non-matching and/or blank lines intact

Or:

raku -ne '/^.*stalled\:\s (.*)/ and say ~$0;' #removes non-matching lines

OUTPUT (for 2nd Perl and 2nd Raku examples above):

12:6,31, 20:1 }

The code above is virtually identical between the two languages. The most significant difference is that in Raku all non-alnum/non-underscore characters must be escaped to be 'understood literally' by the Raku regex engine.

Other minor differences include the fact that:

Raku changes capture numbering to start from $0 (Perl starts from $1),
in Raku a leading ~ tilde is used to stringify the match object, and
in Perl a -E commandline flag must be used to enable the say function.

http://www.wall.org/~larry/natural.html
https://www.perl.org/
https://www.raku.org/

edited Sep 15 '21 at 05:05
answered Sep 14 '21 at 03:55 · jubilatious1

Or perl -ne 'print if s/^.*stalled: //' or perl -ne 'print $& if /^.*stalled: \K.*/s' to preserve the original line delimiter if any. – Stéphane Chazelas Sep 15 '21 at 05:05
One advantage of using perl is that you can replace * with *? to get the part after the first occurrence of "stalled: " on the line. – Stéphane Chazelas Sep 15 '21 at 05:08
@StéphaneChazelas Very nice! AFAIK the Raku equivalents are raku -ne '.put if s/^.*stalled\:\s//' and raku -ne 'put ~$/ if /^.*stalled\:\s<(.*/;' . The second one can be written as: raku -ne 'put $/.Str if /^.* stalled\:\s <(.*)> /;', which may be a tad more readable. – jubilatious1 Sep 15 '21 at 05:32
@StéphaneChazelas agreed on the non-greedy *? notation, but if there are multiple instances of "stalled: " it might be fun to try a non-greedy match earlier in the regex, à la: perl -ne 'print ${^POSTMATCH} if /^.*?stalled: /ps' . – jubilatious1 Sep 15 '21 at 05:36
ITYM perl -ne 'print ${^POSTMATCH} if /stalled: /p' – Stéphane Chazelas Sep 15 '21 at 05:42
@StéphaneChazelas you are quite correct, perl -ne 'print ${^POSTMATCH} if /stalled: /p' works just fine on lines where multiple instances of "stalled: " exist. The Raku equivalent is: raku -ne 'put $/.postmatch if /stalled\:\s/;'. – jubilatious1 Sep 15 '21 at 05:54

score 0 · Answer 6 · edited Dec 21 '19 at 01:43

There seems to be a simpler way. Just do:

sed "s/installed.*//g"

which removes "installed" and everything after it.

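for i in *
do
    # strip "---" and everything after it from the file name
    se=$(printf '%s\n' "$i" | sed "s/---.*//")
    echo "$se"
    # skip names that did not change or would become empty
    [ -n "$se" ] && [ "$se" != "$i" ] && mv -- "$i" "$se"
done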

edited Dec 21 '19 at 01:43 · jimmij
answered Dec 21 '19 at 00:54 · minor hash

Your solution removes the portion after the pattern. The OP needed to return the portion after the pattern, not remove it. – Stephane Oct 05 '21 at 12:31
