202

Opening a file with cat and then using grep to get matching lines only gets me so far with the particular log set I am dealing with. I need a way to match lines to a pattern, but return only the portion of the line after the match. The portions before and after the match vary from line to line. I have played with sed and awk, but have not been able to figure out how to filter the line to either delete the part before the match or return just the part after the match; either will work. This is an example of a line that I need to filter:

2011-11-07T05:37:43-08:00 <0.4> isi-udb5-ash4-1(id1) /boot/kernel.amd64/kernel: [gmp_info.c:1758](pid 40370="kt: gmp-drive-updat")(tid=100872) new group: <15,1773>: { 1:0-25,27-34,37-38, 2:0-33,35-36, 3:0-35, 4:0-9,11-14,16-32,34-38, 5:0-35, 6:0-15,17-36, 7:0-16,18-36, 8:0-14,16-32,34-36, 9:0-10,12-36, 10-11:0-35, 12:0-5,7-30,32-35, 13-19:0-35, 20:0,2-35, down: 8:15, soft_failed: 1:27, 8:15, stalled: 12:6,31, 20:1 }

The portion I need is everything after "stalled".

The background behind this is that I can find out how often something stalls:

cat messages | grep stalled | wc -l

What I need to do is find out how many times a certain node has stalled (indicated by the portion before each colon after "stalled"). If I just grep for that (i.e. 20:), it may return lines that have soft fails but no stalls, which doesn't help me. I need to filter out only the stalled portion so I can then grep for a specific node among those that have stalled.

For all intents and purposes, this is a freebsd system with standard GNU core utils, but I cannot install anything extra to assist.

Beryllium
  • 103
MaQleod
  • 2,614
  • @Gilles, Odd how that didn't pop up when I searched, though I didn't use the title I eventually went with...but it didn't show up in the screen below my title. Anyway, that aside, that might get me where I want, though I need the entire line after the match, not the first word - but might not take much of a change. – MaQleod Nov 07 '11 at 23:52
  • Its title sucked. I stole yours which is very nice. Take the sed solution and don't treat whitespace specially. – Gilles 'SO- stop being evil' Nov 07 '11 at 23:55
  • @Gilles, that is something I'm not entirely sure how to do. I am still learning sed. – MaQleod Nov 08 '11 at 00:06
  • similar to http://unix.stackexchange.com/questions/24089/returning-only-the-portion-of-a-line-after-a-matching-pattern/24091#24091 as well. – Tim Kennedy Nov 08 '11 at 00:43
  • @Gilles, You mentioned- "Note that if stalled: occurs several times on the line, this will match the last occurrence." Now what if it occurred multiple times?? I like to print stalled: 0 stalled: 9 stalled: 12 from following line. 2011-11-07T05:37:43-08:00 <0.4> stalled: 0 isi-udb5-ash4-1(id1) /boot/kernel.amd64/kernel: [gmp_info.c:1758](pid 40370="kt: gmp-drive-updat")(tid=100872) stalled: 9 new group: <15,1773>: { 1:0-25,27-34,37-38, 2:0-33,35-36, 3:0-35, 4:0-9,11-14,16-32,34-38, 5:0-35, 6:0-15,17-36, 7:0-16,18-36, 8:0-14,16-32,34-36, 9:0-10,12-36, 10-11:0-35, 12:0-5,7-30,32-35, 13-19:0-35 – shaa0601 Aug 28 '14 at 15:05
  • 1
    @shaa0601 I don't understand your question, it's especially difficult to follow in a comment with no formatting. Ask a new, self-contained question. – Gilles 'SO- stop being evil' Aug 28 '14 at 15:37

6 Answers

258

The canonical tool for that would be sed.

sed -n -e 's/^.*stalled: //p'

Detailed explanation:

  • -n means not to print anything by default.
  • -e is followed by a sed command.
  • s is the pattern replacement command.
  • The regular expression ^.*stalled: matches the pattern you're looking for, plus any preceding text (.* meaning any text, with an initial ^ to say that the match begins at the beginning of the line). Note that if stalled: occurs several times on the line, this will match the last occurrence.
  • The match, i.e. everything on the line up to stalled: , is replaced by the empty string (i.e. deleted).
  • The final p means to print the transformed line.

If you want to retain the matching portion, use a backreference: \1 in the replacement part designates what is inside a group \(…\) in the pattern. Here, you could write stalled: again in the replacement part; this feature is useful when the pattern you're looking for is more general than a simple string.

sed -n -e 's/^.*\(stalled: \)/\1/p'

Sometimes you'll want to remove the portion of the line after the match. You can include it in the match by including .*$ at the end of the pattern (any text .* followed by the end of the line $). Unless you put that part in a group that you reference in the replacement text, the end of the line will not be in the output.
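A minimal sketch of that idea, assuming a shortened, made-up sample line in place of the real log entry. The first command deletes the match and everything after it; the second keeps the match itself by putting it in a group and referencing it in the replacement.

```shell
# Hypothetical, shortened log line used only for illustration.
line='new group: { down: 8:15, stalled: 12:6,31, 20:1 }'

# Delete the match and everything after it; only the text before it remains.
before=$(printf '%s\n' "$line" | sed -n -e 's/stalled: .*$//p')

# Keep the match through a group, but drop whatever follows it.
upto=$(printf '%s\n' "$line" | sed -n -e 's/\(stalled: \).*$/\1/p')

printf '%s\n' "$before" "$upto"
```

Because the pattern here has no leading ^.*, the text before the match is left untouched in both cases.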

As a further illustration of groups and backreferences, this command swaps the part before the match and the part after the match.

sed -n -e 's/^\(.*\)\(stalled: \)\(.*\)$/\3\2\1/p'

To get the part after the first occurrence of the string instead of last (for those lines where the string can occur several times), a common trick is to replace that string once with a newline character (which is the one character that won't occur inside a line), and then remove everything up to that newline:

sed -n '
  /stalled: / {
    s//\
/
    s/.*\n//p
  }'

With some sed implementations, the first s command can be written s//\n/ though that's not standard/portable.

136

The other canonical tool is one you already use, grep.

For example:

grep -o 'stalled.*'

This gives the same result as the second option in Gilles's answer, provided stalled occurs only once on the line:

sed -n -e 's/^.*\(stalled: \)/\1/p'

The -o (--only-matching) flag prints only the part of the line that matches the expression, rather than the entire line, which is what grep normally prints.

To remove the "stalled:" from the output, we can use a third canonical tool, cut:

grep -o 'stalled.*' | cut -f2- -d:

The cut command uses delimiter : and prints field 2 till the end. It's a matter of preference of course, but the cut syntax I find very easy to remember.
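To tie this back to the question's goal of counting stalls for one node, the pipeline could be extended roughly as follows. This is only a sketch: the three log lines are made up, and it assumes every node in the stalled list is written as a single number preceded by a space (a range such as 19-20: would not be counted).

```shell
# Hypothetical sample standing in for the real "messages" file.
log=$(mktemp)
cat > "$log" <<'EOF'
host kernel: new group: { 1:0-25, soft_failed: 20:3, stalled: 12:6,31 }
host kernel: new group: { 1:0-25, soft_failed: 1:27, stalled: 12:6, 20:1 }
host kernel: new group: { 1:0-25, 20:0,2-35 }
EOF

# Keep only the stalled portion of each line, then count the lines that
# mention node 20. The leading space keeps ' 20:' from matching node 120.
count=$(grep -o 'stalled: .*' "$log" | grep -c ' 20:')
echo "$count"
rm -f "$log"
```

Only the second sample line is counted: the first mentions node 20 as a soft fail, and the third has no stalled section at all.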

poige
  • 6,231
  • 3
    Thanks for mentioning the -o option! I wanted to point out that grep doesn't recognize the \n as a newline, so your first example only matches to the first n character. For example, echo "Hello Anne" | grep -o 'A[^\n]*' returns the string A. However, echo "Hello Anne" | grep -o 'A.*' returns the expected Anne, since . matches any character except the newline. – adamlamar Mar 16 '15 at 21:52
  • 2
    Note that the quotes around the cut delimiter -d':' are removed by @poige. I find it easier to remember with quotes, e.g. with -d' ' or -d';'. – Anne van Rossum Jul 10 '17 at 20:44
  • According to your finding it should be easier to remember to use quotes with -f 2 too. Seriously, why not? – poige Aug 26 '17 at 10:26
  • Because a delimiter like a semi-colon ; rather than a colon : will be interpreted differently if not quoted. Of course that's logical behavior, but still I like to rely on muscle memory. I don't like to quote the delimiter one time but not the other time. Just personal preference, like I said before: easier to remember. – Anne van Rossum Oct 07 '17 at 18:09
  • the period that is part of the .* is needed, worked well for me: cat filename | grep 'Return only this line xyz text' | grep -o 'xyz.*' returns xyz text – ron Dec 12 '17 at 19:01
  • Upvote, upvote, a million times upvote! This worked when sed didn't. – StatsSorceress Jun 03 '19 at 15:34
  • @AnnevanRossum hi from 2020 ) all those not really needed quotes are not only extra characters but additional headache when quoting also ;) – poige Jun 18 '20 at 13:50
14

Yet another canonical tool you considered, awk, could be used with the following line:

awk -F"stalled" '/stalled/{print $2}' messages

Detailed explanation:

  • -F defines the field separator for the line, in this case "stalled". Everything before the separator is addressed with $1 and everything after it with $2.
  • /reg-ex/ selects the lines matching the regular expression, in this case "stalled".
  • {print $<n>} prints the nth field. Since the separator is defined as stalled, everything after stalled is the second field.
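With "stalled" alone as the separator, the second field still begins with the colon and space that follow it. A small variation, sketched here on a shortened, made-up sample line, uses the full "stalled: " string as the separator instead:

```shell
# Hypothetical, shortened log line used only for illustration.
line='new group: { down: 8:15, stalled: 12:6,31, 20:1 }'

# With "stalled: " as the field separator, $2 starts right at the node list.
after=$(printf '%s\n' "$line" | awk -F'stalled: ' '/stalled: /{print $2}')
echo "$after"
```

If the separator occurred more than once on a line, $2 would hold only the text between the first and second occurrence.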
Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232
4

I used ifconfig | grep eth0 | cut -f3- -d: to take this

    [root@MyPC ~]# ifconfig
    eth0  Link encap:Ethernet  HWaddr AC:B4:CA:DD:E6:F8
          inet addr:192.168.0.2  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:78998810244 errors:1 dropped:0 overruns:0 frame:1
          TX packets:20113430261 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:110947036025418 (100.9 TiB)  TX bytes:15010653222322 (13.6 TiB)

and make it look like this

    [root@MyPC ~]# ifconfig | grep eth0 | cut -f3- -d:
    B4:CA:DD:E6:F8
Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232
2

Using Perl (i.e. Perl5) and Raku (previously known as Perl6):

Perl:

perl -pe 's/^.*stalled: //;' #leaves non-matching and/or blank lines intact

Or:

perl -nE '/^.*stalled: (.*)/ and say $1;'  #removes non-matching lines

Raku:

raku -pe 's/^.*stalled\:\s//;' #leaves non-matching and/or blank lines intact

Or:

raku -ne '/^.*stalled\:\s (.*)/ and say ~$0;' #removes non-matching lines

OUTPUT (for 2nd Perl and 2nd Raku examples above):

12:6,31, 20:1 }

The code above is virtually identical between the two languages. The most significant difference is that in Raku all non-alnum/non-underscore characters must be escaped to be 'understood literally' by the Raku regex engine.

Other minor differences include the fact that:

  1. Raku changes capture numbering to start from $0 (Perl starts from $1),
  2. in Raku a leading ~ tilde is used to stringify the match object, and
  3. in Perl a -E commandline flag must be used to enable the say function.

http://www.wall.org/~larry/natural.html
https://www.perl.org/
https://www.raku.org/

jubilatious1
  • 3,195
  • 8
  • 17
  • 1
    Or perl -ne 'print if s/^.*stalled: //' or perl -ne 'print $& if /^.*stalled: \K.*/s' to preserve the original line delimiter if any. – Stéphane Chazelas Sep 15 '21 at 05:05
  • 1
    One advantage of using perl is that you can replace * with *? to get the part after the first occurrence of "stalled: " on the line. – Stéphane Chazelas Sep 15 '21 at 05:08
  • @StéphaneChazelas Very nice! AFAIK the Raku equivalents are raku -ne '.put if s/^.*stalled\:\s//' and raku -ne 'put ~$/ if /^.*stalled\:\s<(.*/;' . The second one can be written as: raku -ne 'put $/.Str if /^.* stalled\:\s <(.*)> /;', which may be a tad more readable. – jubilatious1 Sep 15 '21 at 05:32
  • @StéphaneChazelas agreed on the non-greedy *? notation, but if there are multiple instances of "stalled: " it might be fun to try a non-greedy match earlier in the regex, à la: perl -ne 'print ${^POSTMATCH} if /^.*?stalled: /ps' . – jubilatious1 Sep 15 '21 at 05:36
  • 1
    ITYM perl -ne 'print ${^POSTMATCH} if /stalled: /p' – Stéphane Chazelas Sep 15 '21 at 05:42
  • @StéphaneChazelas you are quite correct, perl -ne 'print ${^POSTMATCH} if /stalled: /p' works just fine on lines where multiple instances of "stalled: " exist. The Raku equivalent is: raku -ne 'put $/.postmatch if /stalled\:\s/;'. – jubilatious1 Sep 15 '21 at 05:54
0

There seems to be a simpler way. Just do:

sed "s/installed.*//g"

which removes "installed" and everything after it on the line.

for i in *
do
    # strip "---" and everything after it from the file name
    se=$(printf '%s\n' "$i" | sed 's/---.*//')
    echo "$se"
    mv -- "$i" "$se"
done
jimmij
  • 47,140
  • Your solution removes the portion after the pattern. The OP needed to return the portion after the pattern, not remove it. – Stephane Oct 05 '21 at 12:31