
I'm trying to use sed with a regexp to transform the output of a command, but I can't figure out the problem.

I tested the regex on regex101.com and it seems to group the things I want correctly. But I can't understand how sed works with regexp group patterns.

Here is the command output:

appstream              CentOS Linux 8 - AppStream
baseos                 CentOS Linux 8 - BaseOS
epel                   Extra Packages for Enterprise Linux 8 - x86_64
epel-modular           Extra Packages for Enterprise Linux Modular 8 - x86_64
extras                 CentOS Linux 8 - Extras

And here is what I want to parse:

CentOS Linux 8 - AppStream
CentOS Linux 8 - BaseOS
Extra Packages for Enterprise Linux 8 - x86_64
Extra Packages for Enterprise Linux Modular 8 - x86_64
CentOS Linux 8 - Extras

The sed regexp that I came up with is this:

sed -E 's/"(^.*?\s)([A-Z|a-x].*)"/\2/g'

Can someone help me find the issue please?

Thanks!

αғsнιη
kegham
    The regex101 site does not even list POSIX regular expressions in its "Flavor" list. Don't use it to test regular expressions for use with Unix command line tools (only for use with the languages actually listed on the site). – Kusalananda Mar 12 '21 at 08:31

2 Answers

2

There are a number of issues:

  1. inside single quotes, double quotes are literal - since your command output doesn't contain ", it will never be matched

  2. if your command output did have leading quotes, then a line anchor ^ could never match after such a character

  3. you probably tested your regex in an engine that supports the Perl ? non-greedy modifier - in sed, ? is either literal (BRE) or a plain quantifier (ERE, as here with the -E flag), where it only makes the preceding .* optional (0 or 1 times), so the match stays greedy

  4. \s matches only a single whitespace character; also, like .*?, it is strictly a Perl extension (although recent versions of GNU sed support it) - for portability, you might need to change it to [[:blank:]]

  5. | inside [...] doesn't represent alternation (this one wouldn't stop the expression from matching however, but it would also match a | character)
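Points 1, 3 and 5 are easy to check from a shell. This is only a minimal sketch: the variable names are made up for illustration, the 'a b c' test string is mine, and the result shown for .*? assumes GNU sed (other implementations may behave differently or reject the expression).

```shell
# One line from the question's sample output.
line='appstream              CentOS Linux 8 - AppStream'

# Point 1: the double quotes are part of the pattern, so nothing matches
# and the line comes back unchanged.
with_quotes=$(printf '%s\n' "$line" | sed -E 's/"([A-Z].*)"/\1/')

# Point 3: .*? is not lazy here; the capture runs up to the LAST space
# (assumption: GNU sed, which treats the ? as "0 or 1 times").
lazy_attempt=$(printf '%s\n' 'a b c' | sed -E 's/^(.*?) /[\1]/')

# Point 5: | inside a bracket expression is an ordinary character.
pipe_match=$(printf '%s\n' '|' | sed -E 's/[A-Z|a-x]/matched/')

printf '%s\n' "$with_quotes" "$lazy_attempt" "$pipe_match"
```

With GNU sed this prints the original line unchanged, then [a b]c (a lazy match would have given [a]b c), then matched.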

Assuming your sed implementation does support \s and its complement \S, what you intended was probably something like

sed -E 's/^(\S+\s+)([A-Za-z].*)/\2/'

although you could more simply do

sed -E 's/\S+\s+(.*)/\1/'

or even just

sed -E 's/\S+\s+//'

to match a sequence of non-spaces followed by a sequence of spaces, and delete it. If your sed does not provide \s and \S, then you can do the same with POSIX character classes as

sed -E 's/[^[:blank:]]+[[:blank:]]+//'

or, if you are limited to a full POSIX sed (where + is not a quantifier regardless of mode)

sed 's/[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//'
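To see that last form against the data in the question, here is a small self-contained sketch. The question doesn't name the real command, so sample_output is a hypothetical stand-in that just prints the quoted lines, and strip_first_field is an illustrative wrapper name, not anything standard.

```shell
# Hypothetical stand-in for the real command: prints the sample output
# quoted in the question.
sample_output() {
cat <<'EOF'
appstream              CentOS Linux 8 - AppStream
baseos                 CentOS Linux 8 - BaseOS
epel                   Extra Packages for Enterprise Linux 8 - x86_64
epel-modular           Extra Packages for Enterprise Linux Modular 8 - x86_64
extras                 CentOS Linux 8 - Extras
EOF
}

# Delete the first run of non-blanks plus the run of blanks after it,
# using only BRE interval syntax (no -E, no \s or \S).
strip_first_field() {
    sed 's/[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//'
}

sample_output | strip_first_field
```

This should print the five descriptions exactly as listed under "what I want to parse". There is no g flag because only the first match on each line should be removed; with g the whole description would be eaten word by word.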

See also Why does my regular expression work in X but not in Y?

steeldriver
-1

Look for a run of whitespace after a non-whitespace character and change both to a newline (as a newline surely will not be present in the line otherwise). Then take away everything up to the newline. You have just deleted the first field.

sed -E 's/\S\s+/\n/;s/.*\n//' file
guest_7