Replacing strings with regex in sed

Question

I'm trying to use sed to transform the output of a command with a regular expression, but I can't figure out what the problem is.

I tested the regex on regex101.com and it seems to group the things that I want correctly. But I can't understand how sed works with regexp groups.

Here is the command output:

appstream              CentOS Linux 8 - AppStream
baseos                 CentOS Linux 8 - BaseOS
epel                   Extra Packages for Enterprise Linux 8 - x86_64
epel-modular           Extra Packages for Enterprise Linux Modular 8 - x86_64
extras                 CentOS Linux 8 - Extras

And here is what I want to parse:

CentOS Linux 8 - AppStream
CentOS Linux 8 - BaseOS
Extra Packages for Enterprise Linux 8 - x86_64
Extra Packages for Enterprise Linux Modular 8 - x86_64
CentOS Linux 8 - Extras

The sed regexp that I came up with is this:

sed -E 's/"(^.*?\s)([A-Z|a-x].*)"/\2/g'

Can someone help me find the issue please?

Thanks!

The regex101 site does not even list POSIX regular expressions in its "Flavor" list. Don't use it to test regular expressions for use with Unix command line tools (only for use with the languages actually listed on the site). — Kusalananda, Mar 12 '21 at 08:31

steeldriver · Accepted Answer · 2021-03-12T13:06:18.813

There are a number of issues:

inside single quotes, double quotes are literal - since your command output doesn't contain ", it will never be matched
if your command output did have leading quotes, then a line anchor ^ could never match after such a character
you probably tested your regex in an engine that supports the Perl non-greedy modifier ? - in sed, ? is either literal (BRE) or a simple quantifier (ERE, as here with the -E flag), where it makes the preceding .* optional (matched 0 or 1 times) while .* itself stays greedy
\s matches only a single whitespace character; also, like .*?, it is strictly a Perl extension (although recent versions of GNU sed support it) - for portability, you might need to change it to [[:blank:]]
| inside [...] doesn't represent alternation (this one wouldn't stop the expression from matching, but it would make the bracket expression match a literal | character as well)
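
Two of the points above (literal double quotes and | inside a bracket expression) are easy to check in isolation. The following is a minimal sketch with made-up one-line inputs rather than the command output from the question. It uses only POSIX sed features, so it should behave the same with any sed.

```shell
# Double quotes inside a single-quoted sed script are literal characters,
# so this pattern needs real " characters in the input and never matches here.
quoted=$(printf '%s\n' 'appstream  CentOS' | sed 's/"appstream"/X/')
printf '%s\n' "$quoted"   # input comes back unchanged

# Inside a bracket expression, | is an ordinary character, not alternation,
# so [a|b] matches any one of: a, | or b.
pipe=$(printf '%s\n' 'a|b' | sed 's/[a|b]/X/g')
printf '%s\n' "$pipe"     # all three characters are replaced
```

The first command prints the input line as it was, and the second prints XXX.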

Assuming your sed implementation does support \s and its complement \S, what you probably intended was something like

sed -E 's/^(\S+\s+)([A-Za-z].*)/\2/'

although you could more simply do

sed -E 's/\S+\s+(.*)/\1/'

or even just

sed -E 's/\S+\s+//'

to match a sequence of non-spaces followed by a sequence of spaces, and delete it. If your sed does not provide \s and \S, then you can do the same with POSIX character classes as

sed -E 's/[^[:blank:]]+[[:blank:]]+//'

or, if you are limited to a strictly POSIX sed (where + is not a quantifier regardless of mode)

sed 's/[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//'
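
As a quick check, the strictly POSIX form can be run against the sample lines from the question. This is a sketch that assumes the columns are separated by spaces or tabs. The variable names repos and names are only for illustration; in practice you would pipe the command's output straight into sed.

```shell
# Sample lines copied from the question.
repos='appstream              CentOS Linux 8 - AppStream
baseos                 CentOS Linux 8 - BaseOS
epel                   Extra Packages for Enterprise Linux 8 - x86_64
epel-modular           Extra Packages for Enterprise Linux Modular 8 - x86_64
extras                 CentOS Linux 8 - Extras'

# Delete the first run of non-blanks plus the run of blanks that follows it.
# Without the g flag, only the first match on each line is replaced.
names=$(printf '%s\n' "$repos" | sed 's/[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//')
printf '%s\n' "$names"
```

This prints the five repository names exactly as listed in the question's desired output.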

See also Why does my regular expression work in X but not in Y?

POSIX-ly: sed 's/[^[:blank:]]*[[:blank:]]*//' – Kusalananda, Mar 12 '21 at 08:28
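
For the output in the question, the * form in this comment and the \{1,\} form in the answer give the same result, because every line starts with a non-blank field. They can differ when a line begins with blanks, since the * form is allowed to match an empty first field. A small sketch with a hypothetical indented line:

```shell
# Hypothetical input with leading blanks (not taken from the question).
line='  first second'

# * form: matches at the start of the line (zero non-blanks, then the
# leading blanks), so only the indentation is removed.
star=$(printf '%s\n' "$line" | sed 's/[^[:blank:]]*[[:blank:]]*//')

# \{1,\} form: needs at least one non-blank, so the match starts at
# "first" and the indentation is kept.
plus=$(printf '%s\n' "$line" | sed 's/[^[:blank:]]\{1,\}[[:blank:]]\{1,\}//')

printf '[%s]\n[%s]\n' "$star" "$plus"
```

Here the * form leaves "first second", while the \{1,\} form leaves "  second" with the two leading spaces intact.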

score -1 · Answer 2 · answered Mar 12 '21 at 06:29

Look for a run of spaces after a non-whitespace character and change it to a newline (as that surely will not be present in the line). Then take away everything up to the newline. You have just deleted the first field.

sed -E -e 's/\S\s+/\n/;s/.*\n//' file

guest_7
