Using sed to find and replace complex string (preferably with regex)

Question

I have a file with the following contents:

<username><![CDATA[name]]></username>
<password><![CDATA[password]]></password>
<dbname><![CDATA[name]]></dbname>

and I need to make a script that changes the "name" in the first line to "something", the "password" on the second line to "somethingelse", and the "name" in the third line to "somethingdifferent". I can't rely on the order of these occurring in the file, so I can't simply replace the first occurrence of "name" with "something" and the second occurrence of "name" with "somethingdifferent". I actually need to do a search for the surrounding strings to make sure I'm finding and replacing the correct thing.

So far I have tried this command to find and replace the first "name" occurrence:

sed -i "s/<username><![CDATA[name]]><\/username>/something/g" file.xml

However, it's not working, so I suspect some of these characters need escaping.

Ideally, I'd love to be able to use regex to just match the two "username" occurrences and replace only the "name". Something like this but with sed:

<username>.+?(name).+?</username>

and replace the contents in the brackets with "something".

Is this possible?

Just note that pretty much any regexp-based solution, unless extremely contrived, will risk breaking any time the input format changes. Regexps are a poor choice for dealing with XML, SGML or derivatives (which this looks like to me). – user, Jun 07 '13 at 21:57
Approved! Consider using XQuery for example: http://www.w3schools.com/xquery/default.asp. This is the W3C standard for retrieving and manipulating XML content. — lgeorget, Jun 07 '13 at 22:01

score 384 · Accepted Answer · edited Jan 12 '22 at 11:50

sed -i -E "s/(<username>.+)name(.+<\/username>)/\1something\2/" file.xml

This is, I think, what you're looking for.

Explanation:

parentheses in the first part define groups (strings in fact) that can be reused in the second part
\1, \2, etc. in the second part are references to the i-th group captured in the first part (the numbering starts with 1)
-E enables extended regular expressions (needed for + and grouping).
-i enables "in-place" file edit mode
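The same capture-group pattern extends to all three values from the question. A minimal sketch, assuming a POSIX shell and a sed that accepts -E (GNU and BSD sed both do); the function name replace_values is hypothetical, output goes to stdout instead of using -i, and the lines are fed in a different order to show that each substitution is keyed on its surrounding tags rather than on line position:

```shell
#!/bin/sh
# Hypothetical helper: rewrites the three CDATA values, one -e per tag.
replace_values() {
  sed -E \
    -e 's/(<username>.+)name(.+<\/username>)/\1something\2/' \
    -e 's/(<password>.+)password(.+<\/password>)/\1somethingelse\2/' \
    -e 's/(<dbname>.+)name(.+<\/dbname>)/\1somethingdifferent\2/'
}

# Input lines in a different order than in the question.
printf '%s\n' \
  '<dbname><![CDATA[name]]></dbname>' \
  '<username><![CDATA[name]]></username>' \
  '<password><![CDATA[password]]></password>' |
  replace_values
```

sed has no lazy .+? quantifier, so the greedy .+ picks the last "name" that still has text and the closing tag after it; for the question's format that is the CDATA value.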

edited Jan 12 '22 at 11:50 by AdminBee · answered Jun 07 '13 at 21:52 by lgeorget

This is probably not the most efficient way to do it, but when dealing with regexps the tradeoff is always the same: readability vs. efficiency! :D – lgeorget Jun 07 '13 at 21:55
How do I do this when the replacement text starts with a number? sed assumes it is a group ID rather than replacement text. – Navin Ilavarasan Jul 27 '15 at 06:14
@Navin you can escape a backslash with another backslash. "\\2" is a backslash followed by the number 2, whereas "\2" is group 2. – lgeorget Jul 27 '15 at 09:02
Thanks, I had tried that. The problem is that it is a variable, and we don't know whether it will start with a letter or a number. – Navin Ilavarasan Jul 27 '15 at 09:30
@Navin I think you should ask a new question (referencing this answer) with an example. You would quickly have an answer and it would be more visible to other users. – lgeorget Jul 27 '15 at 09:54
it leaves behind a backup file, with the name (original name) + "-E". – Display Name Nov 13 '15 at 05:32
^ Seems like it's a glitch found only in the OS X version of sed – Display Name Nov 13 '15 at 05:38
Try giving "-i" an explicit parameter. The default behaviour might be to save a backup version with the extension given as parameter. – lgeorget Nov 13 '15 at 06:57
On OS X I get 'sed: 1: "s/(.+)name(.+ ...": \1 not defined in the RE'. I pasted the exact example from this question into a file, then ran the command from this answer on that file. Maybe OS X has different syntax? – Do Not Track Me Jan 21 '17 at 05:52
@deweyb That answer comes a bit late, sorry... The problem with sed (and other old utilities) is that several versions coexist which differ slightly by their defaults and flags. If my memory's correct, in some variants of sed, extended regexps are active by default and -E deactivates them. Confirm by reading the manual page for your system, my answer only works for GNU sed. :/ – lgeorget Feb 10 '17 at 14:54
The GNU version of sed supports the "-E" parameter, but not officially. It's not even mentioned in the man page. If you want to use extended regexes, you have to use the "-r" parameter instead. – Ikem Krueger Sep 19 '17 at 17:39
@user82110 Actually "-E" is the POSIX standard and the GNU sed manual recommends using it for portability (since http://austingroupbugs.net/view.php?id=528). See https://www.gnu.org/software/sed/manual/sed.html. – lgeorget Sep 20 '17 at 07:15
@deweydb According to this answer, you should use \( and \) instead of ( and ). – Zhang Buzz Nov 12 '17 at 13:56
Why is -E not documented in the manual? Oh wait, -r is the new one. – neverMind9 Dec 04 '18 at 21:11
@neverMind9 Actually, -E is the new one and is recommended for portability (although this may very well not be a concern at all in your case). See http://austingroupbugs.net/view.php?id=528 and the sed info page https://www.gnu.org/software/sed/manual/sed.html. For a long time, GNU sed supported the -E option, undocumented, as a synonym for "-r". – lgeorget Dec 04 '18 at 22:17
Can someone provide an equivalent full example that works on OS X? – mjs May 17 '19 at 13:39
On OS X, -i requires an "extension" argument, which can be an empty string: sed -E -i '' – GameSalutes Apr 12 '20 at 01:45
For anyone who has trouble with this like I did, I found a very useful tool that helps you see what your sed expression is doing and gives error feedback:
https://sed.js.org/
– Slvrfn Nov 08 '22 at 12:19
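For the OS X question in the comments, one option is to attach the backup suffix directly to -i, a form that both GNU sed and BSD/macOS sed accept. A minimal sketch, assuming a temporary file created with mktemp stands in for file.xml:

```shell
#!/bin/sh
# A temporary file stands in for file.xml in this sketch.
f=$(mktemp)
printf '%s\n' '<username><![CDATA[name]]></username>' > "$f"

# -i.bak (suffix attached, no space) edits in place on both GNU and BSD sed,
# leaving the original contents in "$f.bak".
sed -i.bak -E 's/(<username>.+)name(.+<\/username>)/\1something\2/' "$f"

cat "$f"        # the edited file
rm -f "$f.bak"  # drop the backup once the result looks right
```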

score 27 · Answer 2 · answered Jun 07 '13 at 22:05

sed -e '/username/s/CDATA\[name\]/CDATA\[something\]/' \
-e '/password/s/CDATA\[password\]/CDATA\[somethingelse\]/' \
-e '/dbname/s/CDATA\[name\]/CDATA\[somethingdifferent\]/' file.txt

The /username/ before the s tells sed to only work on lines containing the string 'username'.
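The address prefix is also what makes this approach independent of line order. A small sketch, assuming input arrives on standard input rather than from file.txt (the function name set_values is hypothetical), running the same three expressions over the question's lines in a shuffled order:

```shell
#!/bin/sh
# Hypothetical wrapper around the three address-restricted substitutions.
# An unescaped ] outside a bracket expression is already literal in a BRE.
set_values() {
  sed -e '/username/s/CDATA\[name]/CDATA[something]/' \
      -e '/password/s/CDATA\[password]/CDATA[somethingelse]/' \
      -e '/dbname/s/CDATA\[name]/CDATA[somethingdifferent]/'
}

# Same lines as in the question, deliberately reordered.
printf '%s\n' \
  '<password><![CDATA[password]]></password>' \
  '<dbname><![CDATA[name]]></dbname>' \
  '<username><![CDATA[name]]></username>' |
  set_values
```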

score 15 · Answer 3 · edited May 01 '17 at 00:10

If sed is not a hard requirement, you are better off using a dedicated tool instead.

If your file is valid XML (not just those 3 XML-looking tags), then you can use XMLStarlet:

xml ed -P -O -L \
  -u '//username/text()' -v 'something' \
  -u '//password/text()' -v 'somethingelse' \
  -u '//dbname/text()' -v 'somethingdifferent' file.xml

The above will also work in situations which would be difficult to solve with regular expressions:

Can replace the values of the tags without specifying their current values.
Can replace the values even if they are just escaped and not enclosed in CDATA.
Can replace the values even if the tags have attributes.
Can easily replace only specific occurrences of a tag when several share the same name.
Can format the modified XML by indenting it.

Brief demonstration of the above:

bash-4.2$ cat file.xml
<sith>
<master>
<username><![CDATA[name]]></username>
</master>
<apprentice>
<username><![CDATA[name]]></username>
<password>password</password>
<dbname foo="bar"><![CDATA[name]]></dbname>
</apprentice>
</sith>

bash-4.2$ xml ed -O -u '//apprentice/username/text()' -v 'something' -u '//password/text()' -v 'somethingelse' -u '//dbname/text()' -v 'somethingdifferent' file.xml
<sith>
  <master>
    <username><![CDATA[name]]></username>
  </master>
  <apprentice>
    <username><![CDATA[something]]></username>
    <password>somethingelse</password>
    <dbname foo="bar"><![CDATA[somethingdifferent]]></dbname>
  </apprentice>
</sith>

score 6 · Answer 4 · edited Sep 18 '14 at 13:06

$ sed -e '1s/name/something/2' \
      -e '3s/name/somethingdifferent/2' \
      -e 's/password/somethingelse/2' sample.xml

You can simply use addresses: the number preceding "s" is the line number the substitution applies to.

The number at the end tells sed to replace the second match instead of the first.
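A quick demonstration of both numbers on the question's first line, assuming any POSIX sed. Note that the first match of "name" is the one inside the tag name "username", which is why the flag 2 is needed to reach the CDATA value:

```shell
#!/bin/sh
line='<username><![CDATA[name]]></username>'

# Without a flag, s replaces the first match: the "name" inside "<username>".
printf '%s\n' "$line" | sed 's/name/something/'
# -> <usersomething><![CDATA[name]]></username>

# With the numeric flag 2, only the second match (the CDATA value) is replaced.
printf '%s\n' "$line" | sed 's/name/something/2'
# -> <username><![CDATA[something]]></username>

# The leading address 1 restricts the command to line 1; line 2 is untouched.
printf 'name name\nname name\n' | sed '1s/name/X/2'
# -> name X
#    name name
```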

edited Sep 18 '14 at 13:06 by slm · answered Sep 18 '14 at 12:52 by A. Wench

score 5 · Answer 5 · answered Jun 08 '13 at 00:15

You need to quote (precede with a backslash) the characters \[.*^$/ in the regular expression part of the s command and \&/ in the replacement part, as well as newlines. The regular expression is a basic regular expression, and in addition you need to quote the delimiter of the s command.

You can pick a different delimiter to avoid having to quote /. You'll have to quote that character instead, but usually the point of changing the delimiter is to pick one that doesn't occur in either the text to replace or the replacement text.
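These quoting rules can be applied mechanically. A minimal sketch, assuming a POSIX shell; the helper names escape_regex and escape_replacement are hypothetical. It escapes exactly the characters listed above (it does not handle embedded newlines), and the helpers use | as their own s delimiter so that / can sit inside the bracket list:

```shell
#!/bin/sh
# Escape a literal string for the regex (BRE) side of s/.../.../:
# each of [ \ . * ^ $ / gets a backslash in front of it.
escape_regex() {
  printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

# Escape a literal string for the replacement side: \ & and /.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

old='<username><![CDATA[name]]></username>'
new='<username><![CDATA[something]]></username>'

pattern=$(escape_regex "$old")
replacement=$(escape_replacement "$new")

# The escaped strings are safe to splice into an s command that uses /.
printf '%s\n' "$old" | sed "s/$pattern/$replacement/"
```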

sed -e 's~<username><!\[CDATA\[name\]\]></username>~<username><![CDATA[something]]></username>~'

You can use groups to avoid repeating some parts in the replacement text, and accommodate variation on these parts.

sed -e 's~\(<username><!\[[A-Z]*\[\)name\(\]\]></username>\)~\1something\2~'

sed -e 's~\(<username>.*[^A-Za-z]\)name\([^A-Za-z].*</username>\)~\1something\2~'

score 1 · Answer 6 · answered Aug 01 '18 at 08:21

Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

    -r, --regexp-extended
             use extended regular expressions in the script.

So, to replace a value in a properties file:

sed -i -r 's/MAIL=.+/MAIL=user@mymail.com/' etc/service.properties
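One caveat, shown with hypothetical properties content where another key also ends in MAIL: an unanchored MAIL= also matches inside ADMIN_MAIL=, so anchoring the key with ^ is safer. Using .* instead of .+ needs no extended regexps and also covers an empty value. A minimal sketch on stdin rather than editing a real file:

```shell
#!/bin/sh
# Hypothetical properties content; example.com addresses are placeholders.
props='MAIL=old@example.com
ADMIN_MAIL=admin@example.com'

# ^ ties the key to the start of the line, so only the MAIL= line changes.
printf '%s\n' "$props" | sed 's/^MAIL=.*/MAIL=user@mymail.com/'
```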

score 1 · Answer 7 · edited Jun 07 '13 at 22:01

To replace the word "name" with "something", use:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml

With the g flag, every matching occurrence on each line is replaced.

By default the result goes to standard output; to save the changes to another file, use:

sed 's/\(<username><!\[[A-Z]*\[\)name\]/\1something]/g' file.xml > anotherfile.xml

edited Jun 07 '13 at 22:01 · answered Jun 07 '13 at 21:55 by slackmart
