Is there a way to prevent sed from interpreting the replacement string?

Question

If you want to replace a keyword with a string using sed, sed tries hard to interpret your replacement string. If the replacement string happens to have characters that sed considers special, like a / character, it will fail, unless of course you meant your replacement string to have characters that tell sed how to act.

Ex:

VAR="hi/"
sed "s/KEYWORD/$VAR/g" somefile

Is there any way to tell sed not to try to interpret the replacement string for special characters? All I want is to be able to replace a keyword in a file with the contents of a variable, no matter what that content is.

If you want to put special characters into sed and have them not be special, just backslash escape them. VAR='hi\/' gives no such problem. — Wildcard, Jan 17 '16 at 09:24
sed(1) just interprets what it gets. In your case, it gets that via a shell interpolation. I believe you can't do as you want, but check the manual. I know in Perl (which makes a passable sed replacement, with much richer regular expressions) you can specify a string is to be taken literally, again, check the manual. — vonbrand, Jan 23 '16 at 13:57
related https://stackoverflow.com/questions/407523/escape-a-string-for-a-sed-replace-pattern — Ciro Santilli OurBigBook.com, Mar 28 '18 at 12:42
Also How to search & replace arbitrary literal strings in sed and awk (and perl). — G-Man Says 'Reinstate Monica', Oct 22 '21 at 05:48

glenn jackman · Answer 1 · 2016-01-17T12:45:39.200

9

There are only 4 special characters in the replacement part: \, &, newline and the delimiter (ref)

$ VAR='abc/def&ghi\foo
next line'

$ repl=$(sed -e 's/[&\\/]/\\&/g; s/$/\\/' -e '$s/\\$//' <<<"$VAR")

$ echo "$repl"
abc\/def\&ghi\\foo\
next line

$ echo ZYX | sed "s/Y/$repl/g"
Zabc/def&ghi\foo
next lineX
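
The escaping step above can be wrapped in a small helper so that the sed call stays readable. This is only a sketch: the function name escape_sed_replacement is made up for illustration, it assumes / is the delimiter, and it assumes the value does not end in a newline (command substitution would strip it and leave a dangling backslash).

```shell
# Hypothetical helper: escape a value for use as the replacement in s/.../.../
escape_sed_replacement() {
  # First expression: put a backslash before \, & and the / delimiter.
  # Second expression: end every line except the last with a backslash,
  # which is how sed expects an embedded newline to be written.
  printf '%s\n' "$1" | sed -e 's/[&\\/]/\\&/g' -e '$!s/$/\\/'
}

VAR='abc/def&ghi\foo
next line'
echo ZYX | sed "s/Y/$(escape_sed_replacement "$VAR")/g"
```

Using `$!` to skip the last line replaces the "add a backslash everywhere, then remove the final one" pair of commands with a single command.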

edited Jan 17 '16 at 12:45

answered Jan 17 '16 at 12:30

glenn jackman

85,964

This has the same problem as Antti's solution - if the replacement string is past a certain length, you get an "Argument list too long" error. Also, what if the replacement string has '[', ']', '*', '.', and other such characters? Would sed really not interpret those? – Tal Jan 17 '16 at 17:26
The replacement side of s/// is not a regular expression, it's really just a string (except for backslash-escapes and &). If the replacement string is so long, a shell one-liner is not your solution. – glenn jackman Jan 17 '16 at 20:11
A very useful list if, for example, your replacement string is base64 encoded text (eg. replacing a placeholder with a SHA256 key). Then it's just the delimiter to worry about. – Heath Raftery May 16 '19 at 14:03
Only 4 chars special to sed? Square brackets break it too. – markling May 10 '22 at 13:30
Can the second -e be replaced by use of a ; to separate the last sed command? – Robin A. Meade Oct 25 '22 at 18:13
Yes. I can't remember why I did that. Possibly just to separate the "do this on every line" code from the "do this only on the last line" code. – glenn jackman Oct 25 '22 at 18:28

Antti Haapala · Answer 2 · 2016-01-17T19:04:13.503

8

You can use Perl instead of sed with -p (assume loop over input) and -e (give program on command line). With Perl you can access environment variables without interpolating these in shell. Note that the variable needs to be exported:

export VAR='hi/'
perl -p -e 's/KEYWORD/$ENV{VAR}/g' somefile

If you do not want to export the variable everywhere, then just provide it for that process only:

PATTERN="$VAR" perl -p -e 's/KEYWORD/$ENV{PATTERN}/g' somefile

Note that Perl's regular expression syntax differs slightly from sed's by default.

edited Jan 17 '16 at 19:04

answered Jan 17 '16 at 10:28

Antti Haapala

1,151

This seemed very promising, but when testing it, I get an "Argument list too long" error because my replacement string is too long, which makes sense - using this method, we are using the entire replacement string as part of the arguments we give to perl, so there is a limit on how long it can be. – Tal Jan 17 '16 at 17:16
No, it will go in the PATTERN environment variable, not arguments. In any case, this error would be E2BIG, which you would equally get if you used sed. – Antti Haapala Jan 17 '16 at 19:03
Didn't work for me... first it truncated the file. Then I realised your example is incomplete. I stitched in the OP and got VAR="hi"; PATTERN="$VAR" perl -p -e 's/KEYWORD/$ENV{PATTERN}/g' file.txt but this doesn't replace the value in the file, it just outputs a copy of the file contents with the replacement done. – geoidesic Apr 23 '22 at 19:12
@geoidesic you can use -i to do in-place modification. The sed command doesn't do in-place either. – Antti Haapala Apr 23 '22 at 20:43

Wildcard · Answer 3 · 2022-06-03T17:53:06.817

3

The simplest solution that still handles the vast majority of variable values correctly is to use a non-printing character as the delimiter in sed's substitute command.

In vi and in many shells you can enter any control character literally by first typing Ctrl-V (more commonly written as ^V). So if you use some control character as the delimiter (I often use ^A in these cases), your sed command will only break if that non-printing character is present in the variable you're dropping in.

So you would type "s^V^AKEYWORD^V^A$VAR^V^Ag" and what you would get (in vi or your shell) would look like:

sed "s^AKEYWORD^A$VAR^Ag" somefile

(You can't copy and paste this from this answer. You have to actually type it as described.)

This will work as long as $VAR doesn't contain the non-printing character ^A—which is exceedingly unlikely.
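
In a script, the control character can also be generated rather than typed, which avoids the copy-and-paste problem. A sketch, assuming a sed that accepts a control character as the delimiter; the variable name delim is arbitrary. Note that & and \ in $VAR are still interpreted by sed.

```shell
# Store a literal ^A (octal 001) so it never has to be typed.
delim=$(printf '\001')

VAR='hi/there'
# The slash in $VAR is harmless because / is no longer the delimiter.
printf '%s\n' 'say KEYWORD now' | sed "s${delim}KEYWORD${delim}${VAR}${delim}g"
```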

Of course, if you're passing user input into the value of $VAR, then all bets are off and you'd better sanitize your input thoroughly rather than relying on control characters being hard to type for the average user.

There is actually more to beware of than the delimiter string, though. For instance, &, when present in a replacement string, means "the entire text that was matched." E.g., s/stu../my&/ would replace "stuff" with "mystuff", "stung" with "mystung", etc. So if you might have any character in the variable that you're dropping in as a replacement string, but you want to use the literal value of the variable only, then you have some data sanitizing to do before you can use the variable as a replacement string in sed. (The data sanitizing can be done with sed also, though.)

edited Jun 03 '22 at 17:53

answered Jan 17 '16 at 07:29

Wildcard

36,499

That's kind of my point - replacing a string with another string is a very simple operation. Does it really need to be as complicated as figuring out which characters sed won't like, and using sed to sanitize its own input? That sounds ridiculously and unnecessarily convoluted. I'm not a professional programmer, but I'm pretty sure I can code a small function that replaces a keyword with a string in pretty much any language I've ever come across, including bash - I was just hoping for a simple Linux solution using existing tools - I can't believe there isn't one out there. – Tal Jan 17 '16 at 08:28
@Tal, if your replacement string is "100s of pages long" as you mention in another comment...you can hardly call it a "simple" use case. The answer here is Perl, by the way—I just haven't learned Perl. The complexity here comes from the fact that you want to allow ANY arbitrary input as a replacement string in a regex. – Wildcard Jan 17 '16 at 08:57
There are numerous other solutions you could use, many of them very simple. For instance, if your replacement string is actually line based and doesn't need to be inserted in the middle of a line, use sed's insert command. But sed is not a good tool for processing vast amounts of text in complex ways. I'll post another answer showing how to do this with awk. – Wildcard Jan 17 '16 at 08:59
I've got sed: -e expression #1, char 14: unknown option to `s' – t7e Jun 02 '22 at 12:35
@t7e you're not supposed to press shift-6. The caret (^) is a symbol for pressing the control key. – Wildcard Jun 02 '22 at 16:42
@Wildcard I just copied your example and ran it on Ubuntu bash and got this error. I don't quite understand how it should work. – t7e Jun 03 '22 at 11:20
@t7e you can't copy and paste a non-printable character. – Wildcard Jun 03 '22 at 17:51

user3566929 · Answer 4 · 2016-01-17T01:11:19.640

1

You could use a , or a | instead, and sed will treat it as the separator. Technically, you can use any other single character.

from the man page

\cregexpc
       Match lines matching the regular expression regexp. The c may be any character.

As you can see, when a custom delimiter is used for an address, it must be introduced with a \; after that, the character acts as the separator. The s command needs no such backslash.

from the documentation http://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command :

The / characters may be uniformly replaced by any other single character 
within any given s command.

The / character (or whatever other character is used in its stead) can appear in 
the regexp or replacement only if it is preceded by a \ character.

Example:

sed -e '\|somevar|s|foo|bar|'
echo "Hello all" | sed "s_all_user_"
echo "Hello all" | sed "s,all,user,"

echo "Hello/ World" | sed "s,Hello/,Neo,"

edited Jan 17 '16 at 01:11

answered Jan 17 '16 at 00:34

user3566929

381

You are talking about allow the use of a single, specific character in the replacement string - in this case, "/". I'm talking about preventing it from trying to interpret the replacement string altogether. No matter what character you use ("/", ",", "|", etc) you always risk having that character pop up in the replacement string. Also, the initial character is not the only special character that sed cares about, is it? – Tal Jan 17 '16 at 01:05
@Tal no it can take anything instead of / and it will ignore the / happily as i just pointed out .. in fact , you can even look for it and replace it in a string >>> i have edited with an example >>>these stuff are not that safe and you always will find a smarter dude – user3566929 Jan 17 '16 at 01:10
@Tal why do you want to prevent it from interpreting? i mean that is the use of sed in the first place, what is your project? – user3566929 Jan 17 '16 at 01:20
All I need is to replace a keyword with a string. sed seems to be the most common way, by far, to do this in linux. The string can be 100 pages long. I don't want to try to sanitize the string so that sed doesn't freak out when reading it - I want it to be able to handle any characters in the string, and by "handle", I mean not try to find magical meaning within. – Tal Jan 17 '16 at 04:41
@Tal bash has tr which is faster ,very useful and less strain full when it comes to computing because it is a built-in ,but of course less powerful and you could also check this http://stackoverflow.com/questions/918886/how-do-i-split-a-string-on-a-delimiter-in-bash for IFS manipulation<<<<>>>>> but generally what you want to do is very hard and error-prone so you have much trial-and-error to do – user3566929 Jan 17 '16 at 04:52
AFAIK, tr can replace a single character with a different character - not a keyword with any length string. I also don't see how IFS would help. Is it really that hard to replace a string with another string in linux? The entire bash scripting language is based on string manipulation, and this is one of the most basic operations I can think of. – Tal Jan 17 '16 at 08:24
@Tal, bash is NOT for string manipulation. At all, at all, at all. It is for file manipulation and command coordination. It happens to have some built in handy functionality for strings, but really limited and not very fast at all if that's the main thing you're doing. See "Why is using a shell loop to process text considered bad practice?" Some tools that are designed for text processing are, in order from most basic to most powerful: sed, awk and Perl. – Wildcard Jan 17 '16 at 09:19
@Wildcard he said he doesn't want to revise and escape all the string cause it is a 100 pages long and correct me if i am wrong @Tal .. of course bash is not(it can manipulate a bit with %,:), but i thought he was gonna open a shell anyhow which most probably will be bash so i thought he should trim with bash and then use sed or awk just to do the complex stuff and i think thats easier and faster – user3566929 Jan 17 '16 at 09:32

PM 2Ring · Answer 5 · 2016-01-17T10:08:05.537

1

You can backslash-escape the forward slashes in your replacement string, using Bash's pattern substitution parameter expansion. It's a little messy because the forward slashes also need to be escaped for Bash.

$ var='a/b/c';var="${var//\//\\/}";echo 'this is a test' | sed "s/i/$var/g"

output

tha/b/cs a/b/cs a test

You could put the parameter expansion directly into your sed command:

$ var='a/b/c';echo 'this is a test' | sed "s/i/${var//\//\\/}/g"

but I think the first form is a little more readable. And of course if you're going to re-use the same replacement pattern in multiple sed commands it makes sense to just do the conversion once.

Another option would be to use a script written in awk, perl or Python, or a C program, to do your substitutions instead of using sed.

Here's a simple example in Python that works if the keyword to be replaced is a complete line in the input file (not counting the newline). As you can see, it's essentially the same algorithm as your Bash example, but it reads the input file more efficiently.

import sys

#Get the keyword and replacement texts from the command line
keyword, replacement = sys.argv[1:]
for line in sys.stdin:
    #Strip any trailing whitespace
    line = line.rstrip()
    if line == keyword:
        line = replacement
    print(line)
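
When the keyword can sit in the middle of a line, a similar literal replacement can be sketched with awk's index() and substr(), which never treat either string as a pattern. The function name replace_literal is made up for illustration, and the values are passed through the environment because awk -v would process backslash escapes in them:

```shell
# Hypothetical helper: replace every literal occurrence of $1 with $2 on stdin.
replace_literal() {
  KEYWORD=$1 REPLACEMENT=$2 awk '
    BEGIN { k = ENVIRON["KEYWORD"]; r = ENVIRON["REPLACEMENT"]; n = length(k) }
    n == 0 { print; next }            # empty keyword: pass lines through
    {
      rest = $0; out = ""
      # Copy the text before each match, then the replacement, and continue
      # searching in what follows the match.
      while ((i = index(rest, k)) > 0) {
        out = out substr(rest, 1, i - 1) r
        rest = substr(rest, i + n)
      }
      print out rest
    }'
}

printf '%s\n' 'a KEY b KEY' | replace_literal KEY 'x/&\y'
```

Because the search resumes after each match, a replacement that itself contains the keyword is not replaced again.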

edited Jan 17 '16 at 10:08

answered Jan 17 '16 at 07:02

PM 2Ring

6,633

This is just another way to sanitize the input, and not a great one at that, as it only handles one specific character ('/'). As Wildcard pointed out, there is more to beware of than just the delimiter string. – Tal Jan 17 '16 at 08:31
Fair call. Eg, if the replacement text contains any backslash-escaped sequences they will be interpreted, which may not be desirable. One way around that would be to convert the problematic chars (or the whole thing) to \x-style escape sequences. Or to use a program that can handle arbitrary input, as I mentioned in my last paragraph. – PM 2Ring Jan 17 '16 at 08:48
@Tal: I'll add a simple Python example to my answer. – PM 2Ring Jan 17 '16 at 10:03
The python script works great, and seems to do exactly what my function does, only far more efficiently. Unfortunately, if the main script is bash (as is in my case), this requires the use of a secondary external python script. – Tal Jan 17 '16 at 18:42

score 1 · Answer 6 · answered Jan 17 '16 at 09:16

If it's line-based and only one line to replace, I recommend prepending the file itself with the replacement line using printf, storing that first line in sed's hold space, and dropping it in as needed. This way you don't have to worry about special characters at all. (The only assumption here is that $VAR contains a single line of text without any newlines, which is what you said in the comments already.) Other than newlines, VAR could contain anything whatsoever and this would work regardless.

VAR=whatever
{ printf '%s\n' "$VAR";cat somefile; } | sed '1{h;d;};/KEYWORD/g'

printf '%s\n' will print the contents of $VAR as a literal string, regardless of its contents, followed by a newline. (echo will do other things in some cases, for example if the contents of $VAR begins with a hyphen—it will be interpreted as an option flag being passed to echo.)

The braces are used to prepend the output of printf to the contents of somefile as it's passed to sed. Whitespace separating the curly braces by themselves is important here, as is the semicolon before the closing curly brace.

1{h;d;}; as a sed command will store the first line of text in sed's hold space, then delete the line (rather than printing it).

/KEYWORD/ applies the following actions to all lines that contain KEYWORD. The action is get, which gets the contents of the hold space and drops it in place of the pattern space—in other words, the entire current line. (This isn't for replacing only part of a line.) The hold space isn't emptied out, by the way, just copied into the pattern space, replacing whatever is there.

If you want to anchor your regex so it won't match a line which merely contains KEYWORD but only a line where there is nothing else on the line but KEYWORD, add a beginning of line anchor (^) and end of line anchor ($) to your regex:

VAR=whatever
{ printf '%s\n' "$VAR";cat somefile; } | sed '1{h;d;};/^KEYWORD$/g'
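
For a replacement that spans many lines, a related whole-line approach is to keep the text in a file and let sed's r command read it in. sed copies the file to the output without interpreting its contents. A sketch only: the function name replace_keyword_line is made up, and the file path is assumed to contain no newline.

```shell
# Hypothetical helper: replace each line that is exactly KEYWORD with the
# contents of the file named by $1; all other lines pass through unchanged.
replace_keyword_line() {
  # r queues the file for output; d then deletes the KEYWORD line itself.
  # The file name runs to the end of the first -e chunk, so d gets its own.
  sed -e '/^KEYWORD$/{r '"$1" -e 'd;}'
}

repl_file=$(mktemp)
printf '%s\n' 'a/b & \c' 'second line' > "$repl_file"
printf '%s\n' 'before' 'KEYWORD' 'after' | replace_keyword_line "$repl_file"
rm -f "$repl_file"
```

Since the replacement never appears on a command line or in the environment, its length is not limited by the argument list size.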

Seems great if your VAR is one line long. I actually mentioned in the comments that VAR "can be 100 pages long" rather than one line. Sorry for the confusion. — Tal, Jan 19 '16 at 05:09

score 0 · Answer 7 · answered Sep 14 '23 at 13:06

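If you need to do this in a script, you can use an escape function, for example:

#!/bin/bash
# Escape the characters that are special in a sed replacement: \, & and the / delimiter.
escvar () {
  sed -e 's/[&\\/]/\\&/g' <<< "$1"
}
replacement='https://google.com/?query=some\delimited|(query)&count=1'
sed -e "s/<placeholder>/$(escvar "$replacement")/" <<< 'value=<placeholder>'

This escapes the characters that are special in a single-line replacement (\, & and the / delimiter). A value containing newlines would also need each newline escaped.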

Tal · Answer 8 · 2016-01-19T05:02:08.333

-1

This is the way I went:

#Replaces a keyword with a long string
#
#This is normally done with sed, but sed
#tries to interpret the string you are
#replacing the keyword with too hard
#
#stdin - contents to look through
#Arg 1 - keyword to replace
#Arg 2 - what to replace keyword with
replace() {
        KEYWORD="$1"
        REPLACEMENT_STRING="$2"

        while IFS= read -r LINE
        do
                if [[ "$LINE" == "$KEYWORD" ]]
                then
                        printf "%s\n" "$REPLACEMENT_STRING"
                else
                        printf "%s\n" "$LINE"
                fi
        done < /dev/stdin
}

This works great in my case because my keyword is on a line by itself. If the keyword shared a line with other text, this would not work.

I would still really like to know if there's an easy way to do this that doesn't involve coding my own solution.

edited Jan 19 '16 at 05:02

answered Jan 17 '16 at 08:38

Tal

2,112

If you're really worried about special characters and robustness, you shouldn't be using echo at all. Use printf instead. And doing text processing in a shell loop is a bad idea. – Wildcard Jan 17 '16 at 09:05
It would have been helpful if you mentioned in the question that the keyword will always be a complete line. FWIW, bash's read is rather slow. It's meant for processing interactive user input, not text file processing. It's slow because it reads stdin char by char, making a system call for each char. – PM 2Ring Jan 17 '16 at 09:43
@PM 2Ring My question didn't mention that the keyword is on a line of its own because I don't want an answer that just works in such a limited number of cases - I wanted something that could easily work no matter where the keyword was. I also never said my code is efficient - if it was, I wouldn't be looking for an alternative... – Tal Jan 17 '16 at 16:30
@Wildcard Unless I'm missing something, printf absolutely interprets special characters, and far more so than the default 'echo' does. printf "hi\n" will make printf print a newline while echo "hi\n" prints it as is. – Tal Jan 17 '16 at 17:01
@Tal, the "f" in printf stands for "format"—the first argument to printf is a format specifier. If that specifier is %s\n, meaning "string followed by newline", nothing in the next argument will be interpreted or translated by printf at all. (The shell can still interpret it, of course; best stick it all in single quotes if it's a literal string, or double quotes if you want variable expansion.) See my answer using printf for more details. – Wildcard Jan 18 '16 at 07:01
