Replacing a character at a random position using sed?

Question

I'm trying to replace a character in a file at a random position. My file looks something like:

aab  
babab  
abab

I'm trying to replace a character at a random position with 'c'. So the output might look like:

aab  
bcbab  
abab

I have tried removing all line breaks and saving in a file new_string.txt and then using sed but it isn't working.

This is the code I have tried:

rand1="$(shuf -i 0-$tot_len -n 1)"
sed "s/^\(.\{"${rand1}"\}\)./\1G/" new_string.txt

I keep getting the error:

sed: -e expression #1, char 25: Invalid content of \{\}

A random character? Or a random character other than newline? — Stéphane Chazelas, Aug 11 '15 at 11:59
@StéphaneChazelas he meant a character at a random position in the string. — FelixJN, Aug 11 '15 at 12:03
A random character or a character at random position or a random character at random position? — jimmij, Aug 11 '15 at 12:06
Sorry for not being clear. A specified character at a random position — user3891532, Aug 11 '15 at 12:07
Your problem may very well be in your $rand1 or $tot_len variable. Run that with set -x and quote your variables. The sed error mentions char 25 but that expression has fewer than 25 characters. — Stéphane Chazelas, Aug 11 '15 at 12:29
@StéphaneChazelas if you read @Fiximan answer you'll see that my value of $rand1 is too large for sed. — user3891532, Aug 11 '15 at 12:33

score 1 · Accepted Answer · edited Aug 11 '15 at 13:36

You don't need the curly brackets around the variable name, and the expansion should stay inside the double quotes. Use:

sed "s/^\(.\{$rand1\}\)./\1G/" new_string.txt

UPDATE: as stated below in comments:

The original code is fine; the problem is that the value of $rand1 is too large for sed. I found that the largest repeat count GNU sed accepts is 32767, i.e. the largest signed 16-bit integer.

You can obtain that limit for the system's regular expression library (though GNU sed generally uses a builtin version) with:

$ getconf RE_DUP_MAX
32767

POSIX requires that limit to be at least _POSIX_RE_DUP_MAX (255), and that is the most you can rely on portably (some systems, such as Solaris or OS X, set it that low).
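
A script can compare the offset against that limit before building the interval expression. The following is only a minimal sketch for a POSIX shell: fits_in_interval is a hypothetical helper name, and falling back to 255 when getconf gives no usable number is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 can be used as a repeat count in \{n\}.
# The value reported by getconf is only a guide: as noted above, GNU sed
# may use its own built-in regex code rather than the system library.
fits_in_interval() {
    max=$(getconf RE_DUP_MAX 2>/dev/null)
    case $max in
        ''|*[!0-9]*) max=255 ;;   # assumption: fall back to the POSIX minimum
    esac
    [ "$1" -le "$max" ]
}

rand1=5255810   # the offset value reported in the comments on this answer
if fits_in_interval "$rand1"; then
    echo "offset $rand1 fits in a single interval expression"
else
    echo "offset $rand1 exceeds the interval limit" >&2
fi
```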

answered Aug 11 '15 at 12:00 by FelixJN · edited Aug 11 '15 at 13:36 by Stéphane Chazelas

I've tried that too. I still get the same error. – user3891532 Aug 11 '15 at 12:03
what is echo $rand1 giving you? because it worked in my test. – FelixJN Aug 11 '15 at 12:04
echo $rand1 works fine – user3891532 Aug 11 '15 at 12:05
what is the output of echo $rand1 | cat -A ? – FelixJN Aug 11 '15 at 12:09
5255810$ . 5255810 is the value of $rand1 and my string is larger than that – user3891532 Aug 11 '15 at 12:11
seems to be a limitation of sed: rand1=10000 works fine at my machine, rand1=100000 doesn't (with same as your error). Maybe get a workaround by a) selecting a random line, b) selecting a random position in that line – FelixJN Aug 11 '15 at 12:18
btw: it breaks down after: rand1=32767, or in other words: the highest 16bit integer – FelixJN Aug 11 '15 at 12:25
Sadly I need them to be picked with uniform probability and since all my lines are different lengths I can't do that. – user3891532 Aug 11 '15 at 12:27
Is the total number of characters random? You could fold your single line into equal sized lines and then do two random picks (line and character). Would be uniform if all lines are of same length after folding. – FelixJN Aug 11 '15 at 12:31
Thanks for your help. I will use fold, but since the last line may not be the same length, I will instead use $rand1 modulo linelength to get the position in the line and the floor of $rand1 / linelength to get the line number. Then I'll use sed as above. – user3891532 Aug 11 '15 at 12:47
Note that it depends on the sed implementation. On Solaris 10 x86_64, for both /usr/xpg4/bin/sed and /bin/sed, and on Apple OS/X the limit is 255 (the minimum required by POSIX (_POSIX_RE_DUP_MAX)). – Stéphane Chazelas Aug 11 '15 at 13:20
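
The fold-based workaround discussed in these comments could be sketched roughly as follows. This is not the asker's actual script: replace_at is a hypothetical helper, the default row width of 200 is an arbitrary choice below the portable limit of 255, and the sketch assumes single-byte, single-column characters such as the a/b data in the question.

```shell
#!/bin/sh
# Hypothetical helper: read one long line on stdin and replace the character
# at 0-based offset $1 with $2. The line is folded into rows of $3 characters
# (default 200) so the repeat count given to sed never exceeds the row width.
replace_at() {
    offset=$1 repl=$2 width=${3:-200}
    row=$(( offset / width + 1 ))   # 1-based row number after folding
    col=$(( offset % width ))       # 0-based position inside that row
    fold -w "$width" |
        sed "${row}s/^\(.\{${col}\}\)./\1${repl}/" |
        tr -d '\n'                  # join the rows back into one line
    echo                            # restore the final newline
}

# Usage with the names from the question:
# replace_at "$rand1" G < new_string.txt
```

Because every offset from 0 to length-1 maps to exactly one character, drawing the offset uniformly from that range keeps the choice uniform even though the last row may be shorter.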

Stéphane Chazelas · Answer 2 · 2015-08-11T12:54:12.003

On a GNU system, to substitute one character (other than newline) at random, you could do:

file=myfile.txt
offset=$(grep -bo . < "$file" | cut -d: -f1 | shuf -n1)
[ -z "$offset" ] || # file doesn't have non-newline characters
  printf c | dd bs=1 seek="$offset" of="$file" conv=notrunc status=none

(with old versions of GNU dd (prior to 8.20), replace status=none with 2> /dev/null).

grep -bo . < "$file" would give you the offset in number of bytes in the file of each non-newline character. For instance, with a file encoded in UTF-8 that contains:

$3
£1
€2

That gives us:

$ grep -bo . < "$file"
0:$
1:3
3:£
5:1
7:€
10:2

With cut -d: -f1, we retain the part before the first colon. Then, we pick one of those offsets at random with shuf -n1.
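
To see each stage in isolation, here is a small sketch run against a scratch copy of the sample file from the question. The temporary file and the fixed offset of 5 are assumptions made for reproducibility, and dd's diagnostics are sent to /dev/null so it also works with older versions.

```shell
#!/bin/sh
# Scratch copy of the sample file from the question (assumption: ASCII data).
tmp=$(mktemp)
printf 'aab\nbabab\nabab\n' > "$tmp"

# Byte offsets of every non-newline character; 3 and 9 (the newlines) are absent.
offsets=$(grep -bo . < "$tmp" | cut -d: -f1)
count=$(printf '%s\n' "$offsets" | wc -l)

# A real run would pick one with: offset=$(printf '%s\n' "$offsets" | shuf -n1)
offset=5    # fixed here so the result is reproducible

# Overwrite that single byte in place; notrunc keeps the rest of the file.
printf c | dd bs=1 seek="$offset" of="$tmp" conv=notrunc 2>/dev/null
result=$(cat "$tmp")
rm -f "$tmp"
```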

That assumes the replacement character has the same size as the replaced one. For instance, replacing that £ above (2 bytes) with c (1 byte) would leave the file with c followed by an invalid character.

To work around that, we can't overwrite the file in-place anymore as we'd need to shift data around.

We'd need something like:

perl -C -0777 -pi -e "substr \$_, $offset, 1, 'c'" -- "$file"

instead. With -C, perl honours the locale for what constitutes a character. -0777 -p turns on the slurp mode where the content of $file is slurped into $_ (see Security implications of running perl -ne '…' * though for security considerations with that construct). -pi gives you in-place editing, $_ is written back to the file after the code is run. Then we call substr to substitute the 1 character at the given offset with c.

I've just blindly copied your code as I don't quite understand what you've done but I have the error message: dd: invalid status flag: ‘none’ — user3891532, Aug 11 '15 at 12:25
@user3891532, see edit, you must have an old version of GNU dd. — Stéphane Chazelas, Aug 11 '15 at 12:36

score 0 · Answer 3 · edited Jan 26 '19 at 15:41

Try this:

sed 's/^\(.\{'"${rand1}"'\}\)./\1G/'  new_string.txt

answered Aug 11 '15 at 12:05 by Shravan Yadav · edited Jan 26 '19 at 15:41 by Rui F Ribeiro

That doesn't work either – user3891532 Aug 11 '15 at 12:06
Are you still getting the same error? – Shravan Yadav Aug 11 '15 at 12:07
It's still the same error. – user3891532 Aug 11 '15 at 12:08
What is the value of tot_len? Does it have a value or is it blank? – Shravan Yadav Aug 11 '15 at 12:33
Maybe rand1 is getting a value larger than the string length. – Shravan Yadav Aug 11 '15 at 12:48
If you read Fiximan's answer you'll see that my value of $rand1 was too large for sed to handle. – user3891532 Aug 11 '15 at 12:49

score 0 · Answer 4 · edited Aug 11 '15 at 13:25

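With a recent GNU sed (one that supports -z), you can do it without removing the newlines first:

sed -z "s/./@/$((RANDOM % $(wc -m < file.txt) + 1))" file.txt

The + 1 is needed because the occurrence number given to the s command must be at least 1.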

answered Aug 11 '15 at 13:08 by Costas · edited Aug 11 '15 at 13:25 by Stéphane Chazelas

Note that $RANDOM is ksh/zsh/bash specific and is limited to 0-32767. – Stéphane Chazelas Aug 11 '15 at 13:27
That assumes file.txt doesn't contain NUL characters (which should be a safe assumption for a text file, just as that it contains only valid characters in the current locale which is also a requirement (and of my answer as well)). – Stéphane Chazelas Aug 11 '15 at 13:28
Note that it may replace newline characters. – Stéphane Chazelas Aug 11 '15 at 13:29
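
One way to address the caveats raised in these comments is sketched below, assuming GNU sed and GNU shuf. replace_nth is a hypothetical helper, not part of the answer: it matches only non-newline characters and takes the occurrence number from shuf, so it does not depend on $RANDOM or its 0-32767 range.

```shell
#!/bin/sh
# Hypothetical helper: print file $1 with its $2-th non-newline character
# replaced by $3 (a single character other than /, & or \). Needs GNU sed
# for -z and for \n inside a bracket expression.
replace_nth() {
    sed -z "s/[^\n]/$3/$2" "$1"
}

file=file.txt
if [ -s "$file" ]; then
    # Count only the characters that are eligible for replacement.
    total=$(tr -d '\n' < "$file" | wc -m)
    [ "$total" -gt 0 ] && replace_nth "$file" "$(shuf -i 1-"$total" -n 1)" c
fi
```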
