
I'm trying to replace a character in a file at a random position. My file looks something like:

aab  
babab  
abab  

I'm trying to replace a random character with 'c', so the output might look like:

aab  
bcbab  
abab 

I have tried removing all line breaks, saving the result in a file new_string.txt, and then using sed, but it isn't working.

This is the code I have tried:

rand1="$(shuf -i 0-$tot_len -n 1)"
sed "s/^\(.\{"${rand1}"\}\)./\1G/" new_string.txt

I keep getting the error:

sed: -e expression #1, char 25: Invalid content of \{\}

4 Answers


There's no need for the curly braces around the variable name, but the variable should stay inside the double quotes. Use:

sed "s/^\(.\{$rand1\}\)./\1G/" new_string.txt

UPDATE: as worked out in the comments below:

The original code is actually fine; the problem is that the value of $rand1 is too large for sed. I found that the largest repeat count GNU sed accepts in \{n\} is 32767, i.e. the largest signed 16-bit integer.

You can obtain that limit for the system's regular expression library (though GNU sed generally uses a builtin version) with:

$ getconf RE_DUP_MAX
32767

POSIX requires that limit to be at least _POSIX_RE_DUP_MAX (255), and that's the maximum you can expect portably (some systems like Solaris or OS/X have it as low as that).
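The limit depends on the sed implementation. Also, getconf reports the system library's value, which is not necessarily the value sed itself uses. So the quickest way to see what a given sed accepts is to try a count directly. This is a minimal sketch. sed_accepts_count is a hypothetical helper name, and the check only tells you whether sed compiles the expression, not how fast it runs:

```shell
# Hypothetical helper: succeeds if this sed accepts \{N\} with N = $1.
sed_accepts_count() {
  # The one-character input never matches; only the compile step matters.
  echo a | sed "s/^\(.\{$1\}\)./\1c/" >/dev/null 2>&1
}

for n in 255 32767 32768; do
  if sed_accepts_count "$n"; then
    echo "$n: accepted"
  else
    echo "$n: rejected"
  fi
done
```

Going by the figures above, GNU sed should accept 32767 and reject 32768, while the Solaris and OS X seds would already reject anything above 255.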

FelixJN
  • I've tried that too. I still get the same error. – user3891532 Aug 11 '15 at 12:03
  • What is echo $rand1 giving you? It worked in my test. – FelixJN Aug 11 '15 at 12:04
  • echo $rand1 works fine – user3891532 Aug 11 '15 at 12:05
  • what is the output of echo $rand1 | cat -A ? – FelixJN Aug 11 '15 at 12:09
  • It prints 5255810$, so 5255810 is the value of $rand1 (and my string is longer than that). – user3891532 Aug 11 '15 at 12:11
  • seems to be a limitation of sed: rand1=10000 works fine on my machine, rand1=100000 doesn't (same error as yours). Maybe you can work around it by a) selecting a random line, b) selecting a random position in that line – FelixJN Aug 11 '15 at 12:18
  • btw: it breaks down above rand1=32767, in other words the highest signed 16-bit integer – FelixJN Aug 11 '15 at 12:25
  • Sadly I need them to be picked with uniform probability and since all my lines are different lengths I can't do that. – user3891532 Aug 11 '15 at 12:27
  • Is the total number of characters random? You could fold your single line into equal-sized lines and then do two random picks (line and character). That would be uniform if all lines have the same length after folding. – FelixJN Aug 11 '15 at 12:31
  • Thanks for your help. I will use fold, but since the last line may not be the same length, I will instead use $rand1 modulo linelength to get the position in the line and the floor of $rand1 / linelength to get the line number. Then I'll use sed as above. – user3891532 Aug 11 '15 at 12:47
  • Note that it depends on the sed implementation. On Solaris 10 x86_64, for both /usr/xpg4/bin/sed and /bin/sed, and on Apple OS/X the limit is 255 (the minimum required by POSIX (_POSIX_RE_DUP_MAX)). – Stéphane Chazelas Aug 11 '15 at 13:20
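The fold-based plan from the last few comments could look roughly like the sketch below. It makes several assumptions:

- new_string.txt holds a single line of single-byte (ASCII) characters, because fold -b counts bytes.
- The fold width is at most 32767, so the \{col\} count stays within sed's limit.
- The offset is 0-based.

replace_via_fold is a hypothetical name. Only the last folded line can be shorter, so a flat offset picked uniformly still maps to a uniformly chosen character.

```shell
# Sketch of the fold workaround from the comments (hypothetical helper).
# Assumes: single-line input, one byte per character, width <= 32767.
replace_via_fold() {
  file=$1 offset=$2 width=$3         # offset is 0-based
  line=$(( offset / width + 1 ))     # folded line number (1-based for sed)
  col=$(( offset % width ))          # characters to skip on that line
  fold -b -w "$width" "$file" |
    sed "${line}s/^\(.\{$col\}\)./\1c/" |
    tr -d '\n'                       # join the folded lines back up
  echo                               # add back a final newline
}
```

For example, replace_via_fold new_string.txt "$rand1" 10000 prints the modified string, with $rand1 drawn between 0 and the string length minus 1.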

On a GNU system, to substitute one character (other than newline) at random, you could do:

file=myfile.txt
offset=$(grep -bo . < "$file" | cut -d: -f1 | shuf -n1)
[ -z "$offset" ] || # file doesn't have non-newline characters
  printf c | dd bs=1 seek="$offset" of="$file" conv=notrunc status=none

(with old versions of GNU dd (prior to 8.20), replace status=none with 2> /dev/null).

grep -bo . < "$file" gives the byte offset within the file of each non-newline character. For instance, with a file encoded in UTF-8 that contains:

$3
£1
€2

That gives us:

$ grep -bo . < "$file"
0:$
1:3
3:£
5:1
7:€
10:2

With cut -d: -f1, we retain the part before the first colon. Then, we pick one of those offsets at random with shuf -n1.
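To see the dd step on its own, here's a small sketch on the question's sample file, with the random offset swapped for a fixed one picked by hand. replace_at_byte is a hypothetical helper name, and status=none assumes GNU dd 8.20 or later:

```shell
# Hypothetical helper: overwrite the byte at 0-based offset $2 of file $1
# with "c", in place, without truncating the rest of the file.
replace_at_byte() {
  printf c | dd bs=1 seek="$2" of="$1" conv=notrunc status=none
}

printf 'aab\nbabab\nabab\n' > sample.txt
grep -bo . < sample.txt | cut -d: -f1   # newline offsets 3, 9, 14 are absent
replace_at_byte sample.txt 5
cat sample.txt
```

Offset 5 is the a in babab, so the file ends up as aab, bcbab, abab, the output the question asks for.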

That assumes the replacement character has the same size as the replaced one. For instance, replacing that £ above (2 bytes) with c (1 byte) would leave the file with c followed by an invalid character.

To work around that, we can't overwrite the file in-place anymore as we'd need to shift data around.

We'd need something like:

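perl -C -0777 -pi -e 'my @p; push @p, $-[0] while /./g; substr($_, $p[rand @p], 1, "c") if @p' -- "$file"

instead. With -C, perl honours the locale for what constitutes a character, and substr then counts characters rather than bytes. That means the byte offsets from grep -b can't be reused here, so the random position is picked in perl itself: /./g matches each non-newline character, and $-[0] is the offset of the current match. -0777 turns on slurp mode, where the whole content of $file is slurped into $_ (see Security implications of running perl -ne '…' * though for security considerations with that construct). -pi gives you in-place editing: $_ is written back to the file after the code is run. Then we call substr to substitute the 1 character at the chosen offset with c.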


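Try this, which keeps the sed script in single quotes and only double-quotes the variable. It still runs into the 32767 limit discussed above if $rand1 is larger than that:

sed 's/^\(.\{'"${rand1}"'\}\)./\1G/' new_string.txt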
Rui F Ribeiro

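With a recent GNU sed, you can do it without removing the newlines first. -z makes sed treat the whole file as a single line, and the numeric flag of s/// selects which match to replace:

sed -z "s/[^\n]/c/$(($RANDOM % $(tr -d '\n' < file.txt | wc -m) + 1))" file.txt

A few details matter here:

- [^\n] is used instead of . because GNU sed's . also matches a newline. The character count excludes newlines for the same reason.
- The +1 is needed because the match number starts at 1.
- $RANDOM only ranges from 0 to 32767, so on larger files only the first 32768 characters can be picked.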
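To check the sed -z approach deterministically, here's a sketch that takes the match number as an argument instead of computing it from $RANDOM. It assumes GNU sed, both for -z and for \n being recognised inside a bracket expression. replace_nth is a hypothetical helper name:

```shell
# Hypothetical helper: replace the $2-th (1-based) non-newline character
# of file $1 with "c" and print the result. With -z the whole file (which
# contains no NUL bytes) is one pattern space, and the numeric flag of
# s/// picks which match gets replaced.
replace_nth() {
  sed -z "s/[^\n]/c/$2" "$1"
}

printf 'aab\nbabab\nabab\n' > sample2.txt
replace_nth sample2.txt 5
```

The 5th non-newline character is the a in babab, so this prints aab, bcbab, abab. The match number counts from 1 (GNU sed rejects 0), so a random pick has to fall between 1 and the number of non-newline characters.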
Costas