4

This should be straightforward, but I cannot figure it out. If I want to replace an A or a B with a C using sed, the code could be:

$ echo AAXXAAYYBB | sed 's/[AB]/C/g'
CCXXCCYYCC

This results in all A's and B's converted to C's.

What I'd like to do is replace each "A" with one of two (or potentially more) characters, chosen at random:

Input:

AAXXAAYYBB

Code:

sed 's/A/[BC]/g'

Output (where the substitution of B or C is random):

BCXXCBYYBB 

But this code only changes each A to the literal text [BC]:

$ echo AAXXAAYYBB | sed 's/A/[BC]/g'
[BC][BC]XX[BC][BC]YYBB

I'm trying to avoid PERL here if possible. Does anyone have an idea how to fix this?

  • The replacement text in the s command in sed is text, not an expression. – Kusalananda Jun 20 '19 at 21:41
  • 4
    It would be fairly easy to make the replacements alternate between B and C (using conditional branching) – steeldriver Jun 20 '19 at 22:02
  • Hello. Thank you for reviewing my question. I am going to try to see if I can make this work without using perl based on your suggestions. I am trying to write a .sh that I hope anyone would be able to modify. If I have to add perl, it might be a little complicated for potential users. – Anthony D Jun 20 '19 at 23:48
  • 2
    Since the replacement string has to be randomly chosen from B and C, you can use just s/A/C/g; you're just randomly choosing C every time. Seriously, if you want to simulate some mutation or such, you should try a bit harder to define your problem and requirements. –  Jun 21 '19 at 09:54
  • Any particular reason you are trying to avoid using perl (not PERL) for this? – spazm Jul 01 '21 at 00:02

5 Answers

7

It is possible to replace the first match in a string with:

${str/A/...}

And it is possible to pick a random value (not a cryptographically secure one) with:

r=(B C)
${r[RANDOM%2]}      

A new element is selected each time the expression is expanded.

An exactly equivalent operation, and one that is cheaper to compute, is a bitwise AND that extracts the lowest bit of the value: ${r[RANDOM&1]}

So:

#!/bin/bash

str=AAXXAAYYBB
r=(B C)

while [ "${str%"${str#*A}"}" ]; do      # while there is an A to change
    str=${str/A/"${r[RANDOM&1]}"}
done

echo "str=$str"

This produces a random result each time it is run.
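The question asks about "two (or potentially more)" replacements, and RANDOM&1 only covers exactly two. With an array of any length, the index can be taken modulo the array size instead. The following is a minimal sketch, assuming bash; the function name random_replace and its argument order are illustrative and not part of the original answer. RANDOM ranges over 0..32767, so the modulo introduces a very slight bias when the number of choices is not a power of two.

```shell
#!/bin/bash
# Hypothetical helper: replace every occurrence of a literal TARGET
# with a randomly chosen element of the remaining arguments.
# Usage: random_replace STRING TARGET CHOICE...
random_replace() {
    local str=$1 target=$2
    shift 2
    # Refuse an empty target or an empty choice list (both would break the loop).
    [[ -n $target && $# -gt 0 ]] || return 1
    local choices=("$@") out=
    # Consume the string left to right, so a choice that happens to
    # contain the target text is never substituted a second time.
    while [[ $str == *"$target"* ]]; do
        out+=${str%%"$target"*}${choices[RANDOM % ${#choices[@]}]}
        str=${str#*"$target"}
    done
    printf '%s\n' "$out$str"
}

random_replace AAXXAAYYBB A B C D
```

Building the result in a separate variable, rather than rewriting str in place, keeps the loop finite even if one of the choices contains an A.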

In POSIX sh:

#!/bin/sh

str=AAXXAAYYZZAAA

while [ "${str%"${str#*A}"}" ]; do             # while there is an A.
    r=$(od -An -tu1 -N 1 /dev/urandom)         # get one random byte
    r=$((r&1))                                 # Is it even or odd?
    if [ "$r" -eq 0 ]; then s=B; else s=C;fi   # Select B or C 
    str="${str%%A*}${s}${str#*A}"              # Change the string.
done

echo "str=$str"

Alternatively, the random byte could be read with a shorter (but more cryptic) command that relies on printf, which is a faster builtin in most shells:

r=$(printf '%d\n' "'$(head -c1 /dev/urandom)")
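Reading a random byte also extends to more than two replacements by taking the value modulo the number of choices. The sketch below is written for POSIX sh and uses the od form, since a byte captured through command substitution may not survive intact (trailing newlines are stripped, and NUL bytes cannot be stored in a variable). The function name pick and the choice string BCD are illustrative assumptions, and 256 is not a multiple of 3, so the selection is very slightly biased.

```shell
#!/bin/sh
# pick CHOICES: print one randomly selected character of CHOICES.
# Assumes CHOICES is a non-empty string of single-byte characters.
pick() {
    n=${#1}
    # One random byte as a decimal number 0..255, digits only.
    byte=$(od -An -tu1 -N1 /dev/urandom | tr -dc '0-9')
    idx=$(( byte % n + 1 ))          # 1-based position within CHOICES
    printf '%s\n' "$1" | cut -c "$idx"
}

str=AAXXAAYYZZAAA
out=
while [ "${str#*A}" != "$str" ]; do  # while there is an A left
    out=$out${str%%A*}$(pick BCD)    # text before the A, then a random choice
    str=${str#*A}                    # continue with the text after that A
done
str=$out$str

echo "str=$str"
```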
5

Not Sed, but avoids Perl:

$ echo AAXXAAYYBB | gawk '
    BEGIN{srand()} 
    {
      n = patsplit($0,a,/A/,s); 
      printf("%s", s[0]);    # text before the first match
      for(i=1;i<=n;i++) printf("%s%s", rand() < 0.5 ? "B" : "C", s[i]); 
      print ""
    }
  '
CBXXCCYYBB
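patsplit() is a gawk extension. Where only a POSIX awk is available, a similar effect can be sketched with repeated sub() calls, each of which replaces the leftmost remaining match. This variant is not part of the original answer, and it assumes the replacements never contain the pattern itself, otherwise the loop would not terminate. The result variable is only there to make the output easy to reuse.

```shell
result=$(echo AAXXAAYYBB | awk '
    BEGIN { srand() }   # seeded from the clock, so runs in the same second repeat
    {
      n = 0
      # sub() returns 1 each time it replaces the leftmost remaining A
      while (sub(/A/, (rand() < 0.5 ? "B" : "C"))) n++
      print
    }
  ')
echo "$result"
```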
steeldriver
  • 81,074
2

There are plenty of ways to skin this particular cat once we resort to scripting, but here is something I threw together. It may not be pretty (and it relies on a bash shell!), but it might help you out:

#!/bin/bash

TEXT="AAXXAAYYBB"

echo "Start: $TEXT"

# So long as there are un-converted 'A' in the input string...
while [[ "$TEXT" =~ A ]]
do
        # .. convert one 'A' to a random choice of either 'B' or 'C'
        TEXT=$(echo "$TEXT" | sed -e "s/A/$( ((RANDOM%2>0)) && echo B || echo C)/")

        # lets show how we are progressing...
        echo "Progress: $TEXT"
done

# No more 'A' in the input string, we are done:
echo "End: $TEXT"

Example output:

First run:

Start: AAXXAAYYBB
Progress: BAXXAAYYBB
Progress: BBXXAAYYBB
Progress: BBXXBAYYBB
Progress: BBXXBCYYBB
End: BBXXBCYYBB

Second run:

Start: AAXXAAYYBB
Progress: CAXXAAYYBB
Progress: CBXXAAYYBB
Progress: CBXXCAYYBB
Progress: CBXXCBYYBB
End: CBXXCBYYBB
muru
  • 72,889
bunnymjh
  • 121
  • 1
    Nice idea (+1)! You should quote your variables though (echo "$TEXT") to preserve whitespace. Also, even better is printf to deal with escape characters. Even better is a herestring (<<< "$TEXT") to not open a new process. – Sparhawk Jun 20 '19 at 22:48
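Following the suggestions in the comment above, the same loop could be sketched with quoted expansions, printf, and a here-string (a bash feature). The variable name new is illustrative. Note that this still starts one sed process per replaced character.

```shell
#!/bin/bash
TEXT="AAXXAAYYBB"

# So long as there is an unconverted A in the string...
while [[ $TEXT == *A* ]]; do
    # ...pick B or C, then let sed replace the first remaining A.
    if (( RANDOM % 2 )); then new=B; else new=C; fi
    TEXT=$(sed -e "s/A/$new/" <<< "$TEXT")
done

printf 'End: %s\n' "$TEXT"
```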
0

Replacing A or B with C is much more "straightforward" than replacing A with B or C. In the former case the "or" has nothing to do with randomness, whereas in the latter it depends on it. The former can be broken down into two simple steps:

  1. replace A with C
  2. replace B with C

In the latter case, you have to decide which replacement to use at every occurrence: B or C? On what basis? How random should the randomness be?

sed doesn't offer any random operations as far as I know. (While Perl should be a good tool.)
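sed alone can at least alternate between the two replacements deterministically, as one of the comments on the question suggests. A minimal sketch (not random, restarting with B on every input line, and assuming the replacements contain no A):

```shell
# Replace the first remaining A with B, then the next one with C,
# and branch back to label "a" for as long as a substitution succeeded.
result=$(echo AAXXAAYYBB | sed -e ':a' -e 's/A/B/' -e 's/A/C/' -e 'ta')
echo "$result"    # BCXXBCYYBB
```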

0

perl loves regex

If you choose to use perl, you'll find the e flag useful for substitutions. This evaluates the replacement as code.

e.g.: s/A/c("BC")/eg where c is a subroutine that returns a random character from a string.

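Hard-coded for A -> [BC]:

# Return one randomly chosen character of the given string.
sub c {
  if(my $s = shift) {
    my $index = int(rand(length($s)));
    return substr($s, $index, 1);
  }
}

# Print each input line with every A replaced by a random B or C.
while(<<>>){ s/A/c("BC")/eg; print }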

Or compressed into a not terribly pretty one-liner (actually a two-liner, for clarity):

perl -E 'sub c {if(my $l = shift) {substr($l, int(rand(length($l))), 1);}}' \
     -plE 's/A/c("BCBBB")/eg'

Expanded into a random_replace:

#!/usr/bin/env perl
use strict;
use warnings;

die "usage: random_replace regex string_of_replacement_chars\n" unless @ARGV == 2;
my $search  = shift;
my $replace = shift;

sub c {
  if (my $s = shift) {
    my $index = int(rand(length($s)));
    substr($s, $index, 1);
  }
}

while(<<>>) { s/$search/c($replace)/eg; print; }

% echo "AAAAAA" | ./random_replace A BC
CCCCBC

% echo "AAAAAA" | ./random_replace A BC
BBBBBC

As an extra bonus, the search can be a regex. Let's say you want to replace A or B with C or D:

% echo "AAABBB" | ./random_replace '[A-B]' CD
CCCDCD

% echo "AAABBB" | ./random_replace '[A-B]' CD
DDCCDD

spazm
  • 111