Replace strings in a file based on a list of strings and a list of corresponding replacements

Question

I am trying to replace strings in a file A:

Hello Peter, how is your dad? where is mom?

where the strings to be replaced are in file B:

Peter
dad
mom

and their corresponding replacements are in file C:

John
wife
grandpa

Expected outcome:

Hello John, how is your wife? where is grandpa?

Can I edit file A, replacing the value in file B by using the value from the corresponding line in file C?

What I have done so far:

 cat 1.txt | sed -e "s/$(sed 's:/:\\/:g' 2.txt)/$(sed 's:/:\\/:g' 3.txt)/" > 4.txt

It works if there is only one line in file B and file C; if there is more than one line, it doesn't work.

yes, i can replace peter to john if there is only one line in file2 and file 3, but when there is more than one line, it just wont work. any idea? — Robert Choy, Mar 20 '16 at 17:08
Of course it doesn't work, sed doesn't do multi-line replacements like that... In fact you want to use some sort of dict here which makes your question similar to this one although not as complicated. — don_crissti, Mar 20 '16 at 17:29
Why would you need to use a pair of files that match by implicit line number? Maintaining that with paired insertions is the most error-prone method I can think of. You don't even know if they are the same length. For this job, my first pass would just join them and ensure they were the same length and had no blank lines. — Paul_Pedant, Dec 12 '19 at 17:54

don_crissti · Answer 1 · 2016-03-21T12:02:06.517

The easiest way to do this with sed is to process those two lists and turn them into a script-file e.g.

s/line1-from-fileB/line1-from-fileC/g
s/line2-from-fileB/line2-from-fileC/g
....................................
s/lineN-from-fileB/lineN-from-fileC/g

that sed will then execute, editing fileA. The proper way is to process the LHS/RHS first and escape any special characters that may appear on those lines, then join the LHS and RHS adding the s, the delimiters / and the g (e.g. with paste) and pipe the result to sed:

paste -ds///g /dev/null /dev/null \
<(sed 's|[[\.*^$/]|\\&|g' fileB) <(sed 's|[\&/]|\\&|g' fileC) \
/dev/null /dev/null | sed -f - fileA

So there it is: one paste and three seds that will process each file only once, regardless of the number of lines.
This assumes that your shell supports process substitution and that your sed can read a script-file from stdin. Also, it doesn't edit in-place (I've left out the -i switch as it's not supported by all seds)
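
For a shell without process substitution, or a sed that cannot read its script from stdin, the same idea can be sketched with temporary files. The helper name build_sed_script and the temp-file names are assumptions made for illustration; the two escaping expressions are the ones used above.

```shell
#!/bin/sh
# Sketch only: build_sed_script is a hypothetical helper name.
# $1 = file with search strings, $2 = file with replacements.
build_sed_script() {
    tmp=$(mktemp -d) || return 1
    # Escape BRE metacharacters and the / delimiter in the search strings.
    sed 's|[[\.*^$/]|\\&|g' "$1" > "$tmp/lhs"
    # Escape &, backslash and the / delimiter in the replacements.
    sed 's|[\&/]|\\&|g' "$2" > "$tmp/rhs"
    # Join the columns with the delimiters s, /, /, /, g to get s/LHS/RHS/g.
    paste -d 's///g' /dev/null /dev/null "$tmp/lhs" "$tmp/rhs" /dev/null /dev/null
    rm -rf "$tmp"
}
```

It could then be used as build_sed_script fileB fileC > script.sed followed by sed -f script.sed fileA, which keeps the generated script around for inspection.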

score 2 · Answer 2 · answered Aug 31 '17 at 15:14

If you want the replacements to be done independently of each other, for instance for:

foo -> bar
bar -> foo

Applied on

foobar

To result in:

barfoo

as opposed to foofoo as a naive s/foo/bar/g; s/bar/foo/g translation would do, you could do:

perl -pe '
  BEGIN{
    open STRINGS, "<", shift@ARGV or die"STRINGS: $!";
    open REPLACEMENTS, "<", shift@ARGV or die "REPLACEMENTS: $!";
    while (defined($a=<STRINGS>) and defined($b=<REPLACEMENTS>)) {
      chomp ($a, $b);
      push @repl, $b;
      push @re, "$a(?{\$repl=\$repl[" . $i++. "]})"
    }
    eval q($re = qr{) . join("|", @re) . "}";
  }
  s/$re/$repl/g' strings.txt replacements.txt fileA

The lines of strings.txt are expected to be perl regexps. Since perl regexps can execute arbitrary code, it's important that they be sanitized. If you want to replace fixed strings only, you can change that to:

perl -pe '
  BEGIN{
    open PATTERNS, "<", shift@ARGV or die"PATTERNS: $!";
    open REPLACEMENTS, "<", shift@ARGV or die "REPLACEMENTS: $!";
    for ($i = 0; defined($a=<PATTERNS>) and defined($b=<REPLACEMENTS>); $i++) {
      chomp ($a, $b);
      push @string, $a;
      push @repl, $b;
      push @re, "\\Q\$string[$i]\\E(?{\$repl=\$repl[$i]})"
    }
    eval q($re = qr{) . join("|", @re) . "}";
  }
  s/$re/$repl/g' patterns.txt replacements.txt fileA

score 1 · Answer 3 · answered Mar 20 '16 at 17:54

In the simple example you show where each of the target words appears only once in the file, you could simply do:

$ paste fileB fileC | while read a b; do sed -i "s/$a/$b/" fileA; done
$ cat fileA
Hello John, how is your wife? where is grandpa?

The paste command will print the data from both files combined:

$ paste fileB fileC
Peter   John
dad wife
mom grandpa

We pass this through a simple while read loop which will iterate over every line, saving the value from fileB as $a and that of fileC as $b. Then, the sed command will replace the first occurrence of $a on each line with $b. This is repeated three times, once for each pair.
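
To see what the loop feeds to sed, the sed call can be swapped for a printf. This is only a dry-run sketch; show_expressions is a hypothetical helper name, and the -r flag on read is an addition so that backslashes are kept as-is.

```shell
# Dry run: print each sed expression instead of executing it.
# $1 = file with search strings, $2 = file with replacements.
show_expressions() {
    paste "$1" "$2" | while read -r a b; do
        printf 's/%s/%s/\n' "$a" "$b"
    done
}
```

Called as show_expressions fileB fileC, it prints one s/…/…/ expression per pair. Because read splits on whitespace, the search strings have to be single words for this to work.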

This approach is fine if you know that your target words only appear once in the file (they have to, otherwise, you'll need to provide more details that we can use to identify which occurrence should be replaced) and if your files are tiny, like what you showed. For larger files, this will take a long time and is very inefficient since it will need to be run once for every pair of words.

So, if you have larger files, you might want something like this instead:

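paste fileB fileC |
    perl -lane '$words{$F[0]} = $F[1];
        END { open(A, "<", "fileA") or die "fileA: $!";
              # the for modifier aliases $_ to each key, so keep the line in $line
              while ($line = <A>) { chomp $line; $line =~ s/$_/$words{$_}/ for keys %words; print $line } }'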

score 1 · Answer 4 · answered Sep 17 '21 at 11:09

Using xargs, paste, and sed commands:

xargs -a <(paste -d'/' fileB fileC) -L1 -I @ sed -i "s/@/g" fileA

This will process fileA N times where N is the number of lines in fileB or fileC.
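
A dry run makes the N passes visible. In this sketch, echo is placed in front of sed so the commands are printed rather than executed, and the input comes from a pipe instead of -a with process substitution; show_xargs_commands is a hypothetical helper name.

```shell
# Dry run: print the sed command xargs would run for each input line.
# $1 = file with search strings, $2 = file with replacements.
show_xargs_commands() {
    paste -d '/' "$1" "$2" | xargs -I @ echo sed -i 's/@/g' fileA
}
```

Each printed line is a separate sed invocation on fileA, which is why the cost grows with the number of pairs.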

s.ouchene

infinite-etcetera · Accepted Answer · 2016-03-20T19:16:06.517

score 0

The solution I've created is not very short, but it is simple enough to be readable, unless your task was to do the whole thing with sed?

 #!/usr/bin/bash

 cp A.txt D.txt

 x=1
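 # reading from stdin makes wc print only the line count
 length=$(wc -l < B.txt)

 # -gt rather than -eq, so that the last line of B.txt is processed too
 until [ $x -gt $length ]; do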

    Bx=$(awk "NR==$x" B.txt)
    Cx=$(awk "NR==$x" C.txt)

    sed -i "s/$Bx/$Cx/g" D.txt

    x=$(($x+1))

 done

 rm -f ./sed*

Note that this script creates a tonne of junk files if B.txt is longer than C.txt, and perhaps vice versa (I didn't test it that far).
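
Since the trouble starts when the two lists differ in length, a guard before the loop is one way to avoid it. A minimal sketch, assuming both files end with a newline; same_line_count is a hypothetical helper name.

```shell
# Returns 0 when both files have the same number of lines.
same_line_count() {
    # Unquoted on purpose: some wc implementations pad the count with spaces.
    [ $(wc -l < "$1") -eq $(wc -l < "$2") ]
}
```

Placed before the loop, a line such as same_line_count B.txt C.txt || exit 1 would stop the script instead of running sed with a missing pattern or replacement.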

edited Mar 20 '16 at 19:16

answered Mar 20 '16 at 17:29

infinite-etcetera

So, if those two files had 1000 lines each you would run cat | head 2000 times and sed 1000 times ? – don_crissti Mar 20 '16 at 17:31
honestly i hadn't considered scale. but they are basic commands, with little overhead, so i don't see why not. edit: yes i can delete the cat. maybe i'll merge the sed commands but regex is fiddly – infinite-etcetera Mar 20 '16 at 17:33
And why are you using s///g? The OP didn't say anything about replacing all occurrences of the word. – terdon Mar 20 '16 at 17:55
all suggestions welcome, though the UNIX man below thought that wasn't useful. i thought about merging the regex but then wouldn't that have to be more an explicit than implicit match ?? not sure if the original sentence is to change. probably there's also a more elegant way to grab the line number without reading everything in the file until the right line – infinite-etcetera Mar 20 '16 at 18:01
@don_crissti well, the OP's sed doesn't make any kind of sense at all (I really have no idea what they were trying to do there) so I'm assuming the OP is not an expert and, by extension, I'm not sure we can assume anything :). – terdon Mar 20 '16 at 18:04
@infinite-etcetera you can get the target line with awk 'NR==N' file. So, for example B1=$(awk 'NR==1' B.txt). – terdon Mar 20 '16 at 18:06
god bless my txt only has about 30 - 50 lines, and i just copy and paste 50 times, it helped me out! – Robert Choy Mar 20 '16 at 18:30
you should write a loop ! you don't need to do so much copy & paste ... – infinite-etcetera Mar 20 '16 at 18:31
looped it for you, by the way, if you're still copying & pasting – infinite-etcetera Mar 20 '16 at 18:44

score -2 · Answer 6 · edited Apr 13 '17 at 12:36

This might help solve your problem. (Refer: https://unix.stackexchange.com/questions/283017/awk-command-i-want-to-compare-two-rows-in-two-files-and-update-the-second-file-i)

Source.txt has the following two lines:

OldString
NewString

Before the command is run, Target.txt has the following lines:

OldString ==> NewString
This is Target File containing OldString now.
OldString is to be replaced.
NewString won't get impacted.

Use:

awk -v lookupStr="$(awk 'NR==1' Source.txt)" -v replacementStr="$(awk 'NR==2' Source.txt)" 'NR==2 && (idx=index($0,lookupStr)) { $0=substr($0,1,idx-1) replacementStr substr($0,idx+length(lookupStr)) } 1' Target.txt > temp.txt && mv temp.txt Target.txt

After the command has run, Target.txt has the following lines:

OldString ==> NewString
This is Target File containing NewString now.
OldString is to be replaced.
NewString won't get impacted.

Here I have defined two variables, lookupStr and replacementStr, which are assigned line 1 and line 2 of Source.txt respectively. Then, on the second line of Target.txt, I replace the content of $0 with the characters up to the index of lookupStr (i.e. "OldString"), followed by replacementStr (i.e. "NewString"), followed by the rest of the characters. Finally, the output is written to temp.txt, which is then renamed to Target.txt.

If you need to do this replacement in the entire file, just remove the condition NR==2 && from the above command.
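
The whole-file variant can be wrapped in a small function. This is a sketch under a few assumptions: replace_literal is a hypothetical helper name, the strings are passed through the environment rather than with -v (awk processes backslash escapes in -v values), and, like the command above, it replaces only the first occurrence on each line.

```shell
# replace_literal OLD NEW FILE: literal (non-regex) replacement of the
# first occurrence of OLD on every line of FILE, written to stdout.
replace_literal() {
    old=$1 new=$2 awk '
        BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"] }
        (idx = index($0, old)) {
            # text before the match, the replacement, then the rest of the line
            $0 = substr($0, 1, idx - 1) new substr($0, idx + length(old))
        }
        1' "$3"
}
```

Running replace_literal OldString NewString Target.txt on the sample file would also change the OldString on the first and third lines, which is what the NR==2 condition was preventing.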
