Replace string within multiple files using per line strings from one file

Question

In all the many years using this site I’ve never had to ask a question because there has ALWAYS been an answer (usually numerous). I’m pretty sure this one does too, but for the life of me I cannot find it.

I have a directory with a bunch of files, each of which has numerous lines of random length.

a.txt
b.txt
c.txt
d.txt

Then I have a single file, eg.txt, with a set list of strings

opq  111
rst  222
uvw  333
xyz  444

Each of the txt files has a single string I’d like to replace

a.txt has a#P#b
b.txt has c#P#d
c.txt has e#P#f
d.txt has g#P#h

I want to replace #P# with the second ‘column’ from my file of strings. The #P# occurs only one time per file (because I’ve put it there). The result would be

a.txt has a111b
b.txt has c222d
c.txt has e333f
d.txt has g444h

The ‘constant’ assumption is that there are as many lines in eg.txt as there are .txt files in my directory, and that the files are in alphabetical order. The lines in eg.txt are sorted alphabetically by ‘column’ 1.

I’ve been trying to do it using awk and sed (well actually sd) within a for loop but I’m failing to get it to read both ‘source’ and ‘target’ line by line.

I’m not fussy as to how I achieve the result. Currently I’m not working with many lines or files (15 lines and 15 files right now) but there will be times where there will be quite a lot more. I am using zsh as my shell on both an Arch & Debian based linux distro (WSL 2 at times)

Apologies if this has an answer. I’ve really tried to find it over the last two days while working on this project and my brain is now spent.

EDIT: Updated to clarify that the files in the directory have numerous lines of various length and that my given string #P# occurs only once per file

Ed Morton · Answer 1 · 2020-08-04T13:34:19.473

3

Using GNU awk for "inplace" editing and ARGIND:

awk -i inplace '
    NR == FNR { map[NR]=$2 }
    NR != FNR { sub(/#P#/,map[ARGIND]) }
1' eg.txt ?.txt

The above assumes the replacement text from eg.txt doesn't contain spaces or &s.
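If the replacement text might contain & (or backslashes), one way around that limitation is to splice the string in with index() and substr() instead of sub(), since no regex or replacement-string parsing is involved then. A minimal sketch, assuming any POSIX awk; it writes FILE.new copies rather than editing in place, and the scratch directory, the eg.input name and the sample contents are made up for illustration:

```shell
#!/bin/sh
# Demo setup (hypothetical sample data in a scratch directory).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'x\na#P#b\ny\n' > a.txt
printf 'c#P#d\n' > b.txt
printf 'opq  1&1\nrst  2\\2\n' > eg.input   # replacements: 1&1 and 2\2

awk '
    NR == FNR { map[NR] = $2; next }   # first file: remember column 2 of each line
    FNR == 1  { n++ }                  # first line of a target file: next replacement
    {
        p = index($0, "#P#")           # literal search, no regex involved
        if (p) $0 = substr($0, 1, p - 1) map[n] substr($0, p + 3)
        print > (FILENAME ".new")      # write a copy next to the original
    }
' eg.input ./?.txt

cat a.txt.new b.txt.new
```

With the sample data, the second line of a.txt.new comes out as a1&1b and b.txt.new as c2\2d. Note that an empty target file would not advance the counter, so this sketch assumes every file has at least one line.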

edited Aug 04 '20 at 13:34

answered Aug 04 '20 at 13:23

Ed Morton

1

While I used the initial answer to solve my problem this one has been super handy because of its sheer versatility. I had no idea how versatile awk was. Really helped give me a better understanding of awk and the amazing uses for NR, FNR, ORS, OFS etc along with sub, map & ARGIND. Thank you! – 0m3rta3 Aug 10 '20 at 11:02
You're welcome. The main benefit to this approach is it'll run orders of magnitude faster than calling sed in a loop reading one line at a time - see why-is-using-a-shell-loop-to-process-text-considered-bad-practice. – Ed Morton Aug 10 '20 at 12:37

score 2 · Accepted Answer · edited Aug 04 '20 at 05:26

preparations

Only one line in each file.

$ grep -- . ?.txt
a.txt:a#P#b
b.txt:c#P#d
c.txt:e#P#f
d.txt:g#P#h

$ cat input
opq  111
rst  222
uvw  333
xyz  444

solution

Have a shell loop call sed for each file:

for file in ?.txt; do
    read -r dummy new_string rest
    sed -- "s/#P#/$new_string/g" "$file"
done <input
a111b
c222d
e333f
g444h

Once you are satisfied with the output, change that to sed -i (GNU sed or compatible) or sed -i '' (FreeBSD sed or compatible) to have the files edited in place.

The above assumes the lines of input contain no &, / or \ characters. If they might, you would have to escape those with backslashes first.
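The escaping step could look something like the following. This is only a sketch: the scratch directory and sample data are invented for the demo, the escaped variable is a name I made up, and the loop writes to FILE.new instead of editing in place so the result can be inspected first.

```shell
#!/bin/sh
# Demo setup (hypothetical sample data in a scratch directory).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'x\na#P#b\n' > a.txt
printf 'c#P#d\ny\n' > b.txt
printf 'opq  http://example.com/?a=1&b=2\nrst  C:\\temp\n' > input

for file in ?.txt; do
    read -r dummy new_string rest || break   # stop if input runs out of lines
    # Put a backslash in front of every \, / and & so that sed treats them
    # literally in the replacement. | is used as the delimiter here so the
    # bracket expression can list / without any ambiguity.
    escaped=$(printf '%s\n' "$new_string" | sed 's|[\\/&]|\\&|g')
    sed -- "s/#P#/$escaped/" "$file" > "$file.new"
done < input

cat a.txt.new b.txt.new
```

With the sample data the URL and the backslash path are inserted unchanged. Replacement strings containing newlines cannot occur here, because read hands over one line at a time.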

edited Aug 04 '20 at 05:26

Stéphane Chazelas

answered Aug 03 '20 at 21:53

Hauke Laging

Sorry, I don’t think I was clear that only the single file has one ‘result’ per line. The txt files within the directory have numerous lines, but the string #P# only occurs one time in each file. Your example has, however, solved a separate issue I had with a similar problem on separate files. So thank you – 0m3rta3 Aug 03 '20 at 22:04
@0m3rta3 That's what I assumed. Should not be a problem for my solution. – Hauke Laging Aug 03 '20 at 22:10
But the grep preparation wouldn’t give that kind of output. Unless I’m missing something? Which is possible. I’m using this in a couple of ways. One example is where the directory of files are puppeteer scripts. The 2nd column of the single file contains various URLs (all different). I then place an uncommon string (#P#) everywhere I want a URL inserted and then loop over the list to insert each URL into each script. So each script has different calls and different URLs. This is just one example of the things I’m trying to do. – 0m3rta3 Aug 03 '20 at 22:16
URLs might not be the best example because I don’t have to worry about escaping special characters in the majority of my cases. It’s just a simple replace(this)-with(that) – 0m3rta3 Aug 03 '20 at 22:28
@0m3rta3 The grep is just supposed to show to readers what my files look like so that my code can easily be tested. The sed should change just the one respective line in a multi-line file. Have you tried that at all? Of course, the sed call needs a separator char which does not appear in your URLs or you will get into quoting hell. – Hauke Laging Aug 03 '20 at 22:35
I’m gonna be honest with you. The whole read dummy line... threw me for a loop. I’m still kinda new to scripting and stuff beyond just above the basics. So I had to go and read up on what the heck that was about. Anyway. Not only did it work, but I think I’ve learnt more from yours and the other answer with regards to text processing, bash, loops and sed than I have in the last 3 days looking up solutions on how to do this. Thank you! – 0m3rta3 Aug 03 '20 at 23:09

Quasímodo · Answer 3 · 2020-08-04T22:19:42.457

2

#!/bin/sh
mv eg.txt eg.input
awk 'NR==FNR{a[++i]=$2;next}FNR==1{++j}{sub("#P#",a[j]);print>(FILENAME".new")}' eg.input ./*.txt &&
for f in *.txt; do mv "$f.new" "$f"; done
mv eg.input eg.txt

eg.txt is renamed to eg.input and then back so that *.txt in the awk line expands only to the files that should be modified.

NR==FNR{    #For the first file, eg.input
  a[++i]=$2   #Put the second field in the array `a`
  next        #Skip the rest of the code
}
FNR==1{++j}              #At the first line of each other file, move on to the next replacement
{                        #For every line of the other files
  sub("#P#",a[j])          #Make the substitution (a no-op on lines without #P#)
  print>(FILENAME".new")   #Print the line to `FILENAME`.new
}

The counter j advances once per file (on FNR==1) rather than once per line, so the substitution works no matter which line of a multi-line file holds #P#.

Then, in a for loop, the contents of the old *.txt files are overwritten by the contents of the *.new files. You may want to suppress the for loop until you are convinced that the *.new files are correct.

Some awk implementations do not handle many open files (GNU awk does). If your awk exits with a "too many open files" error, use this variant:

awk 'NR==FNR{a[++i]=$2;next}FNR==1{close(fn);fn=FILENAME".new";++j}{sub("#P#",a[j]);print>fn}'
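One way to convince yourself that the *.new files are correct before running the mv loop is to check that each one no longer contains the placeholder and differs from its original in exactly one line. A rough sketch; the scratch directory and the sample files (including a deliberately bad b.txt.new) are made up for the demo, and the ok flag is just an illustrative name:

```shell
#!/bin/sh
# Demo setup (hypothetical sample data in a scratch directory).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'x\na#P#b\n' > a.txt; printf 'x\na111b\n' > a.txt.new
printf 'c#P#d\n'    > b.txt; printf 'c#P#d\n'    > b.txt.new   # substitution failed here

ok=1
for f in ./*.txt; do
    # A good .new file no longer contains the placeholder ...
    if grep -q '#P#' "$f.new"; then
        echo "placeholder still present in $f.new" >&2
        ok=0
    fi
    # ... and differs from the original in exactly one line.
    changed=$(diff "$f" "$f.new" | grep -c '^>')
    [ "$changed" -eq 1 ] || { echo "$f: $changed changed lines" >&2; ok=0; }
done
echo "ok=$ok"
```

Only when ok is still 1 at the end would it make sense to go ahead with the mv loop.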

edited Aug 04 '20 at 22:19

answered Aug 03 '20 at 22:34

Quasímodo

For some reason it seems to work so far up to inserting the new string. The result is just that the #P# is removed. Ran it with -x set and I think I see why that is and will check it out shortly, but I just wanted to say thank you, as both these answers have taught me more in under an hour than I’ve learned trying to figure this out the last 3 days. Both your answers introduced me to commands I wasn’t familiar with, and looking at the man pages with them as context is SUPER handy. I upvoted but it won’t show till I have 15 rep. Thank you again! – 0m3rta3 Aug 03 '20 at 23:15
@0m3rta3 I don't think -x will help you much as that is a Bash flag. From your description it seems for some reason your array did not get populated, although I tested here in sample files and it worked. Well, if you learned from the answers, I'm already happy. Always much glad helping those who are willing to learn. – Quasímodo Aug 03 '20 at 23:26
Oh. Right. Thanks for pointing that out, because I didn’t copy paste your script (was told from the start not to get into that habit and I’ve tried to stick to it). But that comes with its own issues when I don’t pay attention. As such I used the bash shebang, not regular sh. Hence the -x working. That might be where the issue actually is though. Lol. Dummy – 0m3rta3 Aug 03 '20 at 23:30
@0m3rta3 Always copy/paste code to/from this site, don't try to re-type it because then you end up wasting your time and other peoples time trying to help you with problems that simply don't exist in the code. There's nothing in the posted script that requires sh or bash nor requires you not to use either of them. – Ed Morton Aug 04 '20 at 13:40

score 1 · Answer 4 · answered Aug 04 '20 at 03:30

Since you are already on zsh, and I presume you have GNU sed, we can do it in a two-step process as shown.

setopt extended_glob
sed -Ei -e '/#P#/R eg.txt' ./(^eg).txt
sed -Ei -e '/#P#/N;s/#P#(.*)\n.*\s(.*)/\2\1/' ./(^eg).txt

Brief explanation

Turn on extended globbing so that we can filter out a specific file eg.txt from the sed commandline.
Place the respective line from eg.txt after the #P# containing line with the help of the R command. Read up on this GNU specific command in the manual for more info.
Here we merge the two lines and do a cut n paste job to get the desired output.

The files were edited inplace (except eg.txt)

score -1 · Answer 5 · answered Aug 08 '20 at 09:16

eg.txt

opq  111
rst  222
uvw  333
xyz  444

a.txt

a#P#b
12345
apple

b.txt

c#P#d
56788

command

j=1; for i in "a.txt" "b.txt"; do b=$(sed -n "${j}p" eg.txt | awk '{print $2}'); sed "s/#P#/$b/g" "$i"; echo "================="; j=$((j+1)); done

output
below are the output of a.txt
a111b
12345
apple
=================
below are the output of b.txt
c222d
56788
=================

answered Aug 08 '20 at 09:16

Praveen Kumar BS

Please let me know the reason for the downvote – Praveen Kumar BS Aug 08 '20 at 10:26
Not sure who down voted but I’d imagine it’s because this doesn’t do what I was needing. At least the outputs are not at all what I was looking for as per my question. – 0m3rta3 Aug 10 '20 at 10:55
