1

In all the many years I’ve been using this site I’ve never had to ask a question, because there has ALWAYS been an answer (usually several). I’m pretty sure this one has an answer too, but for the life of me I cannot find it.

I have a directory with a bunch of files, each of which has numerous lines of varying length.

a.txt
b.txt
c.txt
d.txt

Then I have a single file, eg.txt, with a set list of strings:

opq  111
rst  222
uvw  333
xyz  444

Each of the txt files has a single string I’d like to replace

a.txt has a#P#b
b.txt has c#P#d
c.txt has e#P#f
d.txt has g#P#h

I want to replace #P# with the second ‘column’ from my file of strings. The #P# occurs only one time per file (because I’ve put it there). The result would be

a.txt has a111b
b.txt has c222d
c.txt has e333f
d.txt has g444h

The ‘constant’ assumption is that there are as many lines in eg.txt as there are .txt files in my directory, and that the files are in alphabetical order. The lines in eg.txt are sorted alphabetically by ‘column’ 1.
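Since everything hinges on that assumption, it may be worth checking it before any file is touched. A minimal sketch, assuming the targets are the single-letter ?.txt files from the example and the list is eg.txt; the temporary directory and the sample data are made up for illustration:

```shell
#!/bin/sh
# Demo setup (hypothetical sample data mirroring the question).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'opq  111\nrst  222\nuvw  333\nxyz  444\n' > eg.txt
for f in a b c d; do printf 'some line\nx#P#y\n' > "$f.txt"; done

# Count the target files (eg.txt does not match ?.txt) and the
# non-empty lines of the list.
set -- ?.txt
n_files=$#
n_lines=$(awk 'NF { n++ } END { print n + 0 }' eg.txt)

if [ "$n_files" -eq "$n_lines" ]; then
    status=ok
else
    status=mismatch
fi
echo "$status: $n_files files, $n_lines list lines"
```

If the two counts differ, the pairing of list lines to files is off and any of the replacement approaches would put the wrong string somewhere.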

I’ve been trying to do it using awk and sed (well actually sd) within a for loop but I’m failing to get it to read both ‘source’ and ‘target’ line by line.

I’m not fussy as to how I achieve the result. Currently I’m not working with many lines or files (15 lines and 15 files right now), but there will be times when there are quite a lot more. I am using zsh as my shell on both Arch-based and Debian-based Linux distros (WSL 2 at times).

Apologies if this has an answer. I’ve really tried to find it over the last two days while working on this project and my brain is now spent.

EDIT: Updated to clarify that the files in the directory have numerous lines of various length and that my given string #P# occurs only once per file

Jeff Schaller
0m3rta3

5 Answers

3

Using GNU awk for "inplace" editing and ARGIND:

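awk -i inplace '
    NR == FNR { map[NR]=$2 }                   # eg.txt: remember column 2 by line number
    NR != FNR { sub(/#P#/,map[ARGIND-1]) }     # eg.txt is ARGV[1], so the Nth target file has ARGIND N+1
1' eg.txt ?.txt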

The above assumes the replacement text from eg.txt doesn't contain spaces or &s.
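If the replacement may contain & (which sub() treats as "the matched text"), one way around it is to splice the string in with index() and substr() instead of sub(). The following is only a sketch in POSIX awk, so it has no in-place editing: it writes FILE.new next to each file. The sample data and the .new suffix are assumptions for illustration.

```shell
#!/bin/sh
# Demo setup (hypothetical sample data; note the & in the first replacement).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'opq  1&1\nrst  222\n' > eg.txt
printf 'first line\na#P#b\n' > a.txt
printf 'c#P#d\nlast line\n' > b.txt

awk '
    NR == FNR { map[NR] = $2; next }   # list file: remember column 2 by line number
    FNR == 1  {                        # first line of each target file
        n++                            # move on to the next replacement
        if (out != "") close(out)
        out = FILENAME ".new"
    }
    {
        p = index($0, "#P#")           # literal search, no regex involved
        if (p) $0 = substr($0, 1, p - 1) map[n] substr($0, p + 3)
        print > out
    }
' eg.txt ?.txt

cat a.txt.new b.txt.new
```

Because nothing is interpreted as a regex or as a sub() replacement here, the string is inserted exactly as it appears in column 2. Spaces are still a limit, since $2 ends at the first blank.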

Ed Morton
2

preparations

Only one line in each file.

$ grep -- . ?.txt
a.txt:a#P#b
b.txt:c#P#d
c.txt:e#P#f
d.txt:g#P#h
$ cat input
opq  111
rst  222
uvw  333
xyz  444

solution

Have a shell loop call sed for each file:

for file in ?.txt; do
    read -r dummy new_string rest
    sed -- "s/#P#/$new_string/g" "$file"
done <input

a111b
c222d
e333f
g444h

Once you are satisfied with the result, change that to sed -i (GNU sed or compatible) or sed -i '' (FreeBSD sed or compatible) to have the files edited in place.

The above assumes the lines of input don't contain &, /, nor \ characters. If they may you would have to escape those with backslashes first.
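That escaping step could look roughly like this. It is a sketch under assumptions: the URL and the file contents are made-up sample data, the variable name escaped is mine, and the result goes to FILE.new rather than being edited in place.

```shell
#!/bin/sh
# Demo setup (hypothetical sample data with / and & in the replacement).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'opq  http://x/?a=1&b=2\n' > input
printf 'goto("#P#")\n' > a.txt

for file in ?.txt; do
    read -r dummy new_string rest
    # Put a backslash in front of every \, / and & so that sed takes
    # them literally in the replacement part of s/.../.../.
    escaped=$(printf '%s\n' "$new_string" | sed 's|[\\/&]|\\&|g')
    sed -- "s/#P#/$escaped/" "$file" > "$file.new"
done <input

cat a.txt.new
```

The inner sed uses | as its delimiter so that the / inside the bracket expression cannot be mistaken for the end of the pattern.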

Hauke Laging
  • Sorry, I don’t think I was clear that only the single file has one ‘result’ per line. The txt files within the directory have numerous lines, but the string #P# only occurs one time in each file. Your example has, however, solved a separate issue I had with a similar problem on separate files. So thank you – 0m3rta3 Aug 03 '20 at 22:04
  • @0m3rta3 That's what I assumed. Should not be a problem for my solution. – Hauke Laging Aug 03 '20 at 22:10
  • But the grep preparation wouldn’t give that kind of output. Unless I’m missing something? Which is possible. I’m using this in a couple of ways. One example is where the directory of files are puppeteer scripts. The 2nd column of the single file are various URLs (all different). I then place an uncommon string (#P#) everywhere I want a URL inserted and then loop the list to insert each url into each script. So each script has different calls and different URLs. This is just an example one of the things I’m trying to do. – 0m3rta3 Aug 03 '20 at 22:16
  • URL’s might not be the best example because I don’t have to worry about escaping special characters in the majority of my cases. It’s just a simple replace(this)-with(that) – 0m3rta3 Aug 03 '20 at 22:28
  • @0m3rta3 The grep is just supposed to show to readers what my files look like so that my code can easily be tested. The sed should change just the one respective line in a multi-line file. Have you tried that at all? Of course, the sed call needs a separator char which does not appear in your URLs or you will get into quoting hell. – Hauke Laging Aug 03 '20 at 22:35
  • I’m gonna be honest with you. The whole read dummy line... threw me for my own kind of loop. I’m still kinda new to scripting and anything beyond just above the basics, so I had to go and read up on what the heck that was about. Anyway, not only did it work, but I think I’ve learnt more from your answer and the other one about text processing, bash, loops and sed than I have in the last 3 days looking up solutions on how to do this. Thank you! – 0m3rta3 Aug 03 '20 at 23:09
2
#!/bin/sh
mv eg.txt eg.input
awk 'NR==FNR{a[++i]=$2;next}FNR==1{++j}{sub("#P#",a[j]);print>(FILENAME".new")}' eg.input ./*.txt &&
for f in *.txt; do mv "$f.new" "$f"; done
mv eg.input eg.txt

eg.txt is renamed to eg.input and then back so that *.txt in the awk line expands only to the files that should be modified. The counter j advances once per file (at FNR==1), not once per line, so every line of a multi-line file sees the same replacement.

NR==FNR{    #For the first file, eg.input
  a[++i]=$2   #Put the second field in the array `a`
  next        #Skip the rest of the code
}
FNR==1{ ++j }            #At the first line of each other file, move to the next replacement
{                        #For every line of the other files
  sub("#P#",a[j])          #Make the substitution
  print>(FILENAME".new")   #Print the line to `FILENAME`.new
}

Then, in a for loop, the old *.txt files are replaced by the corresponding *.new files. You may want to suppress the for loop until you are convinced that the *.new files are correct.


Some awk implementations do not handle many open files (GNU awk does). If your awk exits with a "too many open files" error, use this variant:

awk 'NR==FNR{a[++i]=$2;next}FNR==1{close(fn);fn=FILENAME".new";++j}{sub("#P#",a[j]);print>fn}' eg.input ./*.txt
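Before the mv loop is allowed to run, the *.new files can be compared with the originals. A small sketch, assuming the .new files sit next to the originals as in the script above; the sample files here are invented for the demo:

```shell
#!/bin/sh
# Demo setup (hypothetical: one original and its already generated .new file).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\na#P#b\nthree\n' > a.txt
printf 'one\na111b\nthree\n' > a.txt.new

for f in ?.txt; do
    [ -f "$f.new" ] || { echo "missing: $f.new"; continue; }
    # diff marks removed lines with < and added lines with >;
    # a single replaced line therefore counts as 2.
    changed=$(diff "$f" "$f.new" | grep -c '^[<>]')
    echo "$f: $changed differing line(s), old and new counted separately"
done
```

With exactly one #P# per file, each file should report 2 (the old line and the new line); anything else deserves a closer look before the originals are overwritten.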
Quasímodo
  • For some reason it seems to work right up to inserting the new string: the result is just that the #P# is removed. I ran it with -x set and I think I see why that is, and will check it out shortly, but I just wanted to say thank you, as both these answers have taught me more in under an hour than I’ve learned trying to figure this out over the last 3 days. Both your answers introduced me to commands I wasn’t familiar with, and looking at the man pages with them as context is SUPER handy. I upvoted but it won’t show till I have 15 rep. Thank you again! – 0m3rta3 Aug 03 '20 at 23:15
  • @0m3rta3 I don't think -x will help you much as that is a Bash flag. From your description it seems that for some reason the array did not get populated, although I tested here on sample files and it worked. Well, if you learned from the answers, I'm already happy. Always glad to help those who are willing to learn. – Quasímodo Aug 03 '20 at 23:26
  • Oh. Right. Thanks for pointing that out, because I didn’t copy-paste your script (I was told from the start not to get into that habit and I’ve tried to stick to it). But that comes with its own issues when I don’t pay attention. As such I used the bash shebang, not regular sh. Hence the -x working. That might be where the issue actually is though. Lol. Dummy – 0m3rta3 Aug 03 '20 at 23:30
  • @0m3rta3 Always copy/paste code to/from this site, don't try to re-type it because then you end up wasting your time and other peoples time trying to help you with problems that simply don't exist in the code. There's nothing in the posted script that requires sh or bash nor requires you not to use either of them. – Ed Morton Aug 04 '20 at 13:40
1

Since you are already on zsh, and I presume you have GNU sed, we can do it in two steps, as shown below.

setopt extended_glob

sed -Ei -e '/#P#/R eg.txt' ./(^eg).txt

sed -Ei -e '/#P#/N;s/#P#(.*)\n.*\s(.*)/\2\1/' ./(^eg).txt

Brief explanation

  • Turn on extended globbing so that we can filter out a specific file eg.txt from the sed commandline.

  • Place the respective line from eg.txt after the #P# containing line with the help of the R command. Read up on this GNU specific command in the manual for more info.

  • Here we merge the two lines and do a cut n paste job to get the desired output.

The files are edited in place (except eg.txt).

-1
eg.txt

opq  111
rst  222
uvw  333
xyz  444

a.txt

a#P#b
12345
apple

b.txt

c#P#d
56788

command

j=1
for i in a.txt b.txt; do
    b=$(sed -n "${j}p" eg.txt | awk '{print $2}')   # second column of line j in eg.txt
    sed "s/#P#/$b/g" "$i"
    echo "================="
    j=$((j+1))
done

output

below are the output of a.txt
a111b
12345
apple
=================
below are the output of b.txt
c222d
56788
=================