I'm new here and haven't posted much, so I'll try to make this as clear as I can.

I want to perform a find and replace that involves three files. find.csv lists strings that identify lines in the main file. Each line containing one of those strings should be replaced entirely with the corresponding line from replace.csv. The third file is mainfile.csv, which contains about 1000 lines.

This is what I have so far, but I'm getting an error message:

sed "s/$(cat find.csv)/$(cat replace.csv)/" mainfile.csv > out.csv
sed: 1: "s/CHL_13_R4 
DCK_09_R4  ...": unterminated substitute pattern

Here is what the contents of the files look like:

find.csv
CHL_13_R4 
DCK_09_R4 
DCK_10_R4 
DCK_11_R4 
DCK_13_R4 

replace.csv
CHL_13_R12,CHL_13,R12,10/14/2014
DCK_09_R12,DCK_09,R12,10/14/2014
DCK_10_R32,DCK_10,R32,10/14/2014
DCK_11_R21,DCK_11,R21,10/14/2014

The mainfile contains entries as in replace.csv, but about 30 need to be updated. Taking the first line as an example, the line in mainfile.csv that contains 'CHL_13_R4' needs to be replaced with CHL_13_R12,CHL_13,R12,10/14/2014

Thanks for the help!

  • You have more lines in find.csv than you do in replace.csv. – Wildcard May 03 '16 at 23:24
  • "mainfile contains entries as in replace.csv" if so, it contains nothing that matches any of the strings in find.csv surely? Did you mean it contains entries as in find.csv? – steeldriver May 03 '16 at 23:25
  • I provided examples of the contents of find.csv and replace.csv. I accidentally grabbed one extra line of find.csv. Both files contain 28 lines of text. Sorry about that. – GigaZaur May 04 '16 at 01:08

1 Answer

First create a sed script from those two files:

paste -d$'\t' find.csv replace.csv | 
    sed -e 's:/:\\/:g; s:\t:/:; s:^:s/:; s:$:/g;:' > myscript.sed

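This generates a script that replaces every occurrence of each string in find.csv with the corresponding line of replace.csv. It will fail if any line in find.csv contains a tab character, because paste uses a tab as the separator between the joined lines. Note also that each string from find.csv is used as a regular expression, so characters such as . or * in it act as regex operators rather than literal characters.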

Output looks like this:

s/CHL_13_R4/CHL_13_R12,CHL_13,R12,10\/14\/2014/g;
s/DCK_09_R4/DCK_09_R12,DCK_09,R12,10\/14\/2014/g;
s/DCK_10_R4/DCK_10_R32,DCK_10,R32,10\/14\/2014/g;
s/DCK_11_R4/DCK_11_R21,DCK_11,R21,10\/14\/2014/g;
s/DCK_13_R4//g;

(Note that the last line has no replacement text. That's because your find.csv had 5 lines while your replace.csv only had 4.)
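Since a mismatch like this silently produces a command that deletes text, it may be worth comparing the line counts before generating the script. The following is only a sketch: it builds sample files modeled on the question in a scratch directory, and the `status` variable and its messages are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sample data modeled on the question: 5 search strings, 4 replacements.
printf '%s\n' CHL_13_R4 DCK_09_R4 DCK_10_R4 DCK_11_R4 DCK_13_R4 > find.csv
printf '%s\n' \
  'CHL_13_R12,CHL_13,R12,10/14/2014' \
  'DCK_09_R12,DCK_09,R12,10/14/2014' \
  'DCK_10_R32,DCK_10,R32,10/14/2014' \
  'DCK_11_R21,DCK_11,R21,10/14/2014' > replace.csv

# wc reads from stdin so it prints only the count; tr removes any padding.
nfind=$(wc -l < find.csv | tr -d ' ')
nrepl=$(wc -l < replace.csv | tr -d ' ')

# Report a mismatch instead of generating a script with empty replacements.
if [ "$nfind" -ne "$nrepl" ]; then
  status="mismatch: find.csv has $nfind lines, replace.csv has $nrepl"
else
  status="ok: $nfind pairs"
fi
echo "$status"
```

With real data you would drop the two printf commands and run only the counting part in the directory that holds your files.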

If you instead want to replace the entire line containing a string from find.csv:

paste -d$'\t' find.csv replace.csv | 
    awk -F$'\t' '{gsub(/\//,"\\/"); print "/"$1"/ s/^.*/"$2"/;"}' > myscript.sed

Output of this version looks like this:

/CHL_13_R4/ s/^.*/CHL_13_R12,CHL_13,R12,10\/14\/2014/;
/DCK_09_R4/ s/^.*/DCK_09_R12,DCK_09,R12,10\/14\/2014/;
/DCK_10_R4/ s/^.*/DCK_10_R32,DCK_10,R32,10\/14\/2014/;
/DCK_11_R4/ s/^.*/DCK_11_R21,DCK_11,R21,10\/14\/2014/;
/DCK_13_R4/ s/^.*//;
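If find.csv might carry trailing spaces or tabs (as the sample in the question appears to), the generated addresses will contain them too and will not match. One possible variant, offered as a sketch rather than a tested fix for every sed and awk, strips trailing whitespace from find.csv before pasting. The sample file contents below are assumptions made for the demonstration, and paste is left on its default tab delimiter.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample data; note the trailing space after each search string.
printf '%s \n' CHL_13_R4 DCK_09_R4 > find.csv
printf '%s\n' \
  'CHL_13_R12,CHL_13,R12,10/14/2014' \
  'DCK_09_R12,DCK_09,R12,10/14/2014' > replace.csv

# 1. Strip trailing whitespace from every line of find.csv.
# 2. Join it line by line with replace.csv (paste uses a tab by default;
#    "-" means "read the first file from stdin").
# 3. Escape slashes and emit one "/pattern/ s/^.*/replacement/;" per pair.
sed 's/[[:space:]]*$//' find.csv |
  paste - replace.csv |
  awk -F'\t' '{gsub(/\//,"\\/"); print "/"$1"/ s/^.*/"$2"/;"}' > myscript.sed

cat myscript.sed
```

Inspecting myscript.sed before running it is still worthwhile, since the search strings are used as regular expressions and a short string can also match inside a longer one.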

Whichever version works best for you, once you've generated the myscript.sed script, run it on your mainfile.csv:

sed -f myscript.sed mainfile.csv

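(Optionally use -i if you want to do an 'in-place' edit on mainfile.csv. With GNU sed, -i needs no argument. BSD/macOS sed requires a backup suffix argument after -i, which may be an empty string.)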
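A form of in-place editing that should be accepted by both GNU sed and BSD/macOS sed is to attach a backup suffix directly to -i. The sketch below uses a one-line hypothetical mainfile.csv and a one-line script in a scratch directory. The .bak suffix is an arbitrary choice.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical one-line main file and a matching one-line sed script.
printf '%s\n' 'CHL_13_R4,CHL_13,R4,01/01/2014' > mainfile.csv
printf '%s\n' '/CHL_13_R4/ s/^.*/CHL_13_R12,CHL_13,R12,10\/14\/2014/;' > myscript.sed

# -i.bak edits mainfile.csv in place and keeps the original as mainfile.csv.bak.
# Do not combine -i with "> out.csv": with -i the result goes into the file itself.
sed -i.bak -f myscript.sed mainfile.csv

cat mainfile.csv
```

Once you have checked the result, the .bak copy can be deleted.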

NOTE: it's possible to do this without using a temporary file like myscript.sed to hold the script. Most versions of sed can run scripts from stdin. But this way allows you to examine and/or edit the generated sed script before running it on your main file.
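As a rough sketch of the no-temporary-file approach, the generated commands can be piped straight into sed. This assumes a system that provides /dev/stdin (Linux and macOS do) and a find.csv without trailing whitespace. The sample file contents are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample data: two pairs and a three-line main file.
printf '%s\n' CHL_13_R4 DCK_09_R4 > find.csv
printf '%s\n' \
  'CHL_13_R12,CHL_13,R12,10/14/2014' \
  'DCK_09_R12,DCK_09,R12,10/14/2014' > replace.csv
printf '%s\n' \
  'ABC_01_R1,ABC_01,R1,01/01/2014' \
  'CHL_13_R4,CHL_13,R4,01/01/2014' \
  'DCK_09_R4,DCK_09,R4,01/01/2014' > mainfile.csv

# Generate the whole-line replacement commands and feed them to sed as its
# script via /dev/stdin, so no myscript.sed file is written.
paste find.csv replace.csv |
  awk -F'\t' '{gsub(/\//,"\\/"); print "/"$1"/ s/^.*/"$2"/;"}' |
  sed -f /dev/stdin mainfile.csv > out.csv

cat out.csv
```

The trade-off is the one noted above: with a pipeline there is no script file to inspect or fix by hand before it is applied.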

cas
  • Won't this turn your delimiter into \/? I think there is no easy way to handle this without process substitution to perform the byte stuffing. (Unless perhaps you can use a non-printing character like ^A as the delimiter to paste and also for the s commands you are creating.) – Wildcard May 04 '16 at 00:47
  • Hi cas, that's an interesting solution. I hadn't thought of creating a file to contain the substitution commands line by line. I followed your post and took the second option, as I want to replace the entire line identified by find.csv with the corresponding line in replace.csv. Unfortunately, the final sed command (sed -f myscript.sed mainfile.csv) didn't work. I'm using a Mac, so I'm not sure whether this is a Unix/Mac issue. My computer yells at me when I use the -i option, as in sed -f myscript.sed -i mainfile.csv > out.csv – GigaZaur May 04 '16 at 01:30
  • If your version of sed doesn't support the -i option, that's no big deal. In fact, you shouldn't use -i if you're redirecting output to out.csv. -i does an in-place edit (i.e. write the output to a temp file, and then mv the temp file over the original file). – cas May 04 '16 at 01:45
  • Hey, thanks cas. Your solution worked. I couldn't get sed -f myscript.sed mainfile.csv to work because the myscript.sed I created had a space between the name from find.csv and the /, which wasn't in the sample you posted. So /CHL_13_R4 / s/^.*/CHL_13_R12,CHL_13,R12,10\/14\/2014/; needed to be fixed to /CHL_13_R4/ s/^.*/CHL_13_R12,CHL_13,R12,10\/14\/2014/; Not sure how that space after the "R4" got in there, as I copied and pasted what you wrote. Thanks a lot for taking the time to help me. Cheers – GigaZaur May 04 '16 at 01:58
  • It was probably in find.csv. When I created that file on my system I had to strip trailing spaces from each line. I assumed it was just a copy-and-paste artifact at the time, but it looks like it was in your original file. – cas May 04 '16 at 02:04
  • Yep, I just checked it in vim. The space is in the original file. Good lesson to learn: check and clean your files of trailing spaces/tabs/invisibles. – GigaZaur May 04 '16 at 02:08