How do I replace this pattern with a newline inside it?

Question

If I have a text file with this content:

someline
<!--\
     file first read on 2015/01/11

I want to delete <!--\ and everything up to and including "on ", keeping only what comes after "on". How do I do it? With the example above, the expected output would be:

someline
2015/01/11

I can't write a pattern that matches the date, because 2015/01/11 could just as well be Sunday, Yesterday, or almost anything else. The word read can also be anything. I tried this with BSD sed:

sed 's/<!--\
     file first .* on//g'

But when I run this command, I get this error:

sed: 1: "s/<!--\
        file f ...": unterminated substitute pattern

So I tried backslash-escaping < and !, but I got the same "unterminated substitute pattern" error. I also installed GNU sed and tried the same command with \n in place of the literal newline, and I tried gsed 's/<!--:a;N;$!ba;s/\n/file first .* on//g', but I got:

gsed: -e expression #1, char 22: unknown option to `s'

Can sed not do this? If not, how do I do it with any other tool/language?
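For reference, the gsed attempt above fails because GNU sed parses s/<!--:a;N;$!ba;s/ as one substitute command: <!--:a;N;$!ba;s becomes the pattern, \n the replacement, and the rest (file first .* on//g) is read as flags, so the "f" at character 22 is reported as an unknown option. What was probably intended is the GNU sed "slurp" idiom, where :a;N;$!ba loops until the whole file is in the pattern space and only then runs the substitution. The sketch below assumes GNU sed; [^\n]* keeps the match on the line right after <!--\, and because it is greedy it strips through the last " on " on that line.

```shell
#!/bin/sh
# Assumes GNU sed: ":a;N;$!ba" joins every line into the pattern space,
# "\n" matches a newline and "[^\n]" matches anything except a newline.
sample='someline
<!--\
     file first read on 2015/01/11'

# Delete "<!--\", the newline, and the next line up to its last " on ".
printf '%s\n' "$sample" | sed ':a;N;$!ba;s/<!--\\\n[^\n]* on //g'
# Expected output:
# someline
# 2015/01/11
```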

So your expected output is someline \n file first read on? — pfnuesel, Jan 24 '16 at 17:53
@pfnuesel No, it's someline \n 2015/01/11 (with the example above) — DisplayName, Jan 24 '16 at 17:54

pfnuesel · Answer 1 · 2016-01-24T18:01:57.717

The following sed command should do what you want:

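sed '/^<!--/{N; s/.*on *//;}' inputfile

First we search for the regex <!-- at the beginning of a line; then we use the N command to append the next line to the pattern space and delete (actually, substitute with nothing) everything up to and including "on" and any spaces after it. The ; before the closing } is there for BSD sed, which otherwise takes } as a flag of the s command and rejects it.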

Some people claim that whenever you use an uppercase sed command such as N, you are using the wrong tool...
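A quick way to check this answer against the sample from the question is to feed the input through a pipe instead of a file. The sketch below is only a usage example: the variable name sample is arbitrary, and the command is written with ; before } so that BSD sed, which the question uses, should accept it as well as GNU sed.

```shell
#!/bin/sh
# Sample input from the question, fed through a pipe instead of "inputfile".
sample='someline
<!--\
     file first read on 2015/01/11'

# Answer 1's command; ";" before "}" keeps BSD sed from reading "}" as an s flag.
printf '%s\n' "$sample" | sed '/^<!--/{N; s/.*on *//;}'
# Expected output:
# someline
# 2015/01/11
```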

score 2 · Answer 2 · answered Jan 24 '16 at 18:01

POSIXly:

$ sed -e '/<!--/{
  $!N
  s/.*on //
}' <in >out
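Two details make this version POSIX-friendly: the script is split over several lines so that } follows a newline, and $!N skips N on the last line, where POSIX sed would quit without printing the pattern space (GNU sed prints it anyway). The sketch below runs the same script on the sample input and on a hypothetical file whose last line is a bare <!--; the variable name script is just for illustration.

```shell
#!/bin/sh
# Answer 2's sed script, kept in a variable so it can be reused below.
script='/<!--/{
  $!N
  s/.*on //
}'

# Sample input from the question. Prints: someline, then 2015/01/11
printf '%s\n' 'someline' '<!--\' '     file first read on 2015/01/11' | sed -e "$script"

# Hypothetical edge case: "<!--" on the last line. "$!N" skips N there,
# so the line is still printed. Prints: someline, then <!--
printf '%s\n' 'someline' '<!--' | sed -e "$script"
```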

answered by cuonglm

score 1 · Answer 3 · answered Jan 24 '16 at 18:06

Perl can read the whole file at once with -0777, and the /s modifier makes . match newlines too:

perl -0777 -pe 's/<!--\\.*?on //gs'

*? is a "frugal asterisk", which means "repeat zero or more times, but match the shortest string possible".

answered by choroba
