If I have a text file with this content:

 someline
 <!--\
      file first read on 2015/01/11

I want to delete <!--\ and everything after it, up to and including "on". How do I do it? With the example above, the expected output would be:

someline
2015/01/11

I can't write a pattern that matches the date, because in place of 2015/01/11 there can be Sunday, Yesterday, or almost anything else. The word read can also be anything. I tried this with BSD sed:

sed 's/<!--\
     file first .* on//g'

But when I run this command, I get this error:

sed: 1: "s/<!--\
        file f ...": unterminated substitute pattern

So I tried backslash-escaping < and !, but I got the same "unterminated substitute pattern" error. I then installed GNU sed and tried the same command with \n in place of the line break. I also tried gsed 's/<!--:a;N;$!ba;s/\n/file first .* on//g', but I got:

gsed: -e expression #1, char 22: unknown option to `s'

Can sed not do this? If not, how do I do it with any other tool/language?

DisplayName

3 Answers

The following sed command should do what you want:

sed '/^<!--/{N; s/.*on *//;}' inputfile

First we search for the regex <!-- at the beginning of the line, then we use the N command to append the next line to it and delete (substitute with nothing, actually) everything up to and including "on".
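
To try this against the sample from the question, the input can be recreated with printf and piped straight into sed. This is only a quick sketch: the variable name result is made up for the illustration, and the ; before the closing brace is there because BSD sed expects one.

```shell
# Feed the three sample lines to sed. On a line starting with <!--,
# N appends the next line, then s deletes everything up to and
# including "on" plus any spaces after it.
result=$(printf '%s\n' 'someline' '<!--\' '      file first read on 2015/01/11' |
  sed '/^<!--/{N; s/.*on *//;}')

printf '%s\n' "$result"
```

Note that .* is greedy, so if the text after "on" itself contains another "on", the match runs up to the last one.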

There are people claiming that whenever you use a capital letter command in sed, such as N, you are using the wrong tool...

pfnuesel

POSIXly:

$ sed -e '/<!--/{
  $!N
  s/.*on //
}' <in >out
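
The $! address runs N only when the current line is not the last one. POSIX says that N with no next line quits without printing the pattern space, while GNU sed prints it, so the guard keeps the result the same on both. A small sketch with made-up input whose last line is the marker itself (the variable name last is hypothetical):

```shell
# The marker is the final line, so there is nothing for N to append.
# $!N skips N there, the s command finds no "on " to remove,
# and the line is printed unchanged.
last=$(printf '%s\n' 'someline' '<!--\' | sed -e '/<!--/{
  $!N
  s/.*on //
}')

printf '%s\n' "$last"
```
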
cuonglm

Perl can read the whole file at once with -0777, and the /s modifier makes . match newlines, too:

perl -0777 -pe 's/<!--\\.*?on //gs'

*? is a "frugal asterisk", which means "repeat zero or more times, but match the shortest string possible".

choroba