sed '/\n/P;//D;y|]|\n|
s|\n>/string>|]|
y|[]\n|\n[]|
s|string>!\nTEST\n\(.*\[\)|[\1|
y|\n[|[\n|;D' <<\IN
string>![TEST[][]Extract[ ]this[ ]string[][]>/string>
IN
It may be that you can specify that the square brackets are acceptable delimiters here, but, if so, it seems strange that the end delimiters would be so elaborate. Anyway, the question only states that you need to get the text from between string>![TEST[ and ]>/string>, and so that's what this tries to do - though it does fail if the text should span newline boundaries.
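To illustrate the newline limitation, here is a small sketch. The function name extract is only a hypothetical wrapper around the script above, and the two-line input is made up for the demonstration. Because neither input line holds both delimiters, I would expect nothing to be printed:

```shell
#!/bin/sh
# Hypothetical wrapper: the sed script from the top of this answer, unchanged.
extract() {
    sed '/\n/P;//D;y|]|\n|
s|\n>/string>|]|
y|[]\n|\n[]|
s|string>!\nTEST\n\(.*\[\)|[\1|
y|\n[|[\n|;D'
}

# The delimited text is split across two input lines, so the left and
# right delimiters never share a pattern space and no output is produced.
printf '%s\n' 'string>![TEST[Extract' 'this]>/string>' | extract
```

If you need matches that span lines, the input would first have to be joined into a single pattern space, which this script does not attempt.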
Anyway, it works by:
y|]|\n|
- It first translates all occurrences of ] on a line to a \newline.
s|\n>/string>|]|
- It next replaces the first occurring \newline which is followed immediately by your right end delimiter with ] (which makes it the only possible ] on a line at that time).
y|[]\n|\n[]|
- If the last substitution was successful, that one ] is translated to a [ while all \newlines are translated back to ] and all [ are simultaneously translated to \newlines - the three character types are shifted, basically.
s|string>!\nTEST\n\(.*\[\)|[\1|
- If the left end delimiter is found preceding a [ at that time, then it must be that both ends of the first occurrence of text have been found. The left end delimiter in that match is replaced with a single [ and the text after it, up to and including the marker [, is kept.
y|\n[|[\n|
- And so in the last translation, if there are any [ on a line at all they will become newlines and all newlines will become [.
At this point everything up to the first occurring newline (or the entire line if there are none at all) is Deleted. If anything remains it is sent to the top of the script. If the previous iteration resulted in two \newlines in pattern space - both ends of your delimited text - then it is Printed up to the first occurring \newline. Else the pattern space already tested is cleared and the cycle continues.
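The steps above can be followed by running progressively longer prefixes of the script on the example line. This is only a sketch: the names input, s1 through s5, and trace are hypothetical, and an extra y command is appended purely to display each embedded newline as @ (which assumes the input holds no @ of its own). The final D, and the P on the following pass, are left out so that the pattern space can be seen before it is trimmed:

```shell
#!/bin/sh
# Example line from the answer.
input='string>![TEST[][]Extract[ ]this[ ]string[][]>/string>'

# The five editing commands of the script, one per variable.
s1='y|]|\n|'
s2='s|\n>/string>|]|'
s3='y|[]\n|\n[]|'
s4='s|string>!\nTEST\n\(.*\[\)|[\1|'
s5='y|\n[|[\n|'

# Run the given sed commands, then show embedded newlines as '@'.
trace() {
    printf '%s\n' "$input" | sed "$@" -e 'y|\n|@|'
}

trace -e "$s1"
# string>![TEST[@[@Extract[ @this[ @string[@[@>/string>
trace -e "$s1" -e "$s2"
# string>![TEST[@[@Extract[ @this[ @string[@[]
trace -e "$s1" -e "$s2" -e "$s3"
# string>!@TEST@]@]Extract@ ]this@ ]string@]@[
trace -e "$s1" -e "$s2" -e "$s3" -e "$s4"
# []@]Extract@ ]this@ ]string@]@[
trace -e "$s1" -e "$s2" -e "$s3" -e "$s4" -e "$s5"
# @][]Extract[ ]this[ ]string[][@
```

In the last state the wanted text sits between two newlines: D removes the leading one, and on the next pass P prints up to the trailing one.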
And so the above example prints:
][]Extract[ ]this[ ]string[][
...and it will print, each on a separate line, as many similarly delimited strings as can be fully left and right delimited per line, or nothing at all.
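As a quick check of that claim, here is a sketch with two delimited strings on one line followed by a line with none. The wrapper name extract and the sample input are assumptions made for the demonstration; the sed script itself is the one from the top of this answer:

```shell
#!/bin/sh
# Hypothetical wrapper: the sed script from the top of this answer, unchanged.
extract() {
    sed '/\n/P;//D;y|]|\n|
s|\n>/string>|]|
y|[]\n|\n[]|
s|string>!\nTEST\n\(.*\[\)|[\1|
y|\n[|[\n|;D'
}

# First line: two fully delimited strings. Second line: no delimiters.
printf '%s\n' \
    'string>![TEST[one]>/string> junk string>![TEST[two]>/string>' \
    'no delimiters here' | extract
# one
# two
```

Each match is handled on its own pass: after the first string is printed, the remainder of the line is sent back to the top of the script and searched again.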