I did something... complicated. I've been looking into ex/ed recently - I'm not very good with either - and this marked an opportunity to dive a little deeper. This parses the ed script first and passes it off to ed in-stream:
b='[:blank:]'
sed -e 'h;/\n/!i\' -e 0i -e 's/^\(.*[^\]\)*\(\\\\\)*\\$//;tn'"
/^\n*\([0-9;$,.$b]*[gGvV].*\\\\\n[$b]*\)*\([0-9,$.;${b}]*[aic][$b]*\)\
\(\n\(.*\)\n\.\)*\(\n.*\)*$/{ s//\4/;:n" -e 'G;//{N;D
};g;s//\1\2/;l;x;s//\4/;l;H;s/.*/./;a\' -e '.
};l;g;i\' -e .\\ -e 1,.p\\ -e u <ed_script | ed
It is less complicated than before - and now virtually all of the complication lies in a single regex spanning two lines. That one long regex handles nearly all of the testing for the entire script.
The idea is that, as near as I can tell, you can only get to insert mode with one of the a (append), i (insert), or c (change) commands. Insert mode then takes all input literally up to the next occurring line consisting of only a . (dot). Any other continued command that spans multiple lines - even a sequence of such where G, g, V, or v are involved - is necessarily continued to the next line with a trailing \ (backslash) - though, as usual, a \ (backslash) escapes itself in that context.
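The continuation rule above can be sketched as a small test, assuming only that an odd number of trailing backslashes means the newline is escaped. The function name continues is hypothetical and not part of the script above, and the check deliberately ignores insert mode, where text is taken literally:

```shell
# Hypothetical helper, not part of the original script: succeeds when an ed
# command line ends in an odd number of backslashes, i.e. when the newline
# itself is escaped and the command continues on the next line.
continues() {
    printf '%s\n' "$1" | grep -Eq '(^|[^\\])(\\\\)*\\$'
}

# Classify a few sample lines (assumed examples, not from the question).
for line in 'g/re/s/a/b/\' 's/a/b\\' ',n'; do
    if continues "$line"; then
        printf '%s  -> continues\n' "$line"
    else
        printf '%s  -> complete\n' "$line"
    fi
done
```

The pair-matching group is what lets an escaped backslash at the end of a line count as ordinary text rather than as a continuation.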
So, while it's entirely possible I'm mistaken, I think this handles all cases. For every input line that doesn't match an [aic] ... . (dot) series, sed inserts a series of commands that looks like:
0i
command-line$
.
1,.p
u
...instructing ed to insert (i) an unequivocal look (l, as written by sed) at its own command, then to print (p) it, and last to undo (u) the whole operation - which has the very convenient result of getting the edit done, printing it, reversing it, and restoring the last address in a single action.
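For the simple single-line case, the wrapper can be sketched without the big regex. This is only my reading of what the script emits for such a line, and wrap_cmd is a hypothetical name rather than anything in the script itself:

```shell
# Hypothetical helper: emit the preview wrapper for one single-line ed
# command, followed by the command itself so that ed still executes it.
wrap_cmd() {
    printf '0i\n'                    # start inserting before line 1
    printf '%s\n' "$1" | sed -n l    # unambiguous rendering, ends in "$"
    printf '.\n1,.p\nu\n'            # end the insert, print it, undo it
    printf '%s\n' "$1"               # the real command
}

wrap_cmd ',s,o,O,g'
```

Because l always appends a $ to the line, the inserted text can never be a bare dot, so it cannot terminate the insert early.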
For those lines that do match in a sequence of either trailing backslashes or an [aic] ... . series it is a little more complicated. In those cases sed recursively pulls them in until it encounters the end of the series before doing its look (l). I was careful to separate the [aic], ., and actual literal input into separate prints - each of those types will get its own look - such that the literal input is strung together as much as possible (sed will break a look output at 80 chars by default).
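To see what a look does to a group of lines once they share the pattern space, here is a minimal sketch. It uses the hold space to collect everything rather than the N/D loop in the script above, and look_joined is a hypothetical name:

```shell
# Hypothetical helper: gather all of stdin into the pattern space, then
# print it once with l so embedded newlines show up as \n.
look_joined() {
    sed -n -e 'H;$!d' -e 'x;s/^\n//;l'
}

# Two literal input lines, the second ending in a backslash.
printf '%s\n' 'hello' 'world\' | look_joined
```

This prints the single line hello\nworld\\$, which is the same style as the literal-input line in the output further down.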
I guess it's easier just to show you. You'll notice the ? prompt below - this occurs because the g command given before it is not a valid command - not because sed mangles the input (I hope). Here is the output from a modified version of your example dataset:
g \\\n a$
hello\nworld\\\n\n 0a\n world\\\nworld\nworld$
.$
?
,n$
1 hello
2 world\
3
4 0a
5 world\
6 world
7 world
,s,o,O,g$
4$
0a
.,$n$
4 0a
5 wOrld\
6 wOrld
7 wOrld
,s,$,\\\n\\\n\\\\$
\
,n$
1 hellO
2
3 \
4 wOrld\
5
6 \
7
8
9 \
10 0a
11
12 \
13 wOrld\
14
15 \
16 wOrld
17
18 \
19 wOrld
20
21 \
Q$
ed reads from STDIN by default and provides the output that you want. Just write your input text directly into ed. – Alexej Magura Dec 29 '14 at 21:08