I worked out the sed answer not long after I posted this question; no one else has used sed so far, so here it is:
sed '$!N;/^\(.*\)\n\1$/d;P;D'
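To see what the one-liner does, here is a quick run on a small made-up input. N appends the next line to the pattern space. If the two lines are identical, d drops both. Otherwise P prints the first line and D discards it, keeping the second line for the next comparison. Lines that are not adjacent are never compared.

```shell
# Made-up sample: "b" appears twice in a row, "d" three times in a row.
# The pair of b's is removed entirely; of the three d's, one pair is
# removed and the leftover "d" is printed.
printf 'a\nb\nb\nc\nd\nd\nd\n' | sed '$!N;/^\(.*\)\n\1$/d;P;D'
# prints a, c and d, one per line
```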
Playing around a little with the more general problem (what about deleting lines that repeat in sets of three? Or four, or five?) led to the following extensible solution:
sed -e ':top' -e '$!{/\n/!{N;b top' -e '};};/^\(.*\)\n\1$/d;P;D' temp
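The :top loop is easier to follow when written out over several lines. The following is a sketch of the same script with comments, wrapped in a shell function. The name drop_dup_pairs is made up for illustration, and the sketch assumes a sed that accepts # comment lines, as POSIX sed does. LC_ALL=C is set for the locale reason given further down.

```shell
# drop_dup_pairs [FILE...] is a hypothetical helper; it reads stdin when
# no file is given. The sed script is the extensible one-liner, unrolled.
drop_dup_pairs() {
  LC_ALL=C sed '
# Gather lines until the pattern space holds two of them (one embedded
# newline), unless the last input line has already been read.
:top
$!{
  /\n/!{
    N
    b top
  }
}
# Two identical lines: delete both and start a fresh cycle.
/^\(.*\)\n\1$/d
# Otherwise print the first line, delete it, and restart the script with
# the remaining line; D reads no new input while a line is left over.
P
D
' "$@"
}

printf 'a\nb\nb\nc\n' | drop_dup_pairs
# prints a and c, one per line
```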
Extended to remove triples of lines:
sed -e ':top' -e '$!{/\n.*\n/!{N;b top' -e '};};/^\(.*\)\n\1\n\1$/d;P;D' temp
Or to remove quads of lines:
sed -e ':top' -e '$!{/\n.*\n.*\n/!{N;b top' -e '};};/^\(.*\)\n\1\n\1\n\1$/d;P;D' temp
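The pattern above generalizes mechanically. For a group of N lines, the loop regex needs N-1 newlines (`\n`, `\n.*\n`, `\n.*\n.*\n`, ...), and the delete regex needs N-1 copies of `\n\1`. The following is a sketch of a shell function that builds the commands for any N of 2 or more. The name drop_dup_runs and its argument order are hypothetical. The function also sets LC_ALL=C, for the locale reason given below.

```shell
# drop_dup_runs N [FILE...]: hypothetical helper that deletes groups of N
# identical consecutive lines, using the same sed commands as above.
# The body runs in a subshell ( ... ) so its variables do not leak.
drop_dup_runs() (
  n=$1; shift
  gather='\n'    # regex that matches once N lines are in the pattern space
  match='\n\1'   # tail of the "all N lines are identical" regex
  i=2
  while [ "$i" -lt "$n" ]; do
    gather="$gather.*\\n"   # add one more ".*\n"
    match="$match\\n\\1"    # add one more "\n\1"
    i=$((i + 1))
  done
  LC_ALL=C sed -e ':top' -e "\$!{/$gather/!{N;b top" \
    -e "};};/^\\(.*\\)$match\$/d;P;D" "$@"
)

printf 'x\nx\nx\ny\nx\nx\ny\n' | drop_dup_runs 3
# prints y, x, x, y (one per line): the triple is removed, the pair is kept
```

For N=2, 3 and 4 the generated commands are exactly the three shown above.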
sed has an additional advantage over most other options: it truly operates on a stream, holding in memory no more than the number of lines being checked for duplicates.
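One way to see the streaming behaviour is to feed the one-liner an endless input. The generator below is made up for this demo. It repeats x, x, y forever. sed never holds more than two lines, so it keeps producing output, and head ends the pipeline after three lines.

```shell
# Endless made-up input: x, x, y, x, x, y, ...
# Each adjacent pair of x's is deleted, so only the y's come through.
yes | awk '{ print (NR % 3 ? "x" : "y") }' \
  | LC_ALL=C sed '$!N;/^\(.*\)\n\1$/d;P;D' \
  | head -n 3
# prints y three times, then the whole pipeline exits
```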
As cuonglm pointed out in the comments, the locale must be set to C. Otherwise, in a multi-byte locale such as UTF-8, lines containing bytes that are invalid in that locale may not be removed. So the commands above become:
LC_ALL=C sed '$!N;/^\(.*\)\n\1$/d;P;D' temp
LC_ALL=C sed -e ':top' -e '$!{/\n/!{N;b top' -e '};};/^\(.*\)\n\1$/d;P;D' temp
LC_ALL=C sed -e ':top' -e '$!{/\n.*\n/!{N;b top' -e '};};/^\(.*\)\n\1\n\1$/d;P;D' temp
# Etc.
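The locale problem can be reproduced with two identical lines ending in the byte 0xE9 (octal 351), which is "é" in Latin-1 but not valid UTF-8. In the C locale, `.` matches any byte, so both lines are deleted. In a UTF-8 locale, `.` may not match the invalid byte, so the regex can fail and both lines survive. The locale name C.UTF-8 is an assumption; it is common on Linux but not installed everywhere.

```shell
# Two identical lines that are not valid UTF-8.
printf 'caf\351\ncaf\351\n' | LC_ALL=C sed '$!N;/^\(.*\)\n\1$/d;P;D'
# prints nothing: the pair is removed

# Same input in a UTF-8 locale (assumed to exist as C.UTF-8): the
# backreference match can fail here, leaving both lines in the output.
printf 'caf\351\ncaf\351\n' | LC_ALL=C.UTF-8 sed '$!N;/^\(.*\)\n\1$/d;P;D'
```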
… C, otherwise in a multi-byte locale, invalid characters in that locale cause the command to fail. – cuonglm Apr 19 '16 at 04:54