How about this sed one-liner:
sed -n '${;p;q;};N;/^ *\([^ ][^ ]*  *[^ ][^ ]*\)\( .*\)*\n *\1/{;s/\n.*//;h;G;D;};P;D' inputfile
This was a nice tricky challenge; thanks! :)
At a high level, what this does is iterate through the inputfile comparing two lines at a time. If the lines match up to the first two words, the second line of the two is discarded and the next line from the file is taken to compare to the first line. If the lines don't match, the first is printed and the second retained for comparison with later lines. When the end of the file is reached, the line currently "held for comparison" is printed.
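To make the high-level description concrete, here is a small sketch that runs the one-liner on made-up data. The sample lines and the temporary file are assumptions of mine, not part of the original question. Note that the regex needs two consecutive spaces between the two word patterns, so that "  *" means "one or more spaces".

```shell
# Hypothetical sample data, invented for illustration only.
inputfile=$(mktemp)
printf '%s\n' \
  'foo bar 1' \
  'foo bar 2' \
  'foo baz 3' \
  'qux quux 4' \
  'qux quux 5' > "$inputfile"

# The portable one-liner: keep only the first line of each run of
# consecutive lines that start with the same two words.
result=$(sed -n '${;p;q;};N;/^ *\([^ ][^ ]*  *[^ ][^ ]*\)\( .*\)*\n *\1/{;s/\n.*//;h;G;D;};P;D' "$inputfile")
printf '%s\n' "$result"

rm -f "$inputfile"
```

On this input the lines "foo bar 2" and "qux quux 5" should be dropped, leaving "foo bar 1", "foo baz 3" and "qux quux 4".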
Blow by blow explanation:
-n doN't print lines by default; only if specified to print them.
${;p;q;}; if on the la$t line then Print the line and Quit.
N; append a newline followed by the Next line of the file to the pattern space
/^ *\([^ ][^ ]*  *[^ ][^ ]*\)\( .*\)*\n *\1/ A very tricky regex:
match any leading spaces, followed by a nonspace sequence, one or
more spaces, a nonspace sequence, then optionally a space followed
by anything, then a newline, then any leading spaces, then the matched
two words from earlier again.
{; if that regex matched the pattern space, execute the following.
s/\n.*//; delete the first newline and everything after it
h; copy the pattern space contents to the Hold space
G; append (Get) a newline followed by the hold space contents to the pattern space
D; delete everything in the pattern space up to and including the first newline, then start again from the beginning of this sequence (with the ${ block) without reading a new input line
}; end of block. Skip to here if the tricky regex didn't match.
P; Print everything in the pattern space up to the first newline.
D Delete the pattern space up to and including the first newline, then start again from the top with what is left, without reading a new input line.
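The same commands can also be laid out one per line with comments, which may be easier to follow than the one-liner. This is only a sketch: the wrapper name dedupe_first_two_words is hypothetical, and the comments deliberately avoid apostrophes because the whole script sits inside single quotes.

```shell
# dedupe_first_two_words is a made-up name used only for this sketch.
dedupe_first_two_words() {
  sed -n '
# On the last line: print what is held and quit.
${
  p
  q
}
# Append a newline and the next input line to the pattern space.
N
# Do both lines start with the same two words?
/^ *\([^ ][^ ]*  *[^ ][^ ]*\)\( .*\)*\n *\1/{
  # Drop the second line.
  s/\n.*//
  # Duplicate the kept line so that D has something to delete ...
  h
  G
  # ... and restart the script without reading new input.
  D
}
# No match: print the first line, delete it, restart with the second.
P
D
' "$@"
}

printf '%s\n' 'a b 1' 'a b 2' 'a c 3' | dedupe_first_two_words
```

The h;G;D sequence is what stands in for a branch here: D only restarts the script without reading input when the pattern space contains a newline, so the kept line is duplicated first.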
Note that the above is very portable. Deliberately so. Just for a challenge I wanted it to run without \? or \+ being available (they are GNU extensions, not part of POSIX basic regular expressions), which makes the regex much more finicky.
In addition, the logic flow doesn't include any branches, although branches are POSIX compatible and universally available. Why did I do this? It's because not all implementations of sed allow for labels to be specified in a one-liner. They require a \ and a newline after the label. GNU sed allows labels in an actual one-liner and, for example, BSD sed doesn't.
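For what it's worth, one way around the label restriction is to pass the script as several -e fragments. As far as I know, sed treats each -e fragment as if it ended with a newline, so a label or a branch target can end its own fragment. The sketch below is a branching variant of the portable one-liner under that assumption; the function name dedupe_with_label is hypothetical, and I have not checked it against every BSD sed.

```shell
# dedupe_with_label is a made-up name used only for this sketch.
# ":k" and "bk" each end an -e fragment, so the label name is
# terminated by the implicit newline between fragments.
dedupe_with_label() {
  sed -n \
    -e ':k' \
    -e '${;p;q;}' \
    -e 'N' \
    -e '/^ *\([^ ][^ ]*  *[^ ][^ ]*\)\( .*\)*\n *\1/{;s/\n.*//;bk' \
    -e '}' \
    -e 'P;D' \
    "$@"
}

printf '%s\n' 'a b 1' 'a b 2' 'a c 3' | dedupe_with_label
```

With the branch available, the h;G;D trick is no longer needed: after the second line is deleted, bk jumps straight back to the top.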
The following two one-liners are each equivalent when run with GNU sed; the only difference is that they are more robust because they handle tabs as well as spaces:
sed -n ':k;${;p;q;};N;/^\s*\(\S\+\s\+\S\+\)\(\s.*\)\?\n\s*\1/{;s/\n.*//;bk;};P;D' inputfile
sed -n ':k;${;p;q;};N;s/^\(\s*\(\S\+\s\+\S\+\)\(\s.*\)\?\)\n\s*\2.*$/\1/;tk;P;D' inputfile
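A quick check of the tab handling, assuming GNU sed is the sed on your PATH (the \s, \S, \+ and \? escapes are GNU extensions). The tab-separated sample lines are made up for illustration.

```shell
# Hypothetical tab-separated input; requires GNU sed.
result=$(printf 'foo\tbar\t1\nfoo\tbar\t2\nfoo\tbaz\t3\n' |
  sed -n ':k;${;p;q;};N;/^\s*\(\S\+\s\+\S\+\)\(\s.*\)\?\n\s*\1/{;s/\n.*//;bk;};P;D')
printf '%s\n' "$result"
```

One caveat: the backreference \1 matches the captured text literally, separator included, so a line using "foo<TAB>bar" is not treated as a duplicate of one using "foo bar" with a space.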
I mostly did this for fun. :) I think 1_CR's answer is the best, and of course it's simplest by far.
If your requirements get a little more tricky than they currently are and his approach won't work, the best tool is probably awk. But I haven't learned awk yet and I have learned sed. :)
if [[ -z $(grep "$lookup" output.txt) ]] bit helped me with my problem! – Cartwig Oct 25 '15 at 04:25
if [[ -z $(grep "$lookup" output.txt) ]]. It is very, very inefficient. It's a linear search which takes longer each time a record is added. It works great on small data sets, but upon scaling up you will have a noticeable performance hit. If you need to process the records one by one, you should read the output of @1_CR's solutions. I'm updating my original answer with an example. – RobertL Oct 25 '15 at 15:56
exit 0. Why do you want to hide error status? – RobertL Oct 25 '15 at 17:21
[[ ... ]] because using [ ... ] accomplishes the same results in this case, is also a shell built-in (/bin/sh, dash). [[ is a non-universal shell extension which lowers portability and simplicity, with no advantage in this case. – RobertL Oct 25 '15 at 18:39