Given a 425 MB text file with the following content:
--START--
Data=asdfasdf
Device=B
Lorem=Ipsum
--END--
--START--
Data=asdfasdf
Lorem=Ipsum
Device=A
--END--
--START--
Device=B
Data=asdfasdf
--END--
...
The sed task is to print everything between --START-- and --END-- for each block that contains Device=A. There are two solutions provided here and here. There is a huge difference in execution time between the two commands. The second command is much faster, but I need a more detailed description of how it works.
$ sed -n '/--START--/{:a;N;/--END--/!ba; /Device=A/p}' file
$ sed 'H;/--START--/h;/--END--/!d;x;/Device=A/!d' file
The description of the first command:
How it works:
/--START--/{...}
Every time we reach a line that contains --START--, run the commands inside the braces {...}.
:a;
Define a label "a".
N;
Read the next line and add it to the pattern space.
/--END--/!ba
Unless the pattern space now contains --END--, jump back to label a.
/Device=A/p
If we get here, that means that the pattern space starts with --START-- and ends with --END--. If, in addition, the pattern space contains Device=A, then print (p) it.
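As a quick check of the walk-through above, the following sketch runs the first command, written with one sed command per line, against a small sample file built from the records in the question. It assumes GNU sed; the temporary file and the variable names (sample, out1) are illustrative only.

```shell
# Hypothetical sample file, built from the three records shown in the question.
sample=$(mktemp)
printf '%s\n' \
  '--START--' 'Data=asdfasdf' 'Device=B' 'Lorem=Ipsum' '--END--' \
  '--START--' 'Data=asdfasdf' 'Lorem=Ipsum' 'Device=A' '--END--' \
  '--START--' 'Device=B' 'Data=asdfasdf' '--END--' > "$sample"

# Same script as the one-liner, split over several lines (GNU sed assumed).
out1=$(sed -n '/--START--/{
  # label "a": start of the accumulation loop
  :a
  # append the next input line to the pattern space
  N
  # no --END-- in the pattern space yet: loop again
  /--END--/!ba
  # a whole record is collected: print it only if it contains Device=A
  /Device=A/p
}' "$sample")

printf '%s\n' "$out1"
rm -f "$sample"
```

On this sample only the second record is printed, because it is the only one that contains Device=A.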
Description of 2nd command:
sed 'H              # append the current line to the hold space
     /--START--/h   # on a START line, overwrite the hold space with just that line
     /--END--/!d    # not an END line: delete the pattern space and start the next cycle
     x              # END line: swap the hold space (the whole block) into the pattern space
     /Device=A/!d   # delete the block unless it contains "Device=A"; otherwise it is auto-printed
    ' file
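To check on a small input that both commands select the same blocks, a sketch like the following can be used. It assumes GNU sed and a hypothetical sample file built from the records in the question; the variable names (sample, first, second) are illustrative only. It says nothing about the speed difference, only about the output.

```shell
# Hypothetical sample file, built from the three records shown in the question.
sample=$(mktemp)
printf '%s\n' \
  '--START--' 'Data=asdfasdf' 'Device=B' 'Lorem=Ipsum' '--END--' \
  '--START--' 'Data=asdfasdf' 'Lorem=Ipsum' 'Device=A' '--END--' \
  '--START--' 'Device=B' 'Data=asdfasdf' '--END--' > "$sample"

# First command: collects each record in the pattern space with N.
first=$(sed -n '/--START--/{:a;N;/--END--/!ba; /Device=A/p}' "$sample")

# Second command: collects each record in the hold space with H.
second=$(sed 'H;/--START--/h;/--END--/!d;x;/Device=A/!d' "$sample")

if [ "$first" = "$second" ]; then
  echo "both commands print the same block:"
  printf '%s\n' "$second"
else
  echo "outputs differ" >&2
fi
rm -f "$sample"
```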
Device=A always next line after --START--? – Cyrus Jun 06 '23 at 20:38
sed? Have you tried awk or perl? – Cyrus Jun 06 '23 at 20:39
d delete command stops execution of the rest of the script, and we read a new line and go back to the start of the script. This means there is a tight loop over the first 3 commands until we match the END line. This loop accumulates the data into the hold space. The match on START clears the hold space and puts just the start line in there. – meuh Jun 07 '23 at 06:20
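The behaviour described in the last comment can be made visible with a small experiment. The sketch below, assuming GNU sed, replaces the final Device=A filter with a substitution that joins each collected block onto one line, so every block that reaches x is shown. The sample file, the extra stray=1 line between records, and the variable names are hypothetical and only serve the illustration.

```shell
# Hypothetical sample: the records from the question plus one stray line
# between the first --END-- and the second --START--.
sample=$(mktemp)
printf '%s\n' \
  '--START--' 'Data=asdfasdf' 'Device=B' 'Lorem=Ipsum' '--END--' \
  'stray=1' \
  '--START--' 'Data=asdfasdf' 'Lorem=Ipsum' 'Device=A' '--END--' \
  '--START--' 'Device=B' 'Data=asdfasdf' '--END--' > "$sample"

# H, h and !d are unchanged; after x the embedded newlines are replaced
# with " | " so that each collected block is auto-printed as one line.
joined=$(sed 'H;/--START--/h;/--END--/!d;x;s/\n/ | /g' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

Each output line is one block as it sat in the hold space when its --END-- line arrived. The stray line is appended by H but never shows up, because the h on the next --START-- overwrites the hold space.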