Given the vi tag on this question, and since automated file editing with POSIX-compliant ex commands gets short shrift on this site compared to the plethora of advice on sed, awk, grep and even Perl, here is a POSIX-compliant ex command that will perform the desired filtering:
ex -sc 'g/.*\(on line\)/s//\1/ | .w!>>output
q!' input
Note the embedded newline in the command. It is necessary for full POSIX portability, as there is no other definite way to end the global command. However, most implementations allow multiple -c options, in which case the following one-liner works just the same:
ex -sc 'g/.*\(on line\)/s//\1/ | .w!>>output' -c 'q!' input
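To see the one-liner in action, here is a minimal sketch run against made-up data. The three sample lines and the scratch directory created with mktemp -d are assumptions for illustration; the real input from the question is not reproduced here.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: two lines contain "on line", one does not.
printf '%s\n' 'Warning: foo on line 12' \
              'all good here' \
              'Error: bar on line 7' > input

# The multiple -c form (accepted by most implementations, not guaranteed by POSIX).
ex -sc 'g/.*\(on line\)/s//\1/ | .w!>>output' -c 'q!' input

cat output    # expected: "on line 12" and "on line 7", one per line
cat input     # still the original three lines, because q! discards the edits
```

Because the command appends, running it a second time against the same output file would add the same two lines again; remove the output file first if a fresh result is wanted.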
There is a bit of regex magic and a lot of ex-command magic contained in this command, and since ex doesn't seem to be very widely known, I'll explain each part:
-s starts ex in silent mode, "in preparation for batch processing", so no prompts or informational messages are written to your terminal.
-c means "Run the following command when the file is opened." (input is the name of the file to open.)
The ex command itself is really two commands:
g/.*\(on line\)/s//\1/ | .w!>>output
q!
g is the "global" command and means, "Run the following commands (the rest of the line) on all lines of the file matching the specified regex."
The regex given is .*\(on line\), which means 'Any characters any number of times, including 0, followed by "on line"'. The escaped parentheses are used to capture "on line" for backreferencing later.
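Because .* is greedy, a line that contains "on line" more than once is cut back to its last occurrence. A small sketch of that behavior, using a made-up sample line and hypothetical file names (greedy_in, greedy_out):

```shell
# Scratch directory and sample line are assumptions for illustration only.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'first on line 3 then on line 9' > greedy_in

# Apply the same regex to every line, write the result elsewhere, discard the edit.
ex -sc '%s/.*\(on line\)/\1/ | w! greedy_out | q!' greedy_in

cat greedy_out    # expected: on line 9
```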
In fact, the g command itself could just as well be g/on line/ and it would select the same lines. However, the substitute command I wrote uses nothing for its regex (s//), which means "reuse the last used regex", so the g command has to carry the full regex, capture group included. The s command then uses \1 for the replacement text, meaning "on line" in this case.
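A minimal sketch comparing the two spellings: the short form, where s// reuses the regex given to g, and a spelled-out form, where g only selects lines and s carries its own regex. The sample data and the output file names (reused, explicit) are made up for illustration.

```shell
# Scratch directory and sample data are assumptions for illustration only.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Warning: foo on line 12' \
              'all good here' \
              'Error: bar on line 7' > input

# Short form: the empty regex in s// reuses the regex from g.
ex -sc 'g/.*\(on line\)/s//\1/ | .w!>>reused' -c 'q!' input

# Spelled-out form: g selects the lines, s names its regex explicitly.
ex -sc 'g/on line/s/.*\(on line\)/\1/ | .w!>>explicit' -c 'q!' input

# Both output files should hold the same two lines.
cmp reused explicit && echo 'same result'
```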
The pipe symbol | in an ex command doesn't mean a pipe as it does in the shell. Instead, it is usually used to delimit separate ex commands, each to be run sequentially but independently. However, the global command is an exception: in a global command, the vertical bar separates commands that all belong to the global command. That is, such commands are run only on the lines matching the regex specified in the global command.
The command following the vertical bar is in this case a write command. It's preceded by a dot (.) specifying "current line"; without this address, the write command would write the entire file, regardless of which line is current. (Since we're using the write command within a global command, if we were to omit the dot, the write command would write the entire file after each matching line had the substitution performed on it!)
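The effect of the dot can be checked with a small sketch. The sample data and the file names with_dot and without_dot are made up; the substitution is left out so that only the write behavior is visible.

```shell
# Scratch directory and sample data are assumptions for illustration only.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Warning: foo on line 12' \
              'all good here' \
              'Error: bar on line 7' > input

# With the dot: each matching line is appended on its own (2 lines in total).
ex -sc 'g/on line/.w!>>with_dot' -c 'q!' input

# Without the dot: the whole 3-line buffer is appended once per matching
# line, so the result should hold 2 x 3 = 6 lines.
ex -sc 'g/on line/w!>>without_dot' -c 'q!' input

wc -l with_dot without_dot
```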
The >> means, "If the file already exists, append to it rather than overwriting it or giving an error." Since we're writing to the file multiple times, this is necessary; otherwise we would end up with only the last line written to the output file. The ! preceding the >> means "If the file doesn't already exist, create the file and write to it rather than throwing an error." (Without the ! it's unspecified in POSIX whether this would happen or not.) And of course output is the name of the file to write to.
Finally, of course, q! means "quit without saving changes to the current file." We've made substitutions on many lines of the input file, but we don't want to save those changes, so we use q!.
There are some other approaches which are equivalent, for example the following:
ex -sc '%s/.*\(on line\)/\1/e | v//d
w output | q!' input
But this uses the e flag to the substitute command, which is not in POSIX. (If this flag is omitted, the batch processing will stop whenever the regex .*\(on line\) isn't found anywhere in the file.)
Of course, where ex really shines is in in-place file editing. But it can certainly be used to filter a file out to another file, as illustrated above.
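For comparison, a minimal sketch of an in-place variant: x writes the buffer back to the file if it was modified and then exits, so the file itself is changed. Unlike the filter above, lines without "on line" are kept as they are. The sample data is again made up, and this is only one possible way to do it.

```shell
# Scratch directory and sample data are assumptions for illustration only.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Warning: foo on line 12' \
              'all good here' \
              'Error: bar on line 7' > input

# %s edits every line that matches; x saves the changes and exits.
ex -sc '%s/.*\(on line\)/\1/ | x' input

cat input    # expected: "on line 12", "all good here", "on line 7"
```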