You can use awk's gsub() function to replace all occurrences of '" ' and ' "' (quote followed by a space, and space followed by a quote) with some arbitrary separator, then set FS to that separator and extract what you want. Note that if you change FS, the numbering of the fields will also change, and you will need to reset FS to its original value so that the next input line is split correctly. In your case, you also want to extract some data (the date and time) from fields before FS is changed.
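The re-split mechanics can be seen in isolation: in awk, modifying $0 (which gsub() on $0 does) re-splits the record using the current value of FS, so setting FS before the gsub makes the new separator take effect. A minimal sketch with made-up input:

```shell
echo 'alpha "two words" omega' | awk '{
  FS = "XXX"                  # takes effect the next time $0 is modified
  gsub(/" | "/, "XXX", $0)    # quote-space / space-quote -> separator; re-splits $0
  print $2                    # the quoted phrase is now a single field
}'
```

This prints `two words` as one field, where the default FS would have split it in two.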
e.g. if ./file contains 5 lines, each an exact copy of the sample line you provided:
$ grep -i 'logged in' ./file | tail | awk '{
    d = $1;
    t = $2; sub(/\..*/, "", t);
    FS = "XXX";
    gsub(/" | "/, "XXX", $0);
    print $2, "logged in at", t, d;
    FS = "[[:space:]]+"
  }'
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
I used XXX as the field separator because it doesn't appear anywhere in the input. A tab character would have worked just as well for this example, but that wouldn't have demonstrated that field-separators don't have to be a single character - which will be important if you can't (or can't easily) determine a single character which isn't used anywhere in the input.
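The multi-character behaviour is standard awk: a one-character FS (other than a single space) is taken literally, while a longer FS is treated as an extended regular expression. A quick illustration with an invented separator:

```shell
# A single-character FS is literal; a longer FS is an extended regular expression,
# so a multi-character token like <SEP> works as a field separator.
printf 'one<SEP>two<SEP>three\n' | awk -F'<SEP>' '{ print $2 }'
```

This prints `two`.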
It gets more complicated if you need to extract field data from after the double-quoted fields (e.g. the IP address or udp port fields) - you can't extract them before the gsub because you can't be sure what their field number is going to be. I'd be inclined to use perl at this point (or maybe even sed, as in @Wildcard's answer), but one way to do it with awk is to expand the gsub function call's regular expression to suit. e.g. replacing the awk script with this:
$ grep -i 'logged in' ./file | tail | awk '{
    d = $1;
    t = $2;
    sub(/\..*/, "", t);
    FS = "XXX";
    gsub(/" | "|address: |, /, "XXX", $0);
    sub(/ .*/, "", $8);   # get rid of trailing junk after udp port
    print $2, "logged in at", t, d, "as", $4, "from", $6 ":" $8;
    FS = "[[:space:]]+"
  }'
would produce output like this:
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
For completeness, here's one way to do it in perl using the perl core module Text::ParseWords:
#!/usr/bin/perl
use strict;
use Text::ParseWords;

my $keep = 1;   # keep " chars in output. set to 0 to strip them.

while (<>) {
    my @F = quotewords('\s+', $keep, $_);
    $F[1]  =~ s/\..*//;   # strip decimal fraction from time field
    $F[10] =~ s/,//;      # strip trailing comma from IP address field
    # remember: perl array indices start at zero, not one.
    printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13];
}
This uses the quotewords() function from Text::ParseWords to split each input line into fields (stored in an array called @F), does some minor cleanup on some of the fields, and then prints the required fields with printf.
As a one-liner, it would be written as:
grep -i 'logged in' ./file | tail | perl -MText::ParseWords -n -e '
@F = quotewords(q/\s+/, 1, $_);
$F[1] =~ s/\..*//;
$F[10] =~ s/,//;
printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13]'
Note how I changed '\s+' to q/\s+/ - perl has some great quoting operators which can be used to avoid the single-quote-inside-single-quotes problem.