You can use awk's gsub() function to replace all occurrences of '" ' and ' "' (quote followed by a space, and space followed by a quote) with some arbitrary separator, then set FS to that separator and extract what you want. Note that if you change FS, the numbering of the fields will also change, and you will need to reset FS to its original value so that the next input line is split correctly. In your case, you also want to extract some data (the date and time) from fields before FS is changed.
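The re-split mechanics can be seen in isolation: in awk, modifying $0 (which gsub() on $0 does) re-splits the record using the current value of FS, so setting FS before the gsub makes the new separator take effect. A minimal sketch with made-up input:

```shell
echo 'alpha "two words" omega' | awk '{
  FS = "XXX"                  # takes effect the next time $0 is modified
  gsub(/" | "/, "XXX", $0)    # quote-space / space-quote -> separator; re-splits $0
  print $2                    # the quoted phrase is now a single field
}'
```

This prints `two words` as one field, where the default FS would have split it in two.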
e.g. if ./file contains 5 lines, each an exact copy of the sample line you provided:
$ grep -i 'logged in' ./file | tail | awk '{
    d = $1;
    t = $2; sub(/\..*/, "", t);
    FS = "XXX";
    gsub(/" | "/, "XXX", $0);
    print $2, "logged in at", t, d;
    FS = "[[:space:]]+"
  }'
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
sarah the princes logged in at 21:54:01 2017-12-21
I used XXX as the field separator because it doesn't appear anywhere in the input. A tab character would have worked just as well for this example, but that wouldn't have demonstrated that field-separators don't have to be a single character - which will be important if you can't (or can't easily) determine a single character which isn't used anywhere in the input.
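The multi-character behaviour is standard awk: a one-character FS (other than a single space) is taken literally, while a longer FS is treated as an extended regular expression. A quick illustration with an invented separator:

```shell
# A single-character FS is literal; a longer FS is an extended regular expression,
# so a multi-character token like <SEP> works as a field separator.
printf 'one<SEP>two<SEP>three\n' | awk -F'<SEP>' '{ print $2 }'
```

This prints `two`.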
It gets more complicated if you need to extract field data from after the double-quoted fields (e.g. the IP address or udp port fields) - you can't extract them before the gsub because you can't be sure what their field number is going to be. I'd be inclined to use perl at this point (or maybe even sed, as in @Wildcard's answer), but one way to do it with awk is to expand the gsub function call's regular expression to suit. e.g. replacing the awk script with this:
$ grep -i 'logged in' ./file | tail | awk '{
    d = $1;
    t = $2;
    sub(/\..*/, "", t);
    FS = "XXX";
    gsub(/" | "|address: |, /, "XXX", $0);
    sub(/ .*/, "", $8);   # get rid of trailing junk after udp port
    print $2, "logged in at", t, d, "as", $4, "from", $6 ":" $8;
    FS = "[[:space:]]+"
  }'
would produce output like this:
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
sarah the princes logged in at 21:54:01 2017-12-21 as guest from 111111111:udp
For completeness, here's one way to do it in perl using the perl core module Text::ParseWords:
#!/usr/bin/perl
use strict;
use Text::ParseWords;

my $keep = 1;   # keep " chars in output. set to 0 to strip them.

while (<>) {
    my @F = quotewords('\s+', $keep, $_);
    $F[1]  =~ s/\..*//;   # strip decimal fraction from time field
    $F[10] =~ s/,//;      # strip trailing comma from IP address field
    # remember: perl array indices start at zero, not one.
    printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13];
}
This uses the quotewords() function from Text::ParseWords to split each input line into fields (stored in an array called @F), does some minor cleanup on some of the fields, and then prints the required fields with printf.
As a one-liner, it would be written as:
grep -i 'logged in' ./file | tail | perl -MText::ParseWords -n -e '
@F = quotewords(q/\s+/, 1, $_);
$F[1] =~ s/\..*//;
$F[10] =~ s/,//;
printf "%s logged in at %s %s as %s from %s:%s\n", @F[5,1,0,7,10,13]'
Note how I changed '\s+' to q/\s+/ - perl has some great quoting operators which can be used to avoid the single-quote-inside-single-quotes problem.