
I know there are "sort" and "uniq" out there; however, today's question is about how to use AWK to do that kind of job. Say I have a list of anything (IPs, names, or numbers) and I want to sort it.

Here is an example in which I extract the IP addresses from a mail log:

awk 'match($0,/\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\]/) { if ( NF == 8 && $6 == "connect" ) {print substr($0, RSTART+1,RLENGTH-2)} }' maillog

Is it possible to sort the IPs "on the go" within the same awk command? I do not need a complete answer to my question, just some hints on where to start.

Cheers!

Peter

2 Answers


To sort, you can also use a pipe inside the awk command, as in:

awk '{ print ... | "sort ..." }'

With this syntax, every line printed to that command string is passed to the same instance of sort, which emits its sorted output once the pipe is closed (at the latest when awk exits).

Of course, you can do the equivalent at the shell level:
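As a rough sketch of the pipe-inside-awk form applied to the question's filter: the log lines below are made up to fit the `NF == 8 && $6 == "connect"` test, `[0-9]` stands in for `[[:digit:]]` so that older awk builds accept the pattern, and the `sort` key options are one assumed way to order dotted quads numerically.

```shell
# Hypothetical sample log: 8 fields per line, field 6 is the event name.
log=$(mktemp)
printf '%s\n' \
  'Mar 30 08:00:01 mx smtpd: connect from host[192.168.1.20]' \
  'Mar 30 08:00:02 mx smtpd: connect from host[10.0.0.12]' \
  'Mar 30 08:00:03 mx smtpd: disconnect from host[10.0.0.99]' \
  'Mar 30 08:00:04 mx smtpd: connect from host[9.1.1.1]' \
  'Mar 30 08:00:05 mx smtpd: connect from host[192.168.1.3]' > "$log"

# Every print goes to the same sort process; the octets are compared
# numerically, one key per dot-separated field.
sorted_ips=$(awk 'match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\]/) {
    if (NF == 8 && $6 == "connect")
        print substr($0, RSTART + 1, RLENGTH - 2) | "sort -t . -k1,1n -k2,2n -k3,3n -k4,4n"
}' "$log")

printf '%s\n' "$sorted_ips"
rm -f "$log"
```

The "disconnect" line is filtered out by awk before anything reaches sort, so sort only ever sees the extracted addresses.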

awk '{ print ... }' | sort ...

Or you can use GNU awk, which has a couple of sort functions built in (asort and asorti).

In awk, uniq is typically accomplished by saving the unique data element or key in an associative array and checking whether new data needs to be remembered. One example to illustrate:
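A minimal sketch of the GNU awk route, assuming `gawk` is installed (the guard and the sample words are only there for illustration): `asort()` sorts an array by its values and renumbers the indices from 1.

```shell
if command -v gawk >/dev/null 2>&1; then
    sorted=$(printf '%s\n' pear apple fig | gawk '
        { items[NR] = $0 }                    # collect every line
        END {
            n = asort(items)                  # sort by value, indices become 1..n
            for (i = 1; i <= n; i++) print items[i]
        }')
else
    sorted="gawk not installed"               # asort() is a gawk extension
fi
printf '%s\n' "$sorted"
```

`asorti()` works the same way but sorts the array's indices rather than its values, which is handy when the data were stored as keys.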

awk '!a[$0]++'

This means: if the current line is not yet in the array, a[$0]++ yields 0, so the negated condition is true and the default action (printing the line) is triggered. Subsequent lines with the same data make the condition false and are not printed.
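To make the behaviour concrete, here is a small sketch on made-up input. It also shows two variations that are assumptions about what one might want rather than part of the answer: comparing against plain `uniq` on unsorted data, and keying the array on a combination of fields instead of the whole line.

```shell
input=$(printf '%s\n' b a b c a)

# Keeps the first occurrence of every line, in input order, sorted or not.
first_seen=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# uniq only collapses adjacent duplicates, so this unsorted input is unchanged.
adjacent_only=$(printf '%s\n' "$input" | uniq)

# Deduplicate on fields 1 and 2 only; the rest of the line is ignored.
by_columns=$(printf '%s\n' 'x 1 foo' 'x 1 bar' 'y 1 baz' | awk '!seen[$1, $2]++')

printf '%s\n' "$first_seen" "--" "$adjacent_only" "--" "$by_columns"
```

Because the array lives for the whole run, memory use grows with the number of distinct keys, which is the price for not having to sort first.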

Janis
  • Strictly speaking, !a[$0]++ is not equivalent to uniq, because uniq requires sorted input. You need awk 'l != $0 { print; l = $0 }'. – cuonglm Mar 30 '15 at 08:53
  • No, of course it's not equivalent. uniq has a couple of behaviours that you can choose between with options. But the OP asked for something "to do that kind of a job"; i.e. how to accomplish that in principle with awk. The method I presented is in this respect more powerful, since its logic operates over all data in the file. On Unix this is typically reflected by some-process | sort | uniq, or (with different semantics) by some-process | sort -u. Again, this is also not "equivalent", but you rarely want to imitate a Unix command; rather, you want to solve a concrete task. – Janis Mar 30 '15 at 08:59
  • @Janis, is there a way to set the field separator used by sort? I think I'm running into a quoting issue using this code: awk 'NR=1; {print $0 | "sort -grk6 -t $'\t'"}' <filename>, which gives the error sort: option requires an argument -- 't'. Thanks. – Josh Apr 07 '20 at 14:53
  • so the answer is no? note: for me sort -u doesn't work on the latest version of git for windows. awk ... | sort -u error -uThe system cannot find the file specified. If the answer is yes, then I'll be honest, I don't think that this answer explains it very well at all. – xenoterracide Mar 16 '21 at 17:33
  • also, I actually need to unique on only the combination of some columns, and then I'm printing multiple lines some of which aren't unique. – xenoterracide Mar 16 '21 at 19:24

It works for me if you put the sort command in double quotation marks:

print substr(a[1],1,5) | "sort -u" # unique values
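A self-contained sketch of the same idea, using made-up input and printing `$0` instead of the `substr(a[1],1,5)` expression above (the array `a` is not defined in this snippet). The `close()` call is an addition: it makes awk wait for sort to finish before printing anything after the sorted block.

```shell
result=$(printf '%s\n' delta alpha delta beta alpha | awk '
    { print $0 | "sort -u" }     # the command is an awk string, so it is double-quoted
    END {
        close("sort -u")         # same string as above; sort writes its output here
        print "-- done --"       # printed only after the sorted, unique lines
    }')
printf '%s\n' "$result"
```

Without the `close()`, sort typically produces its output only when awk exits, so the trailer line would appear before the sorted lines.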

Victor