45

We know that we can get the second column of the line we want from a file using these two techniques:

awk '/WORD/ { print $2 }' filename

or

grep WORD filename| cut -f 2 -d ' '

My questions are:

  • What are the differences between the two commands above?
  • Which one has the best performance?
  • What are the advantages of using awk over using cut, and vice versa?
  • What options does awk give us over cut and vice versa?
Nidal
  • 8,956

3 Answers

51

The most prominent difference between your two commands is how they split the input into fields. cut takes a single character in -d as the field delimiter (the default being TAB), and every single occurrence of that character starts a new field. awk, however, is more flexible. The separator is in the FS variable and can be an empty string (every input character becomes a separate field, at least in gawk and several other implementations; POSIX leaves this case unspecified), a single character, or a regular expression. The special case of a single space character (the default) means to split on any sequence of whitespace. Also, awk ignores leading whitespace by default.

Please compare:

$ echo "abc def" | cut -f 2 -d ' '
def
$ echo "abc    def" | cut -f 2 -d ' '

$ echo " abc def" | cut -f 2 -d ' '
abc


$ echo "abc def" | awk '{ print $2 }'
def
$ echo "abc    def" | awk '{ print $2 }'
def
$ echo " abc def" | awk '{ print $2 }'
def

Here, awk splits on the sequence of spaces between abc and def whereas cut takes every space as a separator.
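
As a further illustration of the FS variants described above, here is a minimal sketch. The sample strings and separators are made up for this example; -F is simply the command-line way of setting FS.

```shell
#!/bin/sh
# Single-character separator: for this input it behaves like cut -d ':'.
single=$(printf 'root:x:0\n' | awk -F ':' '{ print $3 }')
echo "$single"

# Regular-expression separator: a run of commas and/or semicolons
# counts as ONE delimiter.
regex=$(printf 'a,;b;;c\n' | awk -F '[,;]+' '{ print $2 }')
echo "$regex"

# cut has no regular expressions: every ';' starts a new field,
# so field 2 of 'a;;b' is the empty string between the two semicolons.
cut_out=$(printf 'a;;b\n' | cut -d ';' -f 2)
echo "[$cut_out]"
```

The first command prints 0, the second prints b, and the last prints [] because cut's second field is empty.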

Which one you choose depends on what you want to achieve. If both do the job, I would expect cut to be faster, since it is a smaller, single-purpose tool, whereas awk implements its own programming language.

Dubu
  • 3,723
  • 5
    cut is likely to be faster than Awk alone, but it's not so certain that grep ... | cut will be faster than pure Awk. – Wildcard Nov 15 '16 at 07:25
15

Generally speaking, the more specialized a tool is, the faster it is. So in most cases, you can expect cut and grep to be faster than sed, and sed to be faster than awk. If you're comparing longer pipelines of simpler tools with a single invocation of a more complex tool, there's no rule of thumb. This only matters with large inputs (say, millions of lines); for short inputs, you won't see any difference.
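
Since there is no rule of thumb for pipeline versus single tool, the only way to know is to measure on your own data. The following is a rough sketch, not a rigorous benchmark: the generated file, its size and its contents are assumptions made for illustration, and it assumes your shell or system provides time (bash has it built in).

```shell
#!/bin/sh
# Build a throwaway test file (hypothetical data: 200000 lines,
# every tenth line starts with WORD).
tmp=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 200000; i++) {
        w = (i % 10 ? "other" : "WORD")
        print w, "field" i, "rest"
    }
}' > "$tmp"

# Time each variant; output is discarded so that only processing is measured.
echo "awk:"
time awk '/WORD/ { print $2 }' "$tmp" > /dev/null
echo "grep | cut:"
time sh -c 'grep WORD "$1" | cut -f 2 -d " "' sh "$tmp" > /dev/null

# Sanity check: on this input both variants should produce identical output.
awk_sum=$(awk '/WORD/ { print $2 }' "$tmp" | cksum)
cut_sum=$(grep WORD "$tmp" | cut -f 2 -d ' ' | cksum)
rm -f "$tmp"
```

Run it several times and on a file that resembles your real input; results vary with the implementation of each tool and with the locale.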

The advantage of more complex tools is of course that they can do more things.

Your commands use cat needlessly. Use redirection instead (especially if you're worried about speed, though you probably shouldn't be worried about speed until you've run benchmarks¹).

<fileName awk '/WORD/ { print $2 }'
<fileName grep WORD | cut -f 2 -d ' '

These commands are almost equivalent. The differences are:

  • awk and grep have different regexp syntaxes. Awk and grep -E have almost identical regexp syntaxes (extended regular expressions).
  • cut -d ' ' treats each individual space character as a delimiter. Awk's default delimiter is any whitespace sequence, which can be multiple spaces, a tab, etc. You cannot use arbitrary whitespace sequences as separators with cut. To use individual spaces as separators in awk, set the field separator to a regexp that matches a single space, other than a regexp consisting of a single space (which is a special case meaning “any whitespace sequence”, i.e. the default): awk -F '[ ]' '/WORD/ {print $2}'.
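
Both differences can be seen in a small sketch. The input strings here are made up for illustration; only the commands themselves come from the discussion above.

```shell
#!/bin/sh
# Regexp syntax: in grep's default (basic) syntax a bare '+' is a literal
# character, while grep -E and awk read it as "one or more".
bre=$(printf 'aab\na+b\n' | grep 'a+b')        # matches the line "a+b"
ere=$(printf 'aab\na+b\n' | grep -E 'a+b')     # matches the line "aab"
awk_re=$(printf 'aab\na+b\n' | awk '/a+b/')    # matches the line "aab"
echo "grep: $bre, grep -E: $ere, awk: $awk_re"

# Field splitting on a line with TWO spaces after WORD.
line='WORD  x y'
default_fs=$(printf '%s\n' "$line" | awk '/WORD/ { print $2 }')
single_fs=$(printf '%s\n' "$line" | awk -F '[ ]' '/WORD/ { print $2 }')
cut_fs=$(printf '%s\n' "$line" | grep WORD | cut -f 2 -d ' ')
echo "default FS: [$default_fs], -F '[ ]': [$single_fs], cut: [$cut_fs]"
```

With the default separator awk prints x, while awk -F '[ ]' and cut both print an empty second field, because the second space starts a new field.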

¹ The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet. – Michael A. Jackson

1

Your command is:

cat fileName | awk '/WORD/ { print $2 }'

You don't even need the cat command. You could use:

awk '/WORD/ { print $2 }' filename

The command below pipes the output of cat to grep and then to cut:

cat fileName | grep WORD | cut -f 2 -d ' '

It is usually better to avoid such unnecessary pipes. Awk does the job in a single command, whereas cut first needs grep to select the lines that contain the word, and then prints column 2 using a space as the delimiter.

Whatever cut cannot do, you can do in awk.
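
For instance, here is a small sketch of two things awk can do that cut cannot. The sample data (name, quantity, price) is hypothetical and only serves the illustration.

```shell
#!/bin/sh
# Hypothetical sample data: name, quantity, price.
data='apple 3 1.50
WORD 2 4.00
WORD 5 0.20'

# Print fields in a different order (cut always keeps the input order,
# even when asked for -f 3,2).
swapped=$(printf '%s\n' "$data" | awk '/WORD/ { print $3, $2 }')
echo "$swapped"

# Do arithmetic over the matching lines and print a total.
total=$(printf '%s\n' "$data" |
    awk '/WORD/ { sum += $2 * $3 } END { printf "%.2f\n", sum }')
echo "$total"
```

The pattern /WORD/ also replaces the separate grep step, so selection, reordering and the calculation all happen in one process.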

Avinash Raj
  • 3,703