23

From my understanding, $1 is the first field. But strangely enough, awk '$1=$1' omits extra spaces.

$ echo "$string"
foo    foo bar               bar

$ echo "$string" | awk '$1=$1'
foo foo bar bar

Why is this happening?

Siva
  • 9,077
annahri
  • 2,075

2 Answers

25

When we assign a value to a field (here, the value of $1 is assigned back to $1), awk rebuilds $0 by joining the fields with the output field separator, OFS, which defaults to a single space.

We can get the same case in the following scenarios as well...

echo -e "foo foo\tbar\t\tbar" | awk '$1=$1'
foo foo bar bar

echo -e "foo foo\tbar\t\tbar" | awk -v OFS=',' '$1=$1'
foo,foo,bar,bar

echo -e "foo foo\tbar\t\tbar" | awk '$3=1'
foo foo 1 bar

For GNU AWK this behavior is documented here:
https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html

$1 = $1 # force record to be reconstituted
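
As a minimal sketch of the point above, the following prints each record twice: once as read, and once after the self-assignment. The input string and the `-` separator are made up for illustration. Setting OFS by itself should leave $0 untouched; only the assignment to a field triggers the rebuild.

```shell
# Hypothetical input: two spaces, then a tab, between the fields.
# The first print shows $0 as read; the second shows $0 after the rebuild.
rebuilt=$(printf 'a  b\tc\n' | awk -v OFS='-' '{ print; $1 = $1; print }')
printf '%s\n' "$rebuilt"
```

The first output line keeps the original whitespace, and the second reads a-b-c.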

annahri
  • 2,075
Siva
  • 9,077
  • 14
    Don't rely on awk '$1=$1' printing the current record after recompiling it, try echo -e "0\tbar\t\tbar" | awk '$1=$1'. Always do awk '{$1=$1}1' instead and in general only use an action in a conditional context if you need the result of that action to be evaluated as a condition. The only other thing worth mentioning is that assigning to a field will also remove all leading and/or trailing spaces from the record when you use the default FS. – Ed Morton Feb 20 '20 at 15:22
  • 3
    What does the last 1 signify in awk '{$1=$1}1'? @EdMorton – annahri Feb 21 '20 at 02:29
  • 5
    @annahri it’s a pattern, which always evaluates successfully, and executes the default action (which prints the current record). Adding 1 is a common AWK trick to print the current record, but it does make the program harder to understand for people unfamiliar with the trick in question. – Stephen Kitt Feb 21 '20 at 10:19
18
echo "$string" | awk '$1=$1'

causes AWK to evaluate $1=$1, which assigns the field to itself, and has the side-effect of re-evaluating $0; then AWK considers the value of the expression, and because it’s non-zero and non-empty, it executes the default action, which is to print $0.

The extra spaces are removed when AWK re-evaluates $0: it does so by concatenating all the fields using OFS as a separator, and that’s a single space by default. When AWK parses a record, $0 contains the whole record, as-is, and $1 to $NF contain the fields, without the separators; when any field is assigned to, $0 is reconstructed from the field values.
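
To see the reconstruction directly, here is a small sketch that brackets $0 before and after the assignment. The sample input with leading and trailing spaces is an assumption for illustration. With the default FS, the surrounding whitespace is not part of any field, so it disappears from the rebuilt record while NF stays the same.

```shell
# Hypothetical input with leading, trailing, and repeated spaces.
# Brackets make the whitespace at both ends of $0 visible.
trimmed=$(printf '  foo    bar  \n' |
  awk '{ printf "[%s] NF=%d\n", $0, NF; $1 = $1; printf "[%s] NF=%d\n", $0, NF }')
printf '%s\n' "$trimmed"
# [  foo    bar  ] NF=2
# [foo bar] NF=2
```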

Whether AWK outputs anything in this example is dependent on the input:

echo "0      0" | awk '$1=$1'

won’t output anything. $1=$1 evaluates to whatever is in the first field, which is 0 in this case; that’s a “false” result in AWK, so nothing happens and nothing is output. To avoid that, turn $1=$1 into an action and make AWK print the current record in all cases:

echo "0      0" | awk '{$1=$1}1'

1 causes AWK to always run the default action.
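
A side-by-side sketch of the two forms, using a made-up two-line input whose first record starts with 0. The variable names are only for illustration.

```shell
# Hypothetical input: the first record has 0 in its first field.
input=$(printf '0      0\nfoo    bar\n')

# As a pattern: the record is printed only when $1 is "true".
as_pattern=$(printf '%s\n' "$input" | awk '$1=$1')

# As an action followed by the always-true pattern 1: every record is printed.
as_action=$(printf '%s\n' "$input" | awk '{ $1 = $1 } 1')

printf 'pattern form:\n%s\n' "$as_pattern"
printf 'action form:\n%s\n' "$as_action"
```

The pattern form prints only foo bar, while the action form prints both 0 0 and foo bar.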

Stephen Kitt
  • 434,908