
I recently came across an awk "seen" option. I can see that it removes duplicate lines from a file, but I would like some clarification on how it works.

cat tes
1
2
3
1
1
1
3
4

With awk seen, the output is:

cat tes | awk '!seen[$0]++'

1
2
3
4

Jeff Schaller
ajay

1 Answer


seen is the arbitrary name of an associative array. It is not an option of any kind. You could use a or b or most other names in its place.

The code !seen[$0]++ consists of a test and an increment.

If seen[$0], i.e. the value of the array element associated with the key $0, the current line of input, is zero (or empty), then the boolean value of !seen[$0] is true.

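The value in the array corresponding to the key $0 is then incremented (the ++ is a post-increment, so it is applied after the value has been tested, and it is applied whether or not the test was true). This means that the test will be false every later time the same value of $0 is found.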

The effect is that the test is true the first time a particular line is seen in the input, and false all other times.
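To make the test and the increment visible, here is a small sketch that prints, for each input line, the count stored before the line is processed and the result of the test. The function name trace_seen and the variable names before and test are made up for this illustration; the input is the sample data from the question.

```shell
# trace_seen is a hypothetical helper used only for this illustration.
trace_seen() {
  awk '{
    before = seen[$0] + 0    # count stored before this line (0 if not seen yet)
    test = !(seen[$0]++)     # same test and increment as in the one-liner
    print $0, before, test   # columns: line, count before, test result
  }'
}

printf '1\n2\n3\n1\n1\n1\n3\n4\n' | trace_seen
# 1 0 1
# 2 0 1
# 3 0 1
# 1 1 0
# 1 2 0
# 1 3 0
# 3 1 0
# 4 0 1
```

The last column is 1 only on the first occurrence of each line, which is exactly when the one-liner prints.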

Whenever a test with no associated action is true, the default action is triggered. The default action is the equivalent of { print } or { print $0 }, which prints the current record. In this example, the current record is, for all intents and purposes, the current unmodified line of input.
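As a rough illustration, the one-liner can be written out in long form with an explicit condition, an explicit print, and a separate increment. This is only a sketch of an equivalent program, not a different technique; the function name dedupe_long is made up for the example.

```shell
# dedupe_long is a hypothetical name; the awk program spells out
# what !seen[$0]++ does implicitly.
dedupe_long() {
  awk '{
    if (seen[$0] == 0) {   # an unset array element compares equal to 0
      print $0             # the default action, written explicitly
    }
    seen[$0]++             # count this occurrence of the line
  }'
}

printf '1\n2\n3\n1\n1\n1\n3\n4\n' | dedupe_long
# 1
# 2
# 3
# 4
```

The short form is usually preferred in practice; the long form is mainly useful for seeing where the test, the increment, and the print each happen.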

Kusalananda
    Changing the $0 can extend the capabilities of this expression. For example, indexing by $5 prints only the first line for each distinct value of field 5, and indexing by $2 FS $4 makes the output unique on two non-adjacent fields. – Paul_Pedant Aug 13 '20 at 10:15
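A minimal sketch of the field-based variant from the comment above, keeping the first line for each distinct combination of fields 2 and 4. The sample input is invented for this illustration, and first_by_fields is a hypothetical name.

```shell
# first_by_fields is a hypothetical helper; the key is field 2 and
# field 4 joined by FS, so only those two fields decide uniqueness.
first_by_fields() {
  awk '!seen[$2 FS $4]++'
}

# Made-up sample data with four whitespace-separated fields per line.
printf 'a x 1 p\nb x 2 p\nc y 3 p\nd x 4 q\n' | first_by_fields
# a x 1 p
# c y 3 p
# d x 4 q
```

The second input line is dropped because its key (x and p) matches the first line, even though fields 1 and 3 differ.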