I have recently come across an awk seen option. I can see it is removing duplicates in files. I could use some clarification on how it works.
cat tes
1
2
3
1
1
1
3
4
With awk and seen, the output is:
cat tes | awk '!seen[$0]++'
1
2
3
4
seen is the arbitrary name of an associative array. It is not an option of any kind. You could use a or b or most other names in its place.
The code !seen[$0]++ consists of a test and an increment.
If seen[$0], i.e. the value of the array element associated with the key $0, the current line of input, is zero (or empty), then the boolean value of !seen[$0] is true.
The value in the array corresponding to the key $0 is then incremented, which means that the test will be false all other times that the same value of $0 is found.
The effect is that the test is true the first time a particular line is seen in the input, and false all other times.
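To make the test and the increment easier to see, here is a rough sketch of the same logic written out in long form. The function name dedupe_expanded is made up for illustration, and the sample input simply mirrors the numbers from the question.

```shell
# Hypothetical helper: a long-hand version of awk '!seen[$0]++'.
dedupe_expanded() {
    awk '{
        # An unset array element compares equal to 0,
        # so this is true only the first time a line appears.
        if (seen[$0] == 0) {
            print
        }
        # Count the line so that later copies fail the test above.
        seen[$0]++
    }' "$@"
}

# Same data as the tes file in the question.
printf '%s\n' 1 2 3 1 1 1 3 4 | dedupe_expanded
```

With that input, this should print 1, 2, 3 and 4 on separate lines, the same as the one-liner.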
Whenever a test with no associated action is true, the default action is triggered. The default action is the equivalent of { print } or { print $0 }, which prints the current record, which for all intents and purposes in this example is the current unmodified line of input.
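As a quick check of the points above, the following sketch runs three spellings of the command that should all behave the same way: the bare test, the test with an explicit { print }, and the test with a different array name and { print $0 }. The sample function is only a stand-in for the tes file.

```shell
# Hypothetical stand-in for "cat tes" from the question.
sample() { printf '%s\n' 1 2 3 1 1 1 3 4; }

# Bare test: the default action prints the record.
sample | awk '!seen[$0]++'

# The same test with the default action spelled out.
sample | awk '!seen[$0]++ { print }'

# A different array name; "seen" has no special meaning to awk.
sample | awk '!a[$0]++ { print $0 }'
```

Each of the three commands should print 1, 2, 3 and 4 on separate lines. Note that awk can also read the file directly, as in awk '!seen[$0]++' tes, so the cat is not needed.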