This code is used to remove duplicate entries from a history.txt file, which contains command-line history.

BEGIN{
      if (data[$0]++ == 0)
         lines[++count] = $0;
     }
END {
     for(i=1; i<count; i++)
         print lines[i];
    }

What is data in the code, and why is it being compared to 0?

Rui F Ribeiro
A Q

1 Answer

The intent is to remember each unique line of the input.

As Jeff Schaller pointed out, $0 is undefined in the BEGIN block, which runs once before any input is read.

A more correct version would be:
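To see why the BEGIN block cannot do the job, here is a small sketch. The input line `first` and the variable name `begin_demo` are made up for illustration. It prints $0 once from BEGIN and once from an ordinary rule:

```shell
# Hypothetical one-line input; BEGIN runs before this line is read.
begin_demo=$(printf 'first\n' | awk '
    BEGIN { print "BEGIN: [" $0 "]" }   # no record has been read yet
          { print "main:  [" $0 "]" }   # runs once per input line
')
printf '%s\n' "$begin_demo"
# BEGIN: []
# main:  [first]
```

In BEGIN, $0 is empty, so the original code would only ever record one empty string.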

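{
      # data[$0] counts how many times this line has been seen so far
      if (data[$0]++ == 0)
         lines[++count] = $0;
     }
END {
     # i <= count, so that the last stored line is printed too
     for(i=1; i<=count; i++)
         print lines[i];
    }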

or even:

!data[$0]++ { lines[++count] = $0; }
END {
     # i <= count, so that the last stored line is printed too
     for(i=1; i<=count; i++)
         print lines[i];
    }

The first time a line appears, data[$0] is unset, which compares equal to 0, so lines[++count] receives the line.

After the test, data[$0] is incremented (++ is a post-increment), so the test evaluates to false for any later line with the same content.

The END block prints all the stored lines in their original order.
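As a quick check of this behaviour, here is a minimal sketch on made-up input. The five sample commands stand in for history.txt, and the `dedup` function name is only for illustration. The loop uses `i <= count` so that the last stored line is printed as well:

```shell
# dedup: keep the first occurrence of each line, preserving input order.
dedup() {
    awk '!data[$0]++ { lines[++count] = $0 }
         END { for (i = 1; i <= count; i++) print lines[i] }'
}

# Hypothetical sample standing in for history.txt
result=$(printf 'ls\ncd /tmp\nls\npwd\ncd /tmp\n' | dedup)
printf '%s\n' "$result"
# ls
# cd /tmp
# pwd
```

The second `ls` and the second `cd /tmp` are skipped because data[$0] is already non-zero when they are read.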

See also: How does awk '!a[$0]++' work?

Archemar