Using awk
In your examples, the first six characters are followed by a period. If that is always true, then:
$ awk -F. '!c[$1]++' File
AAAPOL.0001
AAAPRO.0001
AAAXEL.0002
AAAJOK.1111
This works by using . as the field separator and keeping track of the number of times that the first field has already appeared.
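To make the counter visible, here is a small sketch that prints the value the test sees for each line instead of filtering on it. The sample data is hypothetical (the contents of File are not shown here), and the variable names sample, trace, and kept are just for illustration.

```shell
# Hypothetical sample data; the original File is not shown here.
sample=$(mktemp)
printf '%s\n' AAAPOL.0001 AAAPOL.0002 AAAPRO.0001 \
              AAAXEL.0002 AAAXEL.0003 AAAJOK.1111 > "$sample"

# Same counter as in '!c[$1]++', but print the value the test sees
# instead of filtering. c[$1]++ yields the old count, then increments it.
trace=$(awk -F. '{ printf "%s key=%s seen=%d\n", $0, $1, c[$1]++ }' "$sample")
printf '%s\n' "$trace"

# Lines whose old count was 0 are exactly the ones the one-liner keeps.
kept=$(awk -F. '!c[$1]++' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

Every line reported with seen=0 is the first one for its key, and those are the lines that survive in kept.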
If the period is not always there, then:
$ awk '!c[substr($0, 1, 6)]++' File
AAAPOL.0001
AAAPRO.0001
AAAXEL.0002
AAAJOK.1111
substr($0, 1, 6) is the first six characters of the line. The associative array c keeps track of the number of times that we have seen those first six characters. Thus, if c[substr($0, 1, 6)] is non-zero, we have already seen those characters and the line should not be printed. In awk, non-zero means true, so we invert the test with !: this means that !c[substr($0, 1, 6)] is true if those six characters have not been seen before. The trailing ++ updates the count in c before we read the next line.
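If the one-liner feels too terse, the same logic can be spelled out step by step. This is only a sketch of an equivalent long form: the sample data is hypothetical (it deliberately has no period after the prefix), and the names key, seen, longform, and short are mine.

```shell
# Hypothetical sample data without a fixed separator after the prefix.
sample=$(mktemp)
printf '%s\n' AAAPOL0001 AAAPOL0002 AAAPRO-0001 \
              AAAJOK.1111 AAAPRO-0002 > "$sample"

# Long form of '!c[substr($0, 1, 6)]++'.
longform=$(awk '{
    key = substr($0, 1, 6)   # first six characters of the line
    if (seen[key] == 0)      # this prefix has not appeared yet
        print                # so keep the line
    seen[key]++              # update the count before the next line
}' "$sample")
printf '%s\n' "$longform"

# The short form should select the same lines.
short=$(awk '!c[substr($0, 1, 6)]++' "$sample")

rm -f "$sample"
```

Both forms keep the first line for each prefix, even when a later repeat (AAAPRO-0002 here) is not adjacent to the first occurrence.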
Using uniq
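For reference, those who, unlike the OP, have a version of uniq with the -w option (GNU uniq, for example) can compare just the first six characters of each line. Since uniq only collapses adjacent lines, this assumes that lines sharing a prefix are next to each other in File: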
$ uniq -w6 File
AAAPOL.0001
AAAPRO.0001
AAAXEL.0002
AAAJOK.1111
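One difference between the two approaches is worth seeing in action: uniq compares only neighbouring lines, while the awk array remembers every prefix it has seen. The sketch below uses hypothetical data with a non-adjacent repeat and assumes a uniq that supports -w (GNU coreutils, for instance); the variable names are mine.

```shell
# Hypothetical data with a non-adjacent repeat of the AAAPOL prefix.
sample=$(mktemp)
printf '%s\n' AAAPOL.0001 AAAPRO.0001 AAAPOL.0002 > "$sample"

# Assumes a uniq that supports -w (for example GNU coreutils).
# uniq compares each line only with the one before it.
with_uniq=$(uniq -w6 "$sample")

# awk remembers every prefix seen so far, adjacent or not.
with_awk=$(awk '!c[substr($0, 1, 6)]++' "$sample")

printf 'uniq -w6:\n%s\n' "$with_uniq"
printf 'awk:\n%s\n' "$with_awk"

rm -f "$sample"
```

Here uniq -w6 keeps all three lines because the two AAAPOL lines are not neighbours, while awk drops the second one. If the input is already grouped by prefix, the two give the same result.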