Using sort
As Ed states in his comment, your sort command is sorting on the third field (-k3,3), when in fact you only have two fields (the : is the field separator). So to fix it, use -k2,2 as the key instead.
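To confirm how many fields sort sees once : is the separator, you can count them with awk. This is a minimal sketch; it recreates your sample data in a temporary file (via mktemp) so that it runs on its own:

```shell
#!/bin/sh
# Recreate the example input in a temporary file (the file name is arbitrary).
tmp=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$tmp"

# Print the number of ':'-separated fields (NF) on each line, deduplicated.
# Every line has 2 fields, so a key of -k3,3 refers to a field that does not exist.
field_counts=$(awk -F':' '{ print NF }' "$tmp" | sort -u)
echo "$field_counts"    # prints: 2

rm -f "$tmp"
```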
However, the original record order in the source file is then lost, because the records are sorted by their key value rather than by line/record number:
$ sort -u -t':' -k2,2 test.txt
1:A
2:B
6:C
5:a
4:b
$
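Which is probably not what you want. Nevertheless, this is easily fixed by piping the output through sort again, this time sorting numerically on the first field (a plain sort compares whole lines as text, so once the record numbers have more than one digit, a line starting with 10 would sort before one starting with 2):
$ sort -u -t':' -k2,2 test.txt | sort -t':' -k1,1n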
1:A
2:B
4:b
5:a
6:C
$
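To see why the second sort should compare the first field numerically, here is a sketch on a small hypothetical input whose record numbers run past 9. It assumes GNU sort (which, with -u, keeps the first line of each run of equal keys) and sets LC_ALL=C so that the text comparison is plain byte order:

```shell
#!/bin/sh
# Hypothetical input: record numbers go past 9.
tmp=$(mktemp)
printf '%s\n' 1:C 2:B 3:C 10:A > "$tmp"

# Deduplicate on field 2, then sort the whole line as text:
# "10:A" sorts before "1:C" because '0' comes before ':' in byte order.
text_order=$(LC_ALL=C sort -u -t':' -k2,2 "$tmp" | LC_ALL=C sort)

# Deduplicate on field 2, then sort numerically on field 1:
# this restores the original record order.
num_order=$(LC_ALL=C sort -u -t':' -k2,2 "$tmp" | LC_ALL=C sort -t':' -k1,1n)

printf '%s\n' "$text_order" "---" "$num_order"
rm -f "$tmp"
```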
Note: since you say that you have a large file, you may want to speed things up with GNU sort's --parallel option [1]:
sort --parallel=<n> -u -t':' -k2,2 test.txt | sort --parallel=<n> -t':' -k1,1n
where <n> is the number of cores that you have available.
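If you don't want to hard-code <n>, GNU coreutils' nproc prints the number of available processing units. A sketch, assuming both GNU sort and nproc are installed (neither --parallel nor nproc is POSIX), again using a temporary copy of your sample data:

```shell
#!/bin/sh
# Assumes GNU coreutils: `nproc` and `sort --parallel` are GNU extensions.
tmp=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$tmp"

# Use every available core for both sorts.
n=$(nproc)
result=$(LC_ALL=C sort --parallel="$n" -u -t':' -k2,2 "$tmp" \
    | LC_ALL=C sort --parallel="$n" -t':' -k1,1n)

echo "$result"
rm -f "$tmp"
```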
Using awk
Expanding upon your example file, if the original data is in a file called test.txt, like this:
1:A
2:B
3:A
4:b
5:a
6:C
then, again treating : as the field separator, you could use awk [2].
For example, this command keeps only the first line seen for each value of the second field, in the original order:
$ awk 'BEGIN{FS=":"}{if (!seen[$2]++)print $0}' test.txt
1:A
2:B
4:b
5:a
6:C
$
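The same filter is commonly written more compactly: in awk, a pattern with no action block prints the current record, and -F':' sets FS from the command line. A sketch that should give the same output as the longer version, run against a temporary copy of the sample data:

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$tmp"

# A pattern with no action prints $0 by default, so the
# if (...) print $0 wrapper can be dropped.
result=$(awk -F':' '!seen[$2]++' "$tmp")

echo "$result"
rm -f "$tmp"
```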
You can see how this works by printing the value of the condition for each line:
$ awk 'BEGIN{FS=":"}{print !seen[$2]++}' test.txt
1
1
0
1
1
1
$
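- First, the field separator is set with FS=":" in the BEGIN block.
- Second, seen[$2]++ returns the current count for the second field's value and only then increments it. For a value that hasn't been seen yet that count is 0, so the negation !seen[$2]++ is "true" (1); for any repeat it is "false" (0).
- Finally, print $0 prints the whole record, i.e. the current line.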
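To watch the post-increment at work, you can print the value that seen[$2]++ returns next to the count afterwards. This is a hypothetical debugging one-liner (the variable name before is only for illustration), not part of the solution:

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$tmp"

# For each line print: the line, the value seen[$2]++ returns (the count
# BEFORE incrementing; an unset element counts as 0), and the count after.
trace=$(awk -F':' '{ before = seen[$2]++; print $0, before, seen[$2] }' "$tmp")

echo "$trace"
rm -f "$tmp"
```

Only the line 3:A shows a non-zero "before" value, which is why it is the only line the filter drops.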
Putting this into a shell script [3] rather than an awk script gives:
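#!/bin/sh
# Print the first line for each distinct value of the second
# ':'-separated field, preserving the original line order.
# Usage: pass the input file as the first argument.
awk -F':' '
    !seen[$2]++ {
        print $0
    }
' "$1"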
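A usage sketch for the script: the file name dedup_field2.sh is hypothetical, everything lives in a temporary directory, and the script is run with sh so it works even without the executable bit set:

```shell
#!/bin/sh
dir=$(mktemp -d)

# Save the script (hypothetical name dedup_field2.sh).
cat > "$dir/dedup_field2.sh" <<'EOF'
#!/bin/sh
awk -F':' '
    !seen[$2]++ {
        print $0
    }
' "$1"
EOF

# Recreate the example data and run the script on it.
printf '%s\n' 1:A 2:B 3:A 4:b 5:a 6:C > "$dir/test.txt"
result=$(sh "$dir/dedup_field2.sh" "$dir/test.txt")

echo "$result"
rm -rf "$dir"
```

Alternatively, run chmod +x dedup_field2.sh once and then call it as ./dedup_field2.sh test.txt.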
References:
[1] This answer to "How to sort big files?"
[2] This answer to "Keeping unique rows based on information from 2 of three columns"
[3] This answer to "Specify other flags in awk script header"
-k3,3 you're telling sort to sort by the 3rd field when the input has 2 fields. – Ed Morton, Sep 05 '21 at 02:20