This calls for Awk. Since the field you want to check is the first field of each line, just reference $1 (with -F: making the colon the field separator).
awk -F: '! ($1 in seen) {print; seen[$1]}' users.txt
You can "golf" this to reduce it considerably:
awk -F: '!a[$1]++' users.txt
The longer form is more or less self-explanatory; you build an associative array using each email address as an index, without bothering to assign a value. Then you can just check if the email address has been "seen" before (i.e., if the associative array has a particular email address as an index already), and print the whole line if not.
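Here is a quick sketch of the longer form in action. The sample data is made up, because the real layout of users.txt isn't shown; it assumes colon-separated lines whose first field is an email address. The shell variable names sample and deduped are hypothetical and used only for illustration.

```shell
# Hypothetical sample data standing in for users.txt:
# colon-separated lines whose first field is an email address.
sample='alice@example.com:Alice:admin
bob@example.com:Bob:user
alice@example.com:Alice A.:user
carol@example.com:Carol:user
bob@example.com:Robert:admin'

# Longer form: print a line only if its first field is not yet an index of
# the "seen" array, then record that field. Merely referencing seen[$1]
# creates the index; no value is ever assigned.
deduped=$(printf '%s\n' "$sample" | awk -F: '! ($1 in seen) {print; seen[$1]}')

printf '%s\n' "$deduped"
```

Only the first line for each address survives, so the second alice@example.com line and the second bob@example.com line are dropped, and the remaining lines keep their original order.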
The shorter form does essentially the same thing, but its brevity calls for more explanation.
The postfix ++ operator increments a variable only after its current value has been used in the expression, so we'll come back to it at the end.
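A minimal sketch of that behaviour, using an array element the same way the one-liner does. The array name a and the key "k" are arbitrary choices for illustration.

```shell
# Each a["k"]++ yields the value held *before* the increment.
# The element does not exist at first, so the first read counts as 0.
out=$(awk 'BEGIN { first = a["k"]++; second = a["k"]++; print first, second, a["k"] }')

echo "$out"   # prints "0 1 2"
```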
In Awk, 0 means false and non-zero means true. ! is for negation and reverses the truth value.
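A small sketch of how ! treats a few values. never_set is a hypothetical, deliberately unassigned variable; an uninitialized variable counts as false, just like the value of a not-yet-seen array element in the one-liner.

```shell
# !0 is true (1); !1 and !2 are false (0); an unset variable is false, so ! gives 1.
out=$(awk 'BEGIN { print !0, !1, !2, !never_set }')

echo "$out"   # prints "1 0 0 1"
```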
Because it appears outside curly brackets, the expression is treated as a pattern: a boolean expression whose associated action (in curly brackets) runs when the expression is true. Since no action is given, Awk falls back on its default action, printing the whole line, whenever the expression evaluates to true (non-zero).
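As a sketch of that default action, the two commands below should behave identically: one gives only a pattern, the other spells out { print }. The pattern $1 % 2 (true for odd numbers) is just an arbitrary example.

```shell
# Pattern only: the implicit action prints every line where the pattern is non-zero.
implicit=$(printf '1\n2\n3\n4\n' | awk '$1 % 2')

# Same pattern with the action written out explicitly.
explicit=$(printf '1\n2\n3\n4\n' | awk '$1 % 2 { print }')

printf '%s\n' "$implicit"
```

Both keep the lines 1 and 3, where $1 % 2 is 1, and skip 2 and 4, where it is 0.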
Essentially, this retrieves the value in the associative array a whose index is the email address (the first field), creating that element with a value of 0 if it is not already present. It then interprets 0 as false and any non-zero value as true, inverts that truth value, and prints the whole line if the result is "truthy." Finally, it increments the value stored in the array at that index.
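To make those steps concrete, here is a sketch that writes the one-liner out longhand and checks that it matches !a[$1]++ on some made-up sample data. The variable count is a hypothetical name introduced only for readability; it is not part of the original idiom.

```shell
# Hypothetical sample data: colon-separated, email address in the first field.
sample='alice@example.com:Alice:admin
bob@example.com:Bob:user
alice@example.com:Alice A.:user'

golfed=$(printf '%s\n' "$sample" | awk -F: '!a[$1]++')

# Longhand version of the same steps.
expanded=$(printf '%s\n' "$sample" | awk -F: '{
    count = a[$1]      # current value; an absent element reads as empty, i.e. 0
    a[$1] = count + 1  # the postfix ++ increment, applied after the value was read
    if (!count) print  # print only when the old value was 0, i.e. first sighting
}')

printf '%s\n' "$expanded"
```

Both versions print only the first alice@example.com line and the bob@example.com line. The longhand form is mainly useful for seeing the order of operations; the golfed form does the read, the test, and the increment in a single expression.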
A common enough Awk idiom, actually, but I wouldn't fault you for using the longer, more explicit version. :)