
I created a txt file from two queries, one LDAP and one SQL. The results of both queries are stored in the same txt file.

The txt file looks like this:

user1@domain.fr
user2@domain.fr
user3@domain.fr
user1@domain.fr
user4@domain.fr

Because a user can be in both databases, I need to delete the duplicate entries using bash.
How can I do it?

– taliezin

1 Answer


If you don't mind your file ending up sorted, sort it and filter it; either

sort -u file

if your sort supports it, or

sort file | uniq

if not; either way you'll get the sorted list of unique email addresses on standard output.
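With the sample list from the question saved as file (the filename is assumed), either command should print the four unique addresses in sorted order:

user1@domain.fr
user2@domain.fr
user3@domain.fr
user4@domain.fr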

If you want to keep the addresses in the original order, use awk:

awk '!(count[$0]++)' file
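The condition relies on count[$0]++ evaluating to 0 (false) the first time a line is seen, so the negation selects only first occurrences. Spelled out, an equivalent form would be:

awk '{ if (count[$0] == 0) print; count[$0]++ }' file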
– Stephen Kitt
  • sort -u doesn't report unique lines but the first of the lines that sort the same in the current locale. – cuonglm Jun 11 '15 at 08:43
  • @cuonglm Indeed, but is there a case where two different email addresses would have the same collation? – Stephen Kitt Jun 11 '15 at 08:51
  • @StephenKitt: ①@example.com and ②@example.com in en_US.utf8 locale. – cuonglm Jun 11 '15 at 09:18
  • @cuonglm: LC_ALL=en_US.UTF-8; (echo ①@example.com; echo ②@example.com) | sort | uniq also merges both lines, so only the awk solution is viable in that case. – Stephen Kitt Jun 11 '15 at 18:23
  • @StephenKitt: It seems you are using GNU uniq; it's not POSIX compliant in this case, you must use uniq -i. – cuonglm Jun 12 '15 at 01:09
  • Is awk able to delete the entries directly in the file and save it in place, without creating a new temporary file? – CodyChan Jan 08 '16 at 03:18
  • @CodyChan not directly, awk can't manipulate files in that way. You can combine awk with sponge (from moreutils) to re-write the input file. – Stephen Kitt Jan 08 '16 at 08:23
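A minimal sketch of that awk + sponge combination, assuming moreutils is installed and the input file is still named file:

awk '!(count[$0]++)' file | sponge file

sponge soaks up all of awk's output before opening file for writing, so the original file isn't truncated while awk is still reading it.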