
I have always used sort -u to get rid of duplicates.
But now I have real doubts about a list generated by a software tool.
The question is: is the output of sort -u | wc the same as that of uniq -u | wc?

I ask because they don't yield the same results. The manual for uniq specifies:

-u, --unique only print unique lines

My output consists of 1110 words. sort -u keeps 1020 lines, while uniq -u keeps 1110 lines, which is the correct amount. The issue is that I cannot visually spot any duplicates in the list, which I generate by redirecting with > at the end of the command line, and that there IS an issue with the total number of cracked passwords (in the context of customizing John the Ripper).

terdon
Yvain

1 Answer


No, they're not the same. For one, sort would sort the list first. For another, uniq only compares adjacent lines, and uniq -u prints only those lines that are "unique" in each given run: the ones that don't have an identical input line directly before or after them.

$ printf "%s\n"  3 3 2 1 2 | sort -u
1
2
3
$ printf "%s\n"  3 3 2 1 2 | uniq -u
2
1
2
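
To see which lines are actually duplicated, it usually helps to sort first so that identical lines become adjacent, and only then apply uniq. The following is a minimal sketch, not a definitive recipe: sample is a hypothetical helper that just reproduces the input used above, and the results in the comments assume plain whole-line comparison with no sort keys.

```shell
# Hypothetical sample input: "3" and "2" each occur twice, "1" occurs once.
sample() { printf '%s\n' 3 3 2 1 2; }

# One copy of every distinct line. For whole-line comparison this
# normally matches what sort -u prints.
sample | sort | uniq        # prints 1, 2, 3

# Only the lines that occur exactly once in the whole input.
sample | sort | uniq -u     # prints 1

# Only the lines that occur more than once: useful for spotting the
# duplicates that are hard to see in an unsorted list.
sample | sort | uniq -d     # prints 2, 3

# Each distinct line prefixed with its number of occurrences.
sample | sort | uniq -c
```

On a real list, replacing sample with something like sort yourfile | uniq -d (yourfile being a placeholder name) should show the lines that sort -u collapses.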

See also:

ilkkachu
  • Thanks for the links, I figured out the difference. For what I'm doing, it's much easier to look at output passed through sort | uniq than at the raw output. – Yvain May 30 '22 at 19:28
  • @roaima Are they? Probably. But as soon as -u was added to sort and there was no longer an imperative to pipe to uniq, divergence in output became possible. It made things faster, and some may say cleaner, but it's harder to guarantee the consistency, much less prove it. But -u has to be in sort when using keys (-k), since there's no clean way to tell uniq what the keys were. – Blair Houghton May 30 '22 at 21:41
  • @BlairHoughton I've withdrawn my comment after re-reading https://unix.stackexchange.com/a/76095/100397 – Chris Davies May 31 '22 at 06:40