
While working through a bash exercise, I reached a level that required finding the line of text that occurs only once in a certain file.

Why is the output of the sort -u file command different from the output of sort file | uniq -u? Shouldn't they be the same?

  • The title and the body ask two different questions. sort | uniq is the same as sort -u, and sort | uniq -u is explicitly asking for a totally different behaviour; which one do you care about? – Michael Homer Sep 24 '17 at 21:29
  • I edited to make the title and body consistent; revert if that isn't what you meant (otherwise, it's a duplicate of the question linked in the answer). – Michael Homer Sep 24 '17 at 21:38
  • Yes, I misunderstood both commands. I initially thought that sort -u was the same thing as sort | uniq -u. Thank you for your answer :) – andrediasesp Sep 24 '17 at 21:47

1 Answer

sort -u and sort | uniq do produce the same output*: all of the lines in the input, exactly once each, in ascending order. That is the default behaviour of uniq.
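
As a quick check of that equivalence, here is a minimal sketch. The sample lines are made up for illustration, and LC_ALL=C is set on the assumption that you want to sidestep the locale caveats in the footnote:

```shell
# Hypothetical sample input: "apple" appears three times, "cherry" twice.
input=$(printf '%s\n' banana apple cherry apple cherry apple date)

# Force the POSIX locale so sort and uniq agree on which lines are equal.
a=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
b=$(printf '%s\n' "$input" | LC_ALL=C sort | LC_ALL=C uniq)

# Print the deduplicated lines, then confirm both pipelines agree.
printf '%s\n' "$a"
[ "$a" = "$b" ] && echo "same output"
```

Both pipelines keep one copy of every distinct line, including the repeated ones.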

uniq -u, on the other hand, asks to:

-u Suppress the writing of lines that are repeated in the input.

This is a very different behaviour: only the lines that are not repeated are output. uniq compares adjacent lines only, so once the file has been sorted, those are the lines that appear exactly once in the entire file (which is what you wanted).
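
To see the difference, here is a small sketch using made-up sample lines (again with LC_ALL=C assumed, to keep the comparison rules simple):

```shell
# Hypothetical sample input: "apple" appears three times, "cherry" twice,
# but no two identical lines are next to each other.
input=$(printf '%s\n' banana apple cherry apple cherry apple date)

# Unsorted: uniq -u sees no adjacent repeats, so every line passes through.
unsorted=$(printf '%s\n' "$input" | LC_ALL=C uniq -u)

# Sorted first: repeats become adjacent, so only the one-off lines survive.
once=$(printf '%s\n' "$input" | LC_ALL=C sort | LC_ALL=C uniq -u)

printf '%s\n' "$once"
```

With this input, the sorted pipeline prints only banana and date, while the unsorted one returns the input unchanged.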


* There are some caveats about how sort and uniq consider equality, which Stéphane has noted in this answer to a related question. For the POSIX locale or files in some normalised form, they're identical; for others, there can be distinguishable differences.

Michael Homer