uniq seems to do something different than uniq -u, even though the description for both is "only unique lines".
What's the difference here, what do they do?
This ought to be easy to test:
$ cat file
1
2
3
3
4
4
$ uniq file
1
2
3
4
$ uniq -u file
1
2
In short, uniq with no options removes all but one instance of consecutively duplicated lines. The GNU uniq manual formulates that as
With no options, matching lines are merged to the first occurrence.
while POSIX says
[...] write one copy of each input line on the output. The second and succeeding copies of repeated adjacent input lines shall not be written.
With the -u option, it removes all instances of consecutively duplicated lines, and leaves only the lines that are not immediately repeated. The GNU uniq manual says
only print unique lines
and POSIX says
Suppress the writing of lines that are repeated in the input.
uniq -u only contains lines where uniq -c gives a count of 1.
– cmbuckley
Nov 18 '20 at 20:14
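This comment is easy to check. Below is a minimal sketch that assumes the same six-line input as in the answer above; the awk filter is just one illustrative way to pick out the groups with a count of 1, and it assumes the lines contain no whitespace.

```shell
# Recreate the sample input from the answer (1, 2, 3, 3, 4, 4).
input=$(printf '%s\n' 1 2 3 3 4 4)

# uniq -c prefixes every group of adjacent identical lines with its size.
# Keep only the groups whose count is 1 and print the line itself.
# Assumption: lines contain no whitespace, so the line is awk field 2.
from_counts=$(printf '%s\n' "$input" | uniq -c | awk '$1 == 1 { print $2 }')

# uniq -u should select exactly the same lines.
from_u=$(printf '%s\n' "$input" | uniq -u)

printf 'via uniq -c and awk:\n%s\n' "$from_counts"
printf 'via uniq -u:\n%s\n' "$from_u"
```

Both variables should end up holding the lines 1 and 2, matching the uniq -u output shown in the answer.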
uniq -u makes the so-often seen sort | uniq or sort -u unnecessary, if sorting is not desired. (So, I have learnt something today)
– rexkogitans
Nov 19 '20 at 12:34
sort -u does not remove all copies of duplicated lines.
– Kusalananda
Nov 19 '20 at 12:47
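To make the point of this exchange concrete, here is a small sketch with made-up, unsorted input (the values are arbitrary and not from the question). It compares uniq -u on its own, sort -u, and sort | uniq -u.

```shell
# Hypothetical unsorted input: "3" appears twice, but not on adjacent lines.
input=$(printf '%s\n' 3 1 3 2)

# uniq -u alone only compares adjacent lines, so nothing is dropped here.
unsorted_u=$(printf '%s\n' "$input" | uniq -u)

# sort -u keeps one copy of every distinct line, including "3".
sort_u=$(printf '%s\n' "$input" | sort -u)

# sort | uniq -u drops every line that occurs more than once.
sort_uniq_u=$(printf '%s\n' "$input" | sort | uniq -u)

printf 'uniq -u:\n%s\n\n' "$unsorted_u"
printf 'sort -u:\n%s\n\n' "$sort_u"
printf 'sort | uniq -u:\n%s\n' "$sort_uniq_u"
```

So the three commands answer different questions: uniq -u without sorting only looks at neighbouring lines, sort -u keeps one copy of everything, and sort | uniq -u keeps only the lines that occur exactly once in the whole input.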
From uniq(1):
NAME
       uniq - report or omit repeated lines
DESCRIPTION
       ...
       With no options, matching lines are merged to the first occurrence.
       ...
       -u, --unique
              only print unique lines
If we try it out we see:
$ cat file
cat
dog
dog
bird
$ uniq file
cat
dog
bird
$ uniq -u file
cat
bird
You can see that uniq prints the first instance of a duplicated line. uniq -u does not print any duplicated lines.
Considering the original poster's comment to the accepted answer, I believe that a different example may be useful to illustrate the difference and the point of the commands.
Let's say we have some portion of text, which has lines spaced with duplicate empty lines for some reason and with a single empty line at the beginning and the end:
$ cat declaration_quote.txt

We hold these truths to be self-evident, that all men are created equal, that


they are endowed by their Creator with certain unalienable Rights, that among


these are Life, Liberty and the pursuit of Happiness.

If you decide that one empty line is enough spacing, you can use uniq to get the output below. It is not "everything only once", but rather "once from each continuous group", because you will receive a separate empty line from each group of empty lines. That is already more than once. Also, the empty lines at the beginning and the end stay because there are no empty lines immediately above or below them.
$ uniq declaration_quote.txt

We hold these truths to be self-evident, that all men are created equal, that

they are endowed by their Creator with certain unalienable Rights, that among

these are Life, Liberty and the pursuit of Happiness.

If you decide that you do not need such double spacing at all, you can use uniq -u to get only the lines that are not repeated immediately above or below. But it is still not "only things that appear once", because it will not remove the single empty lines (at the beginning and at the end), even though there are many other empty lines in the text. It is rather "only things not repeated immediately".
$ uniq -u declaration_quote.txt

We hold these truths to be self-evident, that all men are created equal, that
they are endowed by their Creator with certain unalienable Rights, that among
these are Life, Liberty and the pursuit of Happiness.

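Since empty lines are hard to see in terminal output, counting them may make the effect clearer. The sketch below builds a hypothetical stand-in for declaration_quote.txt (short placeholder lines, one empty line at the start and end, two empty lines between text lines) and counts the empty lines that survive each command. It assumes mktemp is available for the temporary file.

```shell
# Hypothetical stand-in for declaration_quote.txt: one empty line at the
# start and end, two empty lines between the three text lines.
text=$(mktemp)
printf '%s\n' '' 'line one' '' '' 'line two' '' '' 'line three' '' > "$text"

# Count the empty lines in the input and in each filtered version.
blank_all=$(grep -c '^$' "$text")
blank_uniq=$(uniq "$text" | grep -c '^$')
blank_u=$(uniq -u "$text" | grep -c '^$')

echo "empty lines in input:         $blank_all"
echo "empty lines after uniq:       $blank_uniq"
echo "empty lines after uniq -u:    $blank_u"

rm -f "$text"
```

With this input the counts come out as 6, 4 and 2: uniq keeps one empty line per group, while uniq -u keeps only the two empty lines that had no empty neighbour.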
uniq -d
– Grump Nov 19 '20 at 15:48
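For reference, uniq -d is the counterpart of uniq -u: it prints one copy of each group of adjacent repeated lines and drops the lines that are not repeated. A small sketch, reusing the cat/dog/bird input from the earlier answer:

```shell
# Same input as the cat/dog/bird example above.
input=$(printf '%s\n' cat dog dog bird)

# -d: one copy of each group of adjacent repeated lines.
only_repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: only the lines that have no adjacent duplicate.
only_unique=$(printf '%s\n' "$input" | uniq -u)

printf 'uniq -d:\n%s\n\n' "$only_repeated"
printf 'uniq -u:\n%s\n' "$only_unique"
```

Here uniq -d yields dog, and uniq -u yields cat and bird, so between them the two options cover every group that plain uniq would print.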