20

I was using the following command

curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq

when I wondered why uniq wouldn't remove the duplicates. Any idea why?

  • Necro, but with awk you don't need grep or uniq: | awk '/node/&&!dupe[$3]++{print $3}' Also, curl (like nearly all Unixy programs) needs two dashes for a long-form option: --silent (or short-form -s) – dave_thompson_085 Jun 12 '20 at 03:10
  • Related: https://wiki.openstreetmap.org/wiki/Relation#OSM_XML – Kusalananda Aug 01 '22 at 19:44
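The awk filter from the first comment is easy to try on synthetic input (made-up <member> lines mimicking the OSM relation XML, not real API output); it keeps the first occurrence of each third field and preserves input order, so no sort is needed:

```shell
# Two distinct refs, with ref="1" repeated: the awk filter prints each
# $3 value only the first time it is seen, in input order.
printf '<member type="node" ref="1" role=""/>\n<member type="node" ref="2" role=""/>\n<member type="node" ref="1" role=""/>\n' |
    awk '/node/ && !dupe[$3]++ {print $3}'
```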

3 Answers

28

You have to sort the output for the uniq command to work: uniq only removes adjacent duplicate lines. See the man page:

Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
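You can see the "adjacent" restriction with a tiny made-up input in which the duplicate lines are not next to each other:

```shell
# uniq only collapses runs of identical adjacent lines,
# so a non-adjacent duplicate survives:
printf '1\n2\n1\n' | uniq          # prints: 1 2 1
# sorting first makes the duplicates adjacent:
printf '1\n2\n1\n' | sort | uniq   # prints: 1 2
```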

So you can pipe the output into sort first and then into uniq. Or you can use sort's ability to sort and de-duplicate in a single step, like so:

$ ...your command... | sort -u

Examples

sort | uniq

$ cat <(seq 5) <(seq 5) | sort | uniq
1
2
3
4
5

sort -u

$ cat <(seq 5) <(seq 5) | sort -u
1
2
3
4
5

Your example

$ curl --silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 \
      | grep node | awk '{print $3}' | sort -u
ref="1828989762"
ref="1829038636"
ref="1829656128"
ref="1865479751"
ref="451116245"
ref="451237910"
ref="451237911"
ref="451237917"
ref="451237920"
ref="451237925"
ref="451237933"
ref="451237934"
ref="451237941"
ref="451237943"
ref="451237945"
ref="451237947"
ref="451237950"
ref="451237953"
– slm
  • and what if I don't want the output to be sorted because my ordering matters? uniq cannot do that? – phil294 Sep 02 '17 at 23:56
  • 1
    @Blauhirn no it cannot. – slm Sep 03 '17 at 01:13
  • It cannot, because that would sometimes be inefficient and unnecessary. uniq compares each line with the previous one to decide whether to keep it, so it runs in O(n) time. If it had to sort the input first, that would cost O(n log n), which isn't always needed, so the command leaves the sorting up to you – Dennis Barzanoff Nov 29 '22 at 11:50
0

Here is a short Python program which handles non-consecutive duplicate lines:

import sys

cache = set()                 # every line seen so far (including its newline)
for line in sys.stdin:
    if line in cache:
        continue              # skip duplicates, wherever they appear
    cache.add(line)
    print(line, end="")       # the line keeps its newline, so end="" avoids doubling it

To use, pipe input into this Python script (saved here as unique.py):

$ printf "3\n1\n4\n1\n4\n" | python unique.py
3
1
4
-1

Remove adjacent duplicate lines while retaining line order (that is, without sorting first) using the -u option:

uniq -u input.txt >output.txt

This is a bit odd, because the man page and the help seem to say it will only print the lines of the input file that are not duplicated, but that's not so. The -u option will print lines that are duplicated, but output them just once, as well as lines that are not duplicated. It's badly worded, and I don't understand why this isn't the default behavior anyway. Odd. But it is what it is.

  • 1
    uniq -u prints a line every time it appears in the input without being immediately preceded or followed by itself. If the input lines are a b a a b a, uniq -u outputs a b b a (i.e. a is only removed when it appears more than once on adjacent lines), which is unlikely to be the desired output here. Sure, the manual (the GNU one, at least) can easily be misread. info uniq (again, for the GNU utility) is a bit less terse. – fra-san Jun 12 '20 at 00:48
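The comment's example is easy to reproduce (a quick sketch with throwaway input, showing why -u is not an order-preserving de-duplicator):

```shell
# -u drops every line that belongs to a run of adjacent duplicates,
# so the "a a" run disappears entirely while the lone a's and b's survive:
printf 'a\nb\na\na\nb\na\n' | uniq -u   # prints: a b b a
```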