20

I was using the following command

curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq

when I wondered why uniq wouldn't remove the duplicates. Any idea why?

  • Necro, but with awk you don't need grep or uniq: | awk '/node/&&!dupe[$3]++{print $3}' Also, curl (like nearly all Unixy programs) needs two dashes for a long-form option: --silent (or short-form -s) – dave_thompson_085 Jun 12 '20 at 03:10
  • Related: https://wiki.openstreetmap.org/wiki/Relation#OSM_XML – Kusalananda Aug 01 '22 at 19:44
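The awk filter from the first comment is easy to try on synthetic input (made-up <member> lines mimicking the OSM relation XML, not real API output); it keeps the first occurrence of each third field and preserves input order, so no sort is needed:

```shell
# Two distinct refs, with ref="1" repeated: the awk filter prints each
# $3 value only the first time it is seen, in input order.
printf '<member type="node" ref="1" role=""/>\n<member type="node" ref="2" role=""/>\n<member type="node" ref="1" role=""/>\n' |
    awk '/node/ && !dupe[$3]++ {print $3}'
```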

3 Answers

28

You have to sort the output for the uniq command to work: uniq only removes adjacent duplicate lines. See the man page:

Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
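You can see the "adjacent" restriction with a tiny made-up input in which the duplicate lines are not next to each other:

```shell
# uniq only collapses runs of identical adjacent lines,
# so a non-adjacent duplicate survives:
printf '1\n2\n1\n' | uniq          # prints: 1 2 1
# sorting first makes the duplicates adjacent:
printf '1\n2\n1\n' | sort | uniq   # prints: 1 2
```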

So you can pipe the output into sort first and then into uniq. Or you can use sort's ability to sort and de-duplicate in a single step, like so:

$ ...your command... | sort -u

Examples

sort | uniq

$ cat <(seq 5) <(seq 5) | sort | uniq
1
2
3
4
5

sort -u

$ cat <(seq 5) <(seq 5) | sort -u
1
2
3
4
5

Your example

$ curl --silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 \
      | grep node | awk '{print $3}' | sort -u
ref="1828989762"
ref="1829038636"
ref="1829656128"
ref="1865479751"
ref="451116245"
ref="451237910"
ref="451237911"
ref="451237917"
ref="451237920"
ref="451237925"
ref="451237933"
ref="451237934"
ref="451237941"
ref="451237943"
ref="451237945"
ref="451237947"
ref="451237950"
ref="451237953"
– slm
  • and what if I don't want the output to be sorted because my ordering matters? uniq cannot do that? – phil294 Sep 02 '17 at 23:56
  • 1
    @Blauhirn no it cannot. – slm Sep 03 '17 at 01:13
  • It cannot, because that would sometimes be inefficient and unnecessary. uniq compares each line with the previous one to decide whether to keep it, so it runs in O(n) time. If it had to sort the input first, that would cost O(n log n), which isn't always needed, so the command leaves the sorting up to you – Dennis Barzanoff Nov 29 '22 at 11:50
0

Here is a short Python program which handles non-consecutive duplicate lines:

import sys

cache = set()                 # every line seen so far (including its newline)
for line in sys.stdin:
    if line in cache:
        continue              # skip duplicates, wherever they appear
    cache.add(line)
    print(line, end="")       # the line keeps its newline, so end="" avoids doubling it

To use, pipe input into this Python script (saved here as unique.py):

$ printf "3\n1\n4\n1\n4\n" | python unique.py
3
1
4
-1

Remove adjacent duplicate lines while retaining line order (that is, without sorting first) using the -u option:

uniq -u input.txt >output.txt

This is a bit odd, because the man page and the help seem to say it will only print the lines of the input file that are not duplicated, but that's not so. The -u option will print lines that are duplicated, but output them just once, as well as lines that are not duplicated. It's badly worded, and I don't understand why this isn't the default behavior anyway. Odd. But it is what it is.

  • 1
    uniq -u prints a line every time it appears in the input without being immediately preceded or followed by itself. If the input lines are a b a a b a, uniq -u outputs a b b a (i.e. a is only removed when it appears more than once on adjacent lines), which is unlikely to be the desired output here. Sure, the manual (the GNU one, at least) can easily be misread. info uniq (again, for the GNU utility) is a bit less terse. – fra-san Jun 12 '20 at 00:48
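The comment's example is easy to reproduce (a quick sketch with throwaway input, showing why -u is not an order-preserving de-duplicator):

```shell
# -u drops every line that belongs to a run of adjacent duplicates,
# so the "a a" run disappears entirely while the lone a's and b's survive:
printf 'a\nb\na\na\nb\na\n' | uniq -u   # prints: a b b a
```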