With the GNU or ast-open implementation of uniq:
uniq -D -u < input
(-D itself is non-standard), though note that it's the last duplicate that it removes, not the first (which makes a difference if you also use -i, -w or -f).
Portably, you could always use awk:
awk 'NR > 1 && $0 "" == previous ""; {previous = $0}' < input
(the concatenation with "" being to force a string comparison even if operands look like numbers).
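As a quick check of that awk command, here is a minimal sketch run on made-up sample data. The function name adjacent_dups and the input lines are assumptions for illustration only; the awk program itself is the one from the answer.

```shell
# Hypothetical wrapper around the awk one-liner: prints every line that is
# identical (as a string) to the line immediately before it.
adjacent_dups() {
  awk 'NR > 1 && $0 "" == previous ""; {previous = $0}'
}

# Made-up sample: a run of three "a" lines and a run of two "c" lines.
printf '%s\n' a a a b c c | adjacent_dups
# prints:
# a
# a
# c

# "1" and "1.0" are equal as numbers but differ as strings, so with the
# forced string comparison nothing is printed here.
printf '%s\n' 1 1.0 | adjacent_dups
```

The first line of each run is never printed, which is the opposite choice from uniq -D -u.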
To only compare the first 9 characters (note that -w is also a GNU extension and (currently) works with bytes, not characters, despite what its documentation says):
awk '{current = substr($0, 1, 9)}
NR > 1 && current == previous
{previous = current}' < input
(no need for "" concatenation in that case as substr() returns a string).
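The same prefix comparison can be tried on made-up ASCII data. This is only a sketch: the function name dups_by_prefix, the optional width argument (passed to awk with -v) and the sample lines are assumptions added for illustration, not part of the answer.

```shell
# Hypothetical helper: prints lines whose first N characters (default 9)
# equal the first N characters of the previous line.
dups_by_prefix() {
  awk -v n="${1:-9}" '{current = substr($0, 1, n)}
  NR > 1 && current == previous
  {previous = current}'
}

# Made-up sample: lines 1 and 2 share their first 9 characters,
# line 3 differs at the 9th character.
printf '%s\n' abcdefghi1 abcdefghi2 abcdefghX3 | dups_by_prefix 9
# prints:
# abcdefghi2
```

Whether "characters" means bytes or multi-byte characters here depends on the awk implementation and the locale.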
In a UTF-8 locale, on the output of
printf '%s\n' StéphaneChazelas StéphaneUNIX StéphaneUnix
it gives StéphaneUnix as expected, while uniq -w9 -D -u (with GNU uniq) gives StéphaneChazelas and StéphaneUNIX, as Stéphane is 8 characters but 9 bytes in UTF-8. ast-open uniq gives StéphaneUNIX only (awk skips the first occurrence, uniq removes the last occurrence).
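The byte versus character difference behind those results can be checked with wc. This is only a sketch: the helper names are made up, and é is written as its two UTF-8 bytes (octal 303 251) so that the script does not depend on how the file itself is encoded.

```shell
# Hypothetical helpers counting the bytes and the characters of "Stéphane".
# tr strips the padding that some wc implementations add around the number.
name_bytes() { printf 'St\303\251phane' | wc -c | tr -d ' '; }
name_chars() { printf 'St\303\251phane' | wc -m | tr -d ' '; }

name_bytes   # 9: é takes two bytes in UTF-8
name_chars   # 8 when run in a UTF-8 locale; a single-byte locale such as C reports 9
```

So a 9-byte window stops right after Stéphane, which is why a byte-based comparison treats all three sample lines as equal.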
With awk, you can also report all duplicate lines even when they're not adjacent with:
awk 'seen[$0]++' < input
(note that it stores all the unique lines in memory in a hash table though).
Or to consider only the first 9 characters:
awk 'seen[substr($0, 1, 9)]++' < input
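To see how the seen[] counter behaves, here is a small sketch on made-up input. The function names and sample lines are assumptions for illustration. seen[key]++ evaluates to the count before the increment, so it is 0 (false) the first time a key appears and non-zero (true) on every later occurrence.

```shell
# Hypothetical wrappers around the two awk one-liners.
all_dups() { awk 'seen[$0]++'; }
all_dups_by_prefix() { awk 'seen[substr($0, 1, 9)]++'; }

# Made-up sample with non-adjacent repeats: every occurrence after the
# first one is printed, in input order.
printf '%s\n' a b a c b a | all_dups
# prints:
# a
# b
# a

# Lines 1 and 3 share their first 9 characters even though they are
# separated by another line.
printf '%s\n' abcdefghi1 zzz abcdefghi2 | all_dups_by_prefix
# prints:
# abcdefghi2
```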
--check-chars option as well, so something like substr($0,9) == substr(previous,9) I guess (although that doesn't match anything for some reason) – Michael Nov 07 '19 at 22:25
substr($0,1,9) == substr(previous,1,9) works. – Michael Nov 07 '19 at 22:28