119

Suppose there is a column of numeric values like the following:

File1:

1 
2
3
3
3
4
4
4
5
6

I want the output:

3  
4

That is, I want only the repeated lines. Are there any command-line tools to find this out in Linux? (NB: the values are numerically sorted.)

N. F.
  • 2,209

4 Answers

190

You can use uniq(1) for this if the file is sorted:

uniq -d file.txt

If the file is not sorted, run it through sort(1) first:

sort file.txt | uniq -d

This will print out the duplicates only.

Technically the input does not need to be in sorted order, but the duplicates in the file need to be consecutive. The usual way to achieve that is to sort the file.
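
For the sample File1 above:

uniq -d File1

prints each duplicated value exactly once:

3
4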

camh
  • 39,069
  • 2
    What if I want only the triplicates to be printed? – N. F. Oct 22 '12 at 07:55
  • 12
    @MiNdFrEaK sort | uniq -c | grep '^\s*3\s' | sed 's/^\s*[0-9]*\s*//' for triplicates; replace "3" with any N for N-plicates – full.stack.ex Oct 22 '12 at 08:10
  • 1
    @MiNdFrEaK sort | uniq -c | sed -n 's/^[[:blank:]]*3[[:blank:]]\{1,\}//p' for triplicates –  Oct 22 '12 at 11:03
  • @camh can you do this on csv files as well? only values of a certain column? – NumenorForLife Jun 04 '15 at 12:20
  • 2
    sort file.txt | uniq -d – ron Mar 30 '17 at 08:31
  • This only works for numbers, not for characters, see my answer for a number and character solution: https://unix.stackexchange.com/a/548813 – jasonleonhard Apr 22 '20 at 16:47
  • @jasonleonhard uniq will work for any text, not just numbers. All your answer adds to the command is sort, which I mention in my answer: "The input file needs to be sorted ... so run it through sort first if it is not". – camh Apr 22 '20 at 21:09
  • This is incorrect if file.txt is not sorted. You should type sort file.txt | uniq -d to first sort and then find the non-uniq (uniq -d) lines. – amc Jul 04 '20 at 20:51
  • @amc I do say in the last sentence that it needs to be sorted. Should I reword that perhaps? – camh Jul 05 '20 at 08:16
  • @camh I think that is best! Maybe the simplest thing is to modify the first sentence: "You can use uniq(1) only if file.txt is sorted". If you want to be really clear perhaps write the case when the file isn't sorted (as I wrote before). – amc Jul 06 '20 at 20:08
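
Following up on the comments above: since uniq -c prefixes each line with its count, awk can also filter for an exact count. A minimal sketch for triplicates (replace 3 with any N; this assumes the lines contain no embedded whitespace):

sort File1 | uniq -c | awk '$1 == 3 { print $2 }'

For the CSV question, assuming comma-separated values with the column of interest in field 2, cut can isolate that column first:

cut -d, -f2 file.csv | sort | uniq -d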
9

uniq requires your list to be sorted (note that sort defaults to alphabetical order):

sort path/to/your/filename | uniq -d

or

cat fileName | sort | uniq -d
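
For uniq -d the alphabetical default is harmless, since identical lines end up adjacent either way; it only changes the order in which results appear. To keep numeric data like the question's in numeric order, sort numerically instead:

sort -n path/to/your/filename | uniq -d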

terdon
  • 242,166
  • Technically uniq doesn't require the input to be sorted, only that the duplicates be consecutive. Sorting guarantees that, but in my case it wasn't needed: I could already guarantee the duplicates were consecutive, which saved me from buffering the entire input stream just to sort it. – nevelis Jun 20 '23 at 13:21
3

Execute this:

perl -ne 'print if $a{$_}++' filename.txt

don_crissti
  • 82,805
  • It gives 3\n3\n4\n4\n for the input File1, which is obviously wrong. – yaegashi Jul 10 '15 at 00:03
  • The Perl snippet I find myself revisiting provides the number of incidences of each line, so it can be piped, sorted, and filtered as needed: perl -ne '$a{$_}++; END { while(($k,$v)=each %a){printf "%d\t%s", $v,$k}}' filename – Theophrastus Jun 02 '16 at 22:00
  • Is there a way to do that on a specific column separated by a given field separator? – Geremia Sep 09 '16 at 03:42
  • 1
    As indicated by yaegashi, a small fix is needed to fulfill the requirements: perl -ne 'print if 1==$a{$_}++' filename.txt Among all the answers, it is my favorite, because the others require preprocessing all the data with a full sort; this one starts producing output more quickly and efficiently. – BOC Jun 14 '19 at 15:07
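
Putting BOC's fix in one place: $a{$_}++ evaluates to the count seen so far before incrementing, so each line is printed exactly on its second occurrence and never again, with no pre-sorting needed:

perl -ne 'print if 1 == $a{$_}++' File1

For the question's File1 this outputs 3 and 4, as requested.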
1

Using uniq and awk:

uniq -c File1 | awk '$1 > 1 { print $2 }'
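
This relies on the duplicates being consecutive, which holds here because the question's values are sorted; for unsorted input, sort first:

sort File1 | uniq -c | awk '$1 > 1 { print $2 }'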
manatwork
  • 31,277