119

Suppose there is a column of numeric values like the following:

File1:

1 
2
3
3
3
4
4
4
5
6

I want the output:

3  
4

That is, I want only the repeated lines. Are there any command-line tools to find this out in Linux? (NB: the values are numerically sorted.)

N. F.
  • 2,209

4 Answers

190

You can use uniq(1) for this if the file is sorted:

uniq -d file.txt

If the file is not sorted, run it through sort(1) first:

sort file.txt | uniq -d

This will print out the duplicates only.

Technically the input does not need to be in sorted order, but the duplicates in the file need to be consecutive. The usual way to achieve that is to sort the file.
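
For the sample File1 above:

uniq -d File1

prints each duplicated value exactly once:

3
4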

camh
  • 39,069
  • 2
    What if I want only the triplicates to be printed? – N. F. Oct 22 '12 at 07:55
  • 12
    @MiNdFrEaK sort | uniq -c | grep '^\s*3\s' | sed 's/^\s*[0-9]*\s*//' for triplicates; replace "3" with any N for N-plicates – full.stack.ex Oct 22 '12 at 08:10
  • 1
    @MiNdFrEaK sort | uniq -c | sed -n 's/^[[:blank:]]*3[[:blank:]]\{1,\}//p' for triplicates –  Oct 22 '12 at 11:03
  • @camh can you do this on csv files as well? only values of a certain column? – NumenorForLife Jun 04 '15 at 12:20
  • 2
    sort file.txt | uniq -d – ron Mar 30 '17 at 08:31
  • This only works for numbers, not for characters, see my answer for a number and character solution: https://unix.stackexchange.com/a/548813 – jasonleonhard Apr 22 '20 at 16:47
  • @jasonleonhard uniq will work for any text, not just numbers. All your answer adds to the command is sort, which I mention in my answer: "The input file needs to be sorted ... so run it through sort first if it is not". – camh Apr 22 '20 at 21:09
  • This is incorrect if file.txt is not sorted. You should type sort file.txt | uniq -d to first sort and then find the non-uniq (uniq -d) lines. – amc Jul 04 '20 at 20:51
  • @amc I do say in the last sentence that it needs to be sorted. Should I reword that perhaps? – camh Jul 05 '20 at 08:16
  • @camh I think that is best! Maybe the simplest thing is to modify the first sentence: "You can use uniq(1) only if file.txt is sorted". If you want to be really clear perhaps write the case when the file isn't sorted (as I wrote before). – amc Jul 06 '20 at 20:08
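
Following up on the comments above: since uniq -c prefixes each line with its count, awk can also filter for an exact count. A minimal sketch for triplicates (replace 3 with any N; this assumes the lines contain no embedded whitespace):

sort File1 | uniq -c | awk '$1 == 3 { print $2 }'

For the CSV question, assuming comma-separated values with the column of interest in field 2, cut can isolate that column first:

cut -d, -f2 file.csv | sort | uniq -d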
9

uniq requires your list to be sorted (note that sort defaults to alphabetical order):

sort path/to/your/filename | uniq -d

or

cat fileName | sort | uniq -d
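
For uniq -d the alphabetical default is harmless, since identical lines end up adjacent either way; it only changes the order in which results appear. To keep numeric data like the question's in numeric order, sort numerically instead:

sort -n path/to/your/filename | uniq -d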

terdon
  • 242,166
  • Technically uniq doesn't require the input to be sorted, only that the duplicates be consecutive. Sorting guarantees that, but in my case it wasn't needed: I could already guarantee the duplicates were consecutive, which saved me from buffering the entire input stream just to sort it. – nevelis Jun 20 '23 at 13:21
3

Execute this:

perl -ne 'print if $a{$_}++' filename.txt

don_crissti
  • 82,805
  • It gives 3\n3\n4\n4\n for the input File1, which is obviously wrong. – yaegashi Jul 10 '15 at 00:03
  • The Perl snippet I find myself revisiting provides the number of incidences of each line, so it can be piped, sorted, and filtered as needed: perl -ne '$a{$_}++; END { while(($k,$v)=each %a){printf "%d\t%s", $v,$k}}' filename – Theophrastus Jun 02 '16 at 22:00
  • Is there a way to do that on a specific column separated by a given field separator? – Geremia Sep 09 '16 at 03:42
  • 1
    As indicated by yaegashi, a small fix is needed to fulfill the requirements: perl -ne 'print if 1==$a{$_}++' filename.txt Among all the answers, it is my favorite, because the others require preprocessing all the data with a full sort; this one starts producing output more quickly and efficiently. – BOC Jun 14 '19 at 15:07
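
Putting BOC's fix in one place: $a{$_}++ evaluates to the count seen so far before incrementing, so each line is printed exactly on its second occurrence and never again, with no pre-sorting needed:

perl -ne 'print if 1 == $a{$_}++' File1

For the question's File1 this outputs 3 and 4, as requested.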
1

Using uniq and awk:

uniq -c File1 | awk '$1 > 1 { print $2 }'
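
This relies on the duplicates being consecutive, which holds here because the question's values are sorted; for unsorted input, sort first:

sort File1 | uniq -c | awk '$1 > 1 { print $2 }'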
manatwork
  • 31,277