I have data that looks like this. Each SNP should appear 5 times, each time with a different BETA, but SNP rs11704961 appears only twice. I want to delete the rows of any SNP that appears fewer than 5 times. I tried sort -k 1 | uniq -c, but uniq compares whole lines when checking for duplicates, not just the first column.
SNP R K BETA
rs767249 1 1 0.1065
rs767249 1 2 -0.007243
rs767249 1 3 0.02771
rs767249 1 4 -0.008233
rs767249 1 5 0.05073
rs11704961 2 1 0.2245
rs11704961 2 2 0.009203
rs1041894 3 1 0.1238
rs1041894 3 2 0.002522
rs1041894 3 3 0.01175
rs1041894 3 4 -0.01122
rs1041894 3 5 -0.009195
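One way to filter on the first column alone is to read the file twice with awk: the first pass counts rows per SNP ID and the second pass prints only rows whose SNP reached the required count. The sketch below assumes whitespace-separated columns, the SNP ID in column 1, and exactly one header line. The file names snps.txt and filtered.txt are hypothetical, and it keeps SNPs that appear 5 or more times.

```shell
#!/bin/sh
# Minimal sketch: drop every SNP that appears fewer than 5 times.
# Assumes whitespace-separated columns, SNP ID in column 1, one header line.
# "snps.txt" and "filtered.txt" are hypothetical file names for this example.

# Recreate the sample data from the question.
cat > snps.txt <<'EOF'
SNP R K BETA
rs767249 1 1 0.1065
rs767249 1 2 -0.007243
rs767249 1 3 0.02771
rs767249 1 4 -0.008233
rs767249 1 5 0.05073
rs11704961 2 1 0.2245
rs11704961 2 2 0.009203
rs1041894 3 1 0.1238
rs1041894 3 2 0.002522
rs1041894 3 3 0.01175
rs1041894 3 4 -0.01122
rs1041894 3 5 -0.009195
EOF

# The same file is passed twice:
#   pass 1 (NR == FNR): count data rows per SNP ID, skipping the header;
#   pass 2: print the header plus every row whose SNP occurs at least 5 times.
awk 'NR == FNR { if (FNR > 1) count[$1]++; next }
     FNR == 1 || count[$1] >= 5' snps.txt snps.txt > filtered.txt

cat filtered.txt
```

Because the counts are built before any row is printed, this does not require the file to be sorted or the SNP rows to be adjacent. To inspect the per-SNP counts the way the original attempt intended, restrict uniq to the first column by extracting it first, for example awk 'NR > 1 {print $1}' snps.txt | sort | uniq -c.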