I have a text file called data.txt
that contains 2 columns and 80,667 rows. The first 20 rows are, which is representative of all rows:
Column A Column B
ACKR1/CCRL2 12.66
ACKR1/CMKLR1 10.41
ACKR1/CXCR4 13.01
ACKR1/FZD5 10.72
ACKR1/GPR107 10.22
ACKR1/GPR137 10.73
ACKR1/GPR146 11.93
ACKR1/TAPT1 11.24
ACKR1/TPRA1 11.81
ACKR1/XPR1 10.01
ACKR2/ACKR3 9.36
ACKR2/ACKR3 10.48
ACKR2/ACKR3 11.08
ACKR2/ACKR3 11.11
ACKR2/ACKR3 11.38
ACKR2/ACKR3 11.73
ACKR2/ACKR3 12.97
ACKR2/ADGRA2 8.68
ACKR2/ADGRA2 9.03
ACKR2/ADGRA2 9.34
As you can see, there are some duplicate values in column A (e.g. ACKR2/ACKR3, ACKR2/ADGRA2). What I'd like to do is create another file results.txt
that contains the average, standard deviation and standard error of unique values, and how many times they appear in data.txt
. For instance:
Value Avg Stdev StdErr Count
ACKR1/CCRL2 12.66 - - 1
ACKR1/CMKLR1 10.41 - - 1
ACKR1/CXCR4 13.01 - - 1
ACKR1/FZD5 10.72 - - 1
ACKR1/GPR107 10.22 - - 1
ACKR1/GPR137 10.73 - - 1
ACKR1/GPR146 11.93 - - 1
ACKR1/TAPT1 11.24 - - 1
ACKR1/TPRA1 11.81 - - 1
ACKR1/XPR1 10.01 - - 1
ACKR2/ACKR3 11.46 0.84 0.35 7
ACKR2/ADGRA2 8.69 0.33 0.19 3
I am not too phased about getting both the standard deviation and standard error but it'd be nice to get at least one.
I'm a biologist so I'm not too savvy with writing unix scripts. I've really got no idea where to start, any help would be appreciated. Sorry if something like this has been answered before, I searched but could not find anything.
Many thanks!