How can I print the standard deviation of measurements for each ID using awk?

Question

I need to print the standard deviation of measurements ($2) for each unique ID ($1).

The data looks like this:

Awk is not quite the right language. You may want to use R instead. See http://unix.stackexchange.com/questions/13731/is-there-a-way-to-get-the-min-max-median-and-average-of-a-list-of-numbers-in/13775#13775 — Kusalananda, Jan 11 '17 at 12:46
... or datamash -Ws groupby 1 sstdev 2 < file (change to pstdev if you want the population standard deviation) — steeldriver, Jan 11 '17 at 12:59
In any case, this is essentially a subquestion of Determining averages, stdev, stderror, and counts of values in a list — steeldriver, Jan 11 '17 at 13:16
It should be possible, but the people here may not know the formula; add it to the question. Also explain the unique ID bit: is it the sub of values for each id? Add examples. Then show us what you have tried. Tell us whether you have to use awk or can use other tools. — ctrl-alt-delor, Jan 11 '17 at 14:50

score 0 · Accepted Answer · answered Jan 11 '17 at 15:39

R or datamash are probably a better choice!

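Following the definition of the (population) standard deviation, that is, the square root of the mean squared deviation from the mean, we can write: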

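$ cat my-sd
#!/usr/bin/awk -f
# Needs GNU awk 4.0 or later: arrays of arrays are a gawk extension.

# For every line: add $2 to the ID's sum, count it, and store the value.
    { s[$1]["sum"] += $2 ;
      n = s[$1]["oco"] ++;
      v[$1][n] = $2  }

END { for (x in s) {
         m = s[x]["sum"] / s[x]["oco"];     # mean for this ID
         s1 = 0;                            # sum of squared deviations
         for (y in v[x]) {
            s1 += (v[x][y] - m) * (v[x][y] - m); }
         # Dividing by the count gives the population standard deviation.
         print x, sqrt(s1 / s[x]["oco"]) }
    }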

$ my-sd example
101 39.6074
104 44.9691
107 35.6195
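
If gawk's arrays of arrays are not available, a single-pass variant that should run in any POSIX awk is possible. This is only a sketch, not part of the script above: it assumes whitespace-separated input with the ID in $1 and the measurement in $2, and the function name sd_by_id is made up for illustration. It relies on the identity "variance = mean of the squares minus square of the mean", so the individual values never need to be stored.

```shell
# sd_by_id: hypothetical helper; prints "ID population-stddev" per unique ID.
# Reads the named files, or standard input when no file is given.
sd_by_id() {
    awk '
    NF >= 2 {                       # skip blank or incomplete lines
        n[$1]++                     # number of values seen for this ID
        sum[$1]   += $2             # running sum of the values
        sumsq[$1] += $2 * $2        # running sum of the squared values
    }
    END {
        for (id in n) {
            mean = sum[id] / n[id]
            var  = sumsq[id] / n[id] - mean * mean
            if (var < 0) var = 0    # guard against tiny negative rounding errors
            print id, sqrt(var)
        }
    }' "$@"
}
```

Called as `sd_by_id example`, it should give the same population standard deviations as my-sd up to rounding, although the order of the IDs is unspecified (pipe through sort if that matters). The sum-of-squares shortcut can lose precision when the values are large compared with their spread; the two-pass approach in my-sd does not have that problem.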
