Awk to average column based on key in 2nd column

Question

I have

Sample_A   100
Sample_A   200
Sample_B   300
Sample_B   100

And I want to print the average of the values in row 2 for each key in row 1

Sample_A   150
Sample_B   200

I can print the sums of the values in row 2 for each key in row 1 using the excellent answer to another question: Sum First Column on basis of Second Column

The command is:

awk 'NR { k = $1; cnt[k] += $2 } END { print; for (k in cnt) print k,cnt[k]}' File.txt

And this produces

Sample_A  300
Sample_B  400

But in order to calculate the average, I need a way to save the number of occurrences of the key, something like

awk 'NR { k = $1; cnt[k] += $2; count(k)=$2} END { print; for (k in cnt) print k,cnt[k]/count(k)}' File.txt

But my count(k) code is kind of a shot in the dark and doesn't work.

You're thinking along the right lines, except awk arrays are accessed using square brackets rather than parentheses and you need to increment count rather than assign $2 to it i.e. count[k]++ or count[k] += 1 — steeldriver, Aug 30 '18 at 17:46

αғsнιη · Answer 1 · 2018-08-30T17:48:48.120

4

With awk you could do:

awk '{seen[$1]+=$2; count[$1]++} END{for (x in seen)print x, seen[x]/count[x]}' infile
Sample_A 150
Sample_B 200

Or using GNU datamash:

datamash -t' ' --sort --group 1 mean 2 <infile
Sample_A 150
Sample_B 200

edited Aug 30 '18 at 17:48

answered Aug 30 '18 at 17:39

αғsнιη

41,407

DopeGhoti · Answer 2 · 2018-08-30T17:45:05.747

$ awk '{ sum[$1] += $2; count[$1] += 1 } END { for ( key in count ) { print key, sum[key] / count[key] } }' input
Sample_A 150
Sample_B 200

To derive a mean, you need two things: a count of the number of entities being averaged, and the sum of those values. We use an array, count for the former, and an array sum for the latter. The keys in each array are the assigned to the first column in your data file.

We then use an END clause once the data are being collected to look at each array to get the sums and counts, divide one by the other, and display the results.

The awk script, reformatted to not all be on one line, looks like this:

{ 
  sum[$1] += $2 
  count[$1] += 1
} 
END { 
  for (key in count) { 
    print key, sum[key] / count[key] 
  } 
}

your answer is duplicated of what I posted! – αғsнιη Aug 30 '18 at 17:43 — αғsнιη, Aug 30 '18 at 17:43
Form follows function, my friend. – DopeGhoti Aug 30 '18 at 17:48 — DopeGhoti, Aug 30 '18 at 17:48

Awk to average column based on key in 2nd column

2 Answers2

Linked