From time to time I find myself writing awk scripts that compute some simple statistics, for example a histogram, the average of a value, the variance or the standard deviation.
Doing that again and again with helper arrays and variables, plus for-loops in the END clause, feels tedious and error-prone.
In DTrace there is a quite awesome syntax for such tasks, called aggregations. It is similar in concept and API to the Accumulators in the Boost C++ libraries.
Thus my question: are there awk variants that provide a similar concept or syntax, so that such statistics can be computed conveniently and incrementally?
An imaginary example of such syntax:
$ someawk '{ @time[$1] = avg($2) }' measurements.log
prog1 150
prog2 200
....
(Here the 1st column contains the program name and the 2nd the runtime of one measurement; measurements.log contains multiple measurements for each program, and the aggregate function avg computes the average.)
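For comparison, this is roughly what I end up writing by hand today. It is only a minimal sketch in plain awk, wrapped in a shell function; the function name avg_by_key is made up for illustration, and the input is assumed to have the two-column layout described above:

```shell
# Sketch: per-key average in plain awk with helper arrays and an END loop.
# Assumed input: one "<name> <runtime>" pair per line, several lines per name.
# avg_by_key is a hypothetical helper name, not part of any awk variant.
avg_by_key() {
    awk '
        # Per record: accumulate the sum and the count for this key.
        { sum[$1] += $2; cnt[$1]++ }

        # At the end: walk over all keys and print key and average.
        END {
            for (k in sum)
                printf "%s %g\n", k, sum[k] / cnt[k]
        }
    ' "$@" | sort    # for-in order is unspecified in awk, so sort the output
}
```

Calling avg_by_key measurements.log would then print one line per program with its average runtime. Even in this simplest case there are already two arrays and an END loop to keep in sync, and the variance would need at least one more array for the sum of squares.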