
I have a group of output CSV files from replicates of a simulation. Every line in each file follows the same format: generation, number, value1, value2, ..., valueX. (Each file also includes a header in the same order.)

I would like to calculate the mean and stdev of each cell across all the files, and output another CSV file where each mean sits in the same cell/position as in the original files. The stdev can go either in another file, in the same cell/position, or after all the cells:

generation, number_mean, value1_mean, value2_mean,..., valueX_mean, value1_stdev, value2_stdev,...,valueX_stdev

What would be a good way to do this?

It is important that the output csv file follow the same format as the input files.

Thank you very much.
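A minimal Python sketch of the cell-wise aggregation described above, under these assumptions: every replicate file has the same header and the same rows in the same order (matching generation values), the first column is a generation key that is copied through rather than averaged, and all other cells are numeric. The function names (`read_replicate`, `aggregate_tables`, `aggregate_replicates`, `combined_table`, `write_table`) are hypothetical, and "stdev" here means the sample standard deviation from Python's `statistics` module.

```python
import csv
import statistics
from typing import List, Sequence, Tuple

Row = List[str]


def read_replicate(path: str) -> Tuple[Row, List[Row]]:
    """Read one replicate CSV and return (header, data rows)."""
    with open(path, newline="") as f:
        # skipinitialspace drops the blank after each comma ("generation, number, ...").
        rows = [row for row in csv.reader(f, skipinitialspace=True) if row]
    return rows[0], rows[1:]


def aggregate_tables(header: Row, tables: Sequence[List[Row]]):
    """Cell-wise mean and sample stdev across replicate tables.

    Column 0 (generation) is treated as a key and copied through unchanged;
    every other cell is converted to float. Needs at least two replicates,
    because statistics.stdev raises StatisticsError for fewer than two values.
    """
    first = tables[0]
    mean_rows, stdev_rows = [], []
    for r, key_row in enumerate(first):
        generation = key_row[0]
        if any(t[r][0] != generation for t in tables):
            raise ValueError(f"row {r + 1}: generation values differ between files")
        means, stdevs = [generation], [generation]
        for c in range(1, len(header)):
            values = [float(t[r][c]) for t in tables]
            means.append(statistics.mean(values))
            stdevs.append(statistics.stdev(values))
        mean_rows.append(means)
        stdev_rows.append(stdevs)
    return mean_rows, stdev_rows


def aggregate_replicates(paths: Sequence[str]):
    """Read all replicate files, check that they share one layout, then aggregate."""
    header, first_rows = read_replicate(paths[0])
    tables = [first_rows]
    for path in paths[1:]:
        other_header, rows = read_replicate(path)
        if other_header != header or len(rows) != len(first_rows):
            raise ValueError(f"{path} does not match the layout of {paths[0]}")
        tables.append(rows)
    mean_rows, stdev_rows = aggregate_tables(header, tables)
    return header, mean_rows, stdev_rows


def combined_table(header: Row, mean_rows, stdev_rows):
    """Alternative layout: generation, all *_mean columns, then all *_stdev columns."""
    fields = header[1:]
    new_header = [header[0]] + [f"{h}_mean" for h in fields] + [f"{h}_stdev" for h in fields]
    rows = [m + s[1:] for m, s in zip(mean_rows, stdev_rows)]
    return new_header, rows


def write_table(path: str, header: Row, rows) -> None:
    """Write a header plus rows as CSV, keeping the input column order."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `header, means, stdevs = aggregate_replicates(sorted(glob.glob("replicate_*.csv")))` (with `import glob`) followed by `write_table("mean.csv", header, means)` and `write_table("stdev.csv", header, stdevs)` gives two files laid out like the inputs; the file pattern and output names are placeholders. `combined_table` produces the single-file layout instead, with a stdev column for every non-key column, including number.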

  • Did you try the solution in my answer? Does it work? – aborruso Nov 13 '21 at 09:56
  • Hey @aborruso, no, I haven't tried it yet. I ended up putting it to the side for now, going for a different approach at the moment, but I will revisit this later on. Thanks for checking back. – jojocad Nov 17 '21 at 21:23

1 Answer


You can use Miller.

For example, starting from

a,v1,v2,v3
a,25,56,23
b,58,56,23

you can run the merge-fields verb:

mlr --csv merge-fields -a mean,stddev -r "v[0-9]" -o "result" -k input.csv >output.csv

to get:

a,v1,v2,v3,result_mean,result_stddev
a,25,56,23,34.666667,18.502252
b,58,56,23,45.666667,19.655364
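Note that merge-fields computes its statistics within each row, across the fields whose names match -r "v[0-9]". A small Python illustration (not Miller's implementation; the function `merge_fields_row` is hypothetical) reproduces the two rows above and shows that the stddev values match the sample (n - 1) standard deviation:

```python
import re
import statistics

# The two sample records from the input above; "a" is the key column.
rows = [
    {"a": "a", "v1": 25, "v2": 56, "v3": 23},
    {"a": "b", "v1": 58, "v2": 56, "v3": 23},
]


def merge_fields_row(row, pattern=r"v[0-9]", prefix="result"):
    """Mean and sample stddev over the fields of ONE record whose names match pattern.

    The inputs are kept alongside the results, mirroring the -k flag in the command.
    Values are rounded to 6 decimals only to line up with the output shown above.
    """
    values = [v for k, v in row.items() if re.search(pattern, k)]
    out = dict(row)
    out[f"{prefix}_mean"] = round(statistics.mean(values), 6)
    out[f"{prefix}_stddev"] = round(statistics.stdev(values), 6)
    return out


for row in rows:
    print(merge_fields_row(row))
```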
aborruso