How to get max, min, and mean from values in column 4

Question

I have data structured like this

X   43808504    G   1   ^]. <
X   43808505    C   3   .   4
X   43808506    T   8   .   ?
X   43808507    G   5   .   C

I want to get the max (8), min (1), and mean (4.25) from column 4 and write that to a file.

I've been wrestling with sorting and then cutting data away, but that seems really inefficient.

Thanks for any help

You might want to take a look at csvsql, unless you require a solution without additional software. — Panki, Dec 20 '19 at 15:29
Why not just use a for loop and do it yourself? I don't know of a way to use sort | cut to get the mean anyway. — Questionmark, Dec 20 '19 at 15:31
awk '{print $4}' but you could do the whole lot in awk pretty trivially — Chris Davies, Dec 20 '19 at 15:31

jesse_b · Answer 1 · 2019-12-20T15:45:34.230

7

Using awk:

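awk 'NR == 1 { min = $4; max = $4 }   # seed min and max from the first row
{
    sum += $4                          # running total, used for the mean
    if ($4 > max) {
        max = $4
    }
    if ($4 < min) {
        min = $4
    }
} END {
    print max
    print min
    print sum / NR                     # mean = total / number of rows read
}' input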

First we set the min and max variables to the value of the 4th column on line 1. Then, for every line, we check whether the value in column 4 is greater than the current max or less than the current min, and if so we update the corresponding variable.

We also keep a running sum of all values in column 4. This is used at the end to calculate the mean by dividing the sum by the total number of rows, NR (for the sample data, 17 / 4 = 4.25).

At the end we print the max, min, and mean.
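Since the question asks for the results to be written to a file, the script's output can simply be redirected. The following is a minimal sketch, not part of the original answer: it assumes the sample rows from the question are saved as input, and both the output name stats.txt and the max/min/mean labels are illustrative choices.

```shell
# Recreate the sample data from the question (whitespace-separated columns).
cat > input <<'EOF'
X   43808504    G   1   ^]. <
X   43808505    C   3   .   4
X   43808506    T   8   .   ?
X   43808507    G   5   .   C
EOF

# Same logic as above, with labelled output redirected to stats.txt
# (stats.txt is a hypothetical file name).
awk 'NR == 1 { min = $4; max = $4 }
{
    sum += $4
    if ($4 > max) max = $4
    if ($4 < min) min = $4
}
END {
    print "max", max
    print "min", min
    print "mean", sum / NR
}' input > stats.txt

# Show what was written.
cat stats.txt
```

Use >> instead of > if the results should be appended to an existing file rather than overwrite it.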

edited Dec 20 '19 at 15:45

answered Dec 20 '19 at 15:39

jesse_b


Empty file gets a fatal divide-by-zero. print (NR ? sum / NR : "NaN") – Paul_Pedant Dec 20 '19 at 16:15
@Paul_Pedant: Don't use it on an empty file. – jesse_b Dec 20 '19 at 16:16
I wouldn't. But users are a whole new thing. I like your initialising from NR==1. I use BEGIN and some arbitrary max and min, and it always looks clumsy. – Paul_Pedant Dec 20 '19 at 16:19
If it's being run against an empty file it should error IMO because something has been done wrong. – jesse_b Dec 20 '19 at 16:20
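The guard from Paul_Pedant's comment can be sketched as follows. This is only an illustration of that one-line suggestion, reduced to the mean calculation; the file names empty and mean.txt are hypothetical. Whether printing NaN is preferable to letting awk fail is exactly the design choice debated in the comments above.

```shell
# Create an empty input file (hypothetical name) to exercise the guard.
: > empty

# With no rows, NR is 0 in the END block, so the ternary prints "NaN"
# instead of attempting a division by zero.
awk '{ sum += $4 }
END { print (NR ? sum / NR : "NaN") }' empty > mean.txt

# Show the result.
cat mean.txt
```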

score 6 · Answer 2 · answered Dec 20 '19 at 16:22

With Miller

$ mlr --nidx --repifs stats1 -a 'min,max,mean' -f 4 data
1 8 4.250000

You can redirect the output to a file in the usual way, by adding > file

With GNU datamash

$ datamash -W min 4 max 4 mean 4 < data
1   8   4.25

answered Dec 20 '19 at 16:22

steeldriver
