Is there a way to get the min, max, median, and average of a list of numbers in a single command?

Question

I have a list of numbers in a file, one per line. How can I get the minimum, maximum, median and average values? I want to use the results in a bash script.

Although my immediate situation is for integers, a solution for floating-point numbers would be useful down the line, but a simple integer method is fine.

http://stackoverflow.com/questions/3122442/how-do-i-calculate-the-mean-of-a-column — Ciro Santilli OurBigBook.com, Nov 21 '15 at 11:13

score 97 · Answer 1 · edited Dec 20 '20 at 12:25

97

With GNU datamash:

$ printf '%s\n' 1 2 4 | datamash max 1 min 1 mean 1 median 1
4   1   2.3333333333333 2


Kusalananda

333,661

answered May 12 '15 at 04:26

cuonglm

153,898

4

brew install datamash gives you a working version for macOS, if you have Homebrew installed. – Per Lundberg Mar 05 '18 at 12:26
7

Note: the 1 in mean 1 specifies the fld, which "is the input field to use" i.e. it selects the i-th (counting from 1) column of input numbers in a matrix of numbers. – dosentmatter Dec 31 '21 at 20:57
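To illustrate the 1-based field selection described in the comment above, here is a rough Python sketch. The helper name column and the two-column sample rows are made up for illustration, and splitting on whitespace is an assumption of this sketch, not a statement about datamash's defaults.

```python
import statistics

def column(lines, fld, sep=None):
    """Return field number `fld` (counting from 1) of each non-empty line as floats."""
    return [float(line.split(sep)[fld - 1]) for line in lines if line.strip()]

rows = ["1 10", "2 20", "4 40"]   # hypothetical two-column input
first = column(rows, 1)           # fld=1 picks the first column
print(max(first), min(first), statistics.mean(first), statistics.median(first))
```

Passing fld=2 would summarise the second column instead.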

Lesmana · Accepted Answer · 2015-03-03T16:56:38.090

62

You can use the R programming language.

Here is a quick and dirty R script:

#! /usr/bin/env Rscript
d<-scan("stdin", quiet=TRUE)
cat(min(d), max(d), median(d), mean(d), sep="\n")

Note the "stdin" in scan which is a special filename to read from standard input (that means from pipes or redirections).

Now you can redirect your data over stdin to the R script:

$ cat datafile
1
2
4
$ ./mmmm.r < datafile
1
4
2
2.333333

Also works for floating points:

$ cat datafile2
1.1
2.2
4.4
$ ./mmmm.r < datafile2
1.1
4.4
2.2
2.566667
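For comparison, roughly the same thing can be sketched with Python's standard statistics module. This is not part of the R answer; the function name mmmm simply mirrors the script name used above, and the sample values are those of datafile2.

```python
import statistics

def mmmm(lines):
    """Return (min, max, median, mean) for an iterable of numeric text lines."""
    d = [float(s) for s in lines if s.strip()]
    return min(d), max(d), statistics.median(d), statistics.mean(d)

# Print one value per line, in the same order as the R script.
for value in mmmm(["1.1", "2.2", "4.4"]):
    print(value)
```

To read real input, pass sys.stdin (after import sys) instead of the literal list.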

If you don't want to write an R script file you can invoke a true one-liner (with linebreak only for readability) in the command line using Rscript:

$ Rscript -e 'd<-scan("stdin", quiet=TRUE)' \
          -e 'cat(min(d), max(d), median(d), mean(d), sep="\n")' < datafile
1
4
2
2.333333

Read the fine R manuals at http://cran.r-project.org/manuals.html.

Unfortunately the full reference is only available in PDF. Another way to read the reference is by typing ?topicname in the prompt of an interactive R session.

For completeness: there is an R command that outputs all the values you want and more. Unfortunately, its output is in a human-friendly format that is hard to parse programmatically.

> summary(c(1,2,4))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.500   2.000   2.333   3.000   4.000
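If a parseable version of that six-number summary is wanted, something like the following Python sketch may do. It is an illustration, not what R does internally; the quartiles use statistics.quantiles with method="inclusive" (Python 3.8+), which reproduces the quartiles shown above for 1, 2, 4. The dictionary keys are names chosen here.

```python
import statistics

def summary(data):
    """Six-number summary as a dict; needs at least two data points."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": q2,
            "mean": statistics.mean(data), "q3": q3, "max": max(data)}

# One tab-separated "name value" pair per line is easy to parse from a shell script.
for name, value in summary([1, 2, 4]).items():
    print(f"{name}\t{value}")
```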

edited Mar 03 '15 at 16:56

answered May 25 '11 at 17:07

Lesmana

27,439

1

It looks interesting.. I'll have a closer look at it tomorrow.. Based on wikipedia's page, "R has become a de facto standard among statisticians"... well that's a significant accolade... I actually tried to download it the other day (I kept seeing it mentioned), but I couldn't find it in the Ubuntu repo... I'll follow it up tomorrow... – Peter.O May 25 '11 at 17:26
11

in the ubuntu (and debian?) repo the package is named r-base. – Lesmana May 25 '11 at 17:44
1

thanks, I needed that name reference :) I didn't think of r- in the synaptic search field and it doesn't act on a lone character... I've tried it out now, and it looks ideal.. The R language is clearly the best for my requirement in this situation.. As per Gilles' answer, the Rscript interface to script files is most appropriate (vs. R, which is the interactive interface)... and R in the terminal makes for a handy calculator, or test environment (like python :) – Peter.O May 26 '11 at 11:28
If you have data on stdin, you can use such a one-liner: { echo 'd<-scan()'; cat; echo; echo 'summary(d)'; } | R --slave – Michał Wróbel Sep 02 '13 at 14:19
9

or just cat datafile | Rscript -e 'print(summary(scan("stdin")));' – shabbychef Aug 11 '14 at 22:32
If you want to parse the output of summary(), use tail -n +2 | awk '{print $N}' where N is the column you want – Neil McGuigan Jul 31 '16 at 06:39
Very nice, actually my very first R script ever run :) – gies0r Feb 18 '20 at 18:53

score 60 · Answer 3 · edited Oct 07 '19 at 04:14

60

I actually keep a little awk program around to give the sum, data count, minimum datum, maximum datum, mean and median of a single column of numeric data (including negative numbers):

#!/bin/sh
sort -n | awk '
  BEGIN {
    c = 0;
    sum = 0;
  }
  $1 ~ /^(\-)?[0-9]*(\.[0-9]*)?$/ {
    a[c++] = $1;
    sum += $1;
  }
  END {
    ave = sum / c;
    if( (c % 2) == 1 ) {
      median = a[ int(c/2) ];
    } else {
      median = ( a[c/2] + a[c/2-1] ) / 2;
    }
    OFS="\t";
    print sum, c, ave, median, a[0], a[c-1];
  }
'

The above script reads from stdin and prints one line of tab-separated columns in this order: sum, count, mean, median, minimum, maximum.
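The same filter-then-summarise idea can be sketched in Python. Note the assumptions: the function name column_stats is made up, and the regular expression here is deliberately stricter than the awk one (it requires at least one digit, so empty lines are skipped rather than counted as 0).

```python
import re
import statistics

# Optional minus sign, then digits with an optional fractional part.
NUMERIC = re.compile(r"^-?([0-9]+\.?[0-9]*|\.[0-9]+)$")

def column_stats(lines):
    """Return (sum, count, mean, median, min, max) of the numeric lines, or None."""
    values = sorted(float(s) for s in (l.strip() for l in lines) if NUMERIC.match(s))
    if not values:
        return None               # avoids dividing by zero on empty input
    total = sum(values)
    return (total, len(values), total / len(values),
            statistics.median(values), values[0], values[-1])

# Non-numeric lines (comments, blanks) are ignored.
print(*column_stats(["3", "# comment", "-1.5", "", "10"]), sep="\t")
```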


user2966082

3
2

answered May 25 '11 at 18:07

1

Aha! it's obvious (now that I've seen your awk script :) ... There is no need to keep checking for min and max when the array is sorted :) and that means that the NR==1 can go (a useless-use-of-if) along with the min/max checks, so all initializing can be located in the BEGIN section (good!)... Allowing for comments is a nice touch too.. Thanks, +1 ... – Peter.O May 26 '11 at 02:28
Just a thought.. maybe allowing only numerics is better than disallowing comments (but that depends on your requirements).. – Peter.O May 26 '11 at 06:21
2

Technically, awk will assume "new" variables are zero, so in this case the BEGIN{} section is unnecessary. I've fixed the wrapping (no need to escape the line breaks either). I also used OFS="\t" to clean up the print line and implemented @Peter.O's second comment. (Yes, my regex allows ., but as awk interprets that as 0, that's acceptable.) – Adam Katz Jan 15 '15 at 21:22
1

@AdamKatz - these are great changes, but as it stands, I didn't write the program. My awk script is now substantially different. I almost feel like you should take credit for the above program, in order to give credit where credit is due. – Jan 15 '15 at 22:31
1

I wrote a perl script called avg that does this and more, by the way. – Adam Katz Aug 01 '18 at 17:20

nisetama · Answer 4 · 2022-08-31T08:33:01.920

44

Minimum:

jq -s min
awk 'NR==1||$0<x{x=$0}END{print x}'

Maximum:

jq -s max
awk 'NR==1||$0>x{x=$0}END{print x}'

Median:

jq -s 'sort|if length%2==1 then.[length/2|floor]else[.[length/2-1,length/2]]|add/2 end'
sort -n|awk '{a[NR]=$0}END{print(NR%2==1)?a[int(NR/2)+1]:(a[NR/2]+a[NR/2+1])/2}'

Average:

jq -s add/length
awk '{x+=$0}END{print x/NR}'

Combined to one command (modified from a comment):

$ seq 100|jq -s '{minimum:min,maximum:max,average:(add/length),median:(sort|if length%2==1 then.[length/2|floor]else[.[length/2-1,length/2]]|add/2 end)}'
{
  "minimum": 1,
  "maximum": 100,
  "average": 50.5,
  "median": 50.5
}

In jq, the -s (--slurp) option creates an array for the input lines after parsing each line as JSON, or as a number in this case.
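As a rough analogue of that slurp step, the Python sketch below parses each line as JSON, collects the results into one list, and builds the same labelled object. The function name slurp_stats is invented for this example, and the input is generated to match seq 100.

```python
import json
import statistics

def slurp_stats(text):
    """Parse each non-empty line as JSON, gather into a list, then summarise."""
    nums = [json.loads(line) for line in text.splitlines() if line.strip()]
    return {"minimum": min(nums), "maximum": max(nums),
            "average": sum(nums) / len(nums),
            "median": statistics.median(nums)}

text = "\n".join(str(n) for n in range(1, 101))   # same data as `seq 100`
print(json.dumps(slurp_stats(text), indent=2))
```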

Or with R (you can also use R -e instead of Rscript -e but it echoes the commands that it runs to STDOUT):

$ seq 100|Rscript -e 'summary(scan("stdin"))'
Read 100 items
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   25.75   50.50   50.50   75.25  100.00
$ seq 100|Rscript -e 'x=scan("stdin");sapply(c(min,max,mean,median),\(f)f(x))'
Read 100 items
[1]   1.0 100.0  50.5  50.5
$ seq 100|Rscript -e 'x=scan("stdin",quiet=T);writeLines(paste(sapply(c(min,max,mean,median),\(f)f(x)),collapse=" "))'
1 100 50.5 50.5

edited Aug 31 '22 at 08:33

answered Dec 16 '15 at 19:46

nisetama

1,097

5

The jq solution is worthy of a special mention, since it's succinct, and re-purposes the tool in a non-obvious way. – jplindstrom May 10 '17 at 11:31
3

beautiful! wish i could give +2 – RASG Jul 19 '17 at 21:16
1

Extended a little: jq -s '{ min:min, max:max, sum:add, count:length, avg: (add/length), median: (sort | .[ length/2 ]) }' shows the output as an object with labels, pretty printed with colors! – Grynn Sep 24 '20 at 20:14
1

@Grynn That's not right for median. For an odd list echo '[1,2,3]' | jq 'sort | .[length/2]' your code gives the answer 'null', and for an even list echo '[1,2,3,4]' | jq 'sort | .[length/2]' your code picks the third element '3' but it should give the answer 2.5, the mean of the middle two elements. – Lucian Wischik Nov 12 '20 at 18:14
@LucianWischik - Good point! Probably better to fix this in a public gist, rather than a comment stream ... but jq 'sort | .[(length/2) | floor]' would work for odd length lists? Cannot think of a very compact way to handle even lists – Grynn Nov 14 '20 at 13:46

score 23 · Answer 5 · answered May 25 '11 at 08:26

23

Min, max and average are pretty easy to get with awk:

% echo -e '6\n2\n4\n3\n1' | awk 'NR == 1 { max=$1; min=$1; sum=0 }
   { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;}
   END {printf "Min: %d\tMax: %d\tAverage: %f\n", min, max, sum/NR}'
Min: 1  Max: 6  Average: 3,200000
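The point of the awk one-liner is that min, max and average need only one pass and no stored data. A minimal Python sketch of that idea (the name running_stats is made up here):

```python
def running_stats(lines):
    """One pass, constant memory: returns (min, max, mean), or None if empty."""
    count = 0
    total = 0.0
    lo = hi = None
    for line in lines:
        x = float(line)
        if count == 0:
            lo = hi = x           # seed from the first value, like NR == 1
        else:
            lo = min(lo, x)
            hi = max(hi, x)
        total += x
        count += 1
    return (lo, hi, total / count) if count else None

print(running_stats(["6", "2", "4", "3", "1"]))
```

Because nothing is stored, this works on any iterable of lines, including a file object.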

Calculating the median is a bit trickier, since you need to sort the numbers and either store them all in memory for a while or read them twice (the first time to count them, the second to get the median value). Here is an example that stores all the numbers in memory:

% echo -e '6\n2\n4\n3\n1' | sort -n | awk '{arr[NR]=$1}
   END { if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}' 
3


gelraen

6,737

Thanks... your example is a good lead-in to awk, for me.. I've tweaked it a bit and put the two together (getting the feel of awk)... I've used awk's asort rather than the piped sort, and it seems to sort integers and decimals correctly.. Here is a link to my resulting version http://paste.ubuntu.com/612674/ ... (And a note to Kim: I've been experimenting with awk for a couple of hours now. Working with a personal-interest example is way better for me)... A general note to readers: I'm still interested to see other methods. the more compact the better. I'll wait a while ... – Peter.O May 25 '11 at 11:06

RussellStewart · Answer 6 · 2015-05-12T04:15:47.053

20

pythonpy works well for this sort of thing:

cat file.txt | py --ji -l 'min(l), max(l), numpy.median(l), numpy.mean(l)'

edited May 12 '15 at 04:15

answered Sep 13 '14 at 06:41

RussellStewart

1,871

pythonpy no longer exists. Use pythonpy-fork instead – Connor McCormick Mar 21 '21 at 19:59

score 10 · Answer 7 · answered May 27 '15 at 11:07

And a Perl one-(long)liner, including median:

cat numbers.txt \
| perl -M'List::Util qw(sum max min)' -MPOSIX -0777 -a -ne 'printf "%-7s : %d\n"x4, "Min", min(@F), "Max", max(@F), "Average", sum(@F)/@F,  "Median", sum( (sort {$a<=>$b} @F)[ int( $#F/2 ), ceil( $#F/2 ) ] )/2;'

The special options used are:

-0777 : read the whole file at once instead of line by line
-a : autosplit into the @F array

A more readable script version of the same thing would be :

#!/usr/bin/perl

use List::Util qw(sum max min);
use POSIX;

@F=<>;

printf "%-7s : %d\n" x 4,
    "Min", min(@F),
    "Max", max(@F),
    "Average", sum(@F)/@F,
    "Median", sum( (sort {$a<=>$b} @F)[ int( $#F/2 ), ceil( $#F/2 ) ] )/2;

If you want decimals, replace %d with something like %.2f.
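To see why the format matters, here is a small Python sketch using printf-style formatting as a stand-in for Perl's printf (an assumption of this example; the two languages' format strings behave alike for these conversions as far as I know). The values 1, 2, 4 are sample data.

```python
values = [1, 2, 4]
average = sum(values) / len(values)

as_integer = "%-7s : %d" % ("Average", average)     # %d drops the fractional part
as_decimal = "%-7s : %.2f" % ("Average", average)   # %.2f keeps two decimals

print(as_integer)
print(as_decimal)
```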

score 8 · Answer 8 · edited Mar 31 '14 at 03:50

8

nums=$(<file.txt); 
list=(`for n in $nums; do printf "%015.06f\n" $n; done | sort -n`); 
echo min ${list[0]}; 
echo max ${list[${#list[*]}-1]}; 
echo median ${list[${#list[*]}/2]};


slm

369,824

answered Oct 07 '13 at 15:33

NotANumber

81

echo file.txt does not looks quite right, maybe cat – malat Dec 17 '13 at 14:14

Peter.O · Answer 9 · 2011-05-28T08:32:19.330

Just for the sake of having a variety of options presented on this page, here are two more ways:

1: octave

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments.

Here is a quick octave example.

octave -q --eval 'A=1:10;
  printf ("# %f\t%f\t%f\t%f\n", min(A), max(A), median(A), mean(A));'  
# 1.000000        10.000000       5.500000        5.500000

2: bash + single-purpose tools.

For bash to handle floating-point numbers, this script uses numprocess and numaverage from package num-utils.

PS. I've also had a reasonable look at bc, but for this particular job, it doesn't offer anything beyond what awk does. It is (as the 'c' in 'bc' states) a calculator, one that requires as much programming as awk and this bash script...

arr=($(sort -n "LIST" |tee >(numaverage 2>/dev/null >stats.avg) ))
cnt=${#arr[@]}; ((cnt==0)) && { echo -e "0\t0\t0\t0\t0"; exit; }
mid=$((cnt/2)); 
if [[ ${cnt#${cnt%?}} == [02468] ]] 
   then med=$( echo -n "${arr[mid-1]}" |numprocess /+${arr[mid]},%2/ )
   else med=${arr[mid]}; 
fi     #  count   min       max           median        average
echo -ne "$cnt\t${arr[0]}\t${arr[cnt-1]}\t$med\t"; cat stats.avg

score 6 · Answer 10 · edited Jun 28 '16 at 09:19

6

Simple-r is the answer:

r summary file.txt
r -e 'min(d); max(d); median(d); mean(d)' file.txt

It uses the R environment to simplify statistical analysis.


kenorb

20,988

answered Oct 01 '13 at 01:22

user48270

69

score 5 · Answer 11 · edited Jun 28 '16 at 09:23

5

num is a tiny awk wrapper that does exactly this and more, e.g.

$ echo "1 2 3 4 5 6 7 8 9" | num max
9
$ echo "1 2 3 4 5 6 7 8 9" | num min max median mean
..and so on

It saves you from reinventing the wheel in the ultra-portable awk. The docs are given above, and the direct link here (check also the GitHub page).


kenorb

20,988

answered Feb 12 '16 at 06:22

coderofsalvation

179

1

Links to obscured web code to be executed in the user computer seems to me like a bad idea. The site that contains the code resides here – Feb 12 '16 at 06:29

score 4 · Answer 12 · edited Apr 13 '17 at 12:36

4

I'll second lesmana's choice of R and offer my first R program. It reads one number per line on standard input and writes four numbers (min, max, average, median) separated by spaces to standard output.

#!/usr/bin/env Rscript
a <- scan(file("stdin"), c(0), quiet=TRUE);
cat(min(a), max(a), mean(a), median(a), "\n");


Community

1

answered May 25 '11 at 22:52

Gilles 'SO- stop being evil'

829,060

Thanks for the "second" (it's reassuring)... your example was useful, as I didn't realize straight-off that R is the interactive interface, and Rscript drives the scripted files, which can be executable as per your example hash-bang, or invoked from within a bash script.. The scripts can handle commandline args (eg. http://stackoverflow.com/questions/2045706/why-my-bash-cant-execute-r-script ) so it's looking good... Also R expressions can be used in bash via the -e ... but I do wonder how R compares to bc ... – Peter.O May 26 '11 at 02:05

score 3 · Answer 13 · answered Mar 29 '18 at 15:55

3

With perl:

$ printf '%s\n' 1 2 4 |
   perl -MList::Util=min,max -MStatistics::Basic=mean,median -w -le '
     chomp(@l = <>); print for min(@l), max(@l), mean(@l), median(@l)'
1
4
2.33
2


Stéphane Chazelas

544,893

score 3 · Answer 14 · answered Sep 24 '20 at 20:24

Extending nisetama's answer:

oneliner with jq

jq -s '{ min:min, max:max, sum:add, count:length, avg: (add/length), median: (sort|.[(length/2|floor)]) }'

Example:

echo 1 2 3 4 5 | jq -s '{ min:min, max:max, sum:add, count:length, avg: (add/length), median: (sort|.[(length/2|floor)]) }'

Gives you:

{
  "min": 1,
  "max": 5,
  "sum": 15,
  "count": 5,
  "avg": 3,
  "median": 3
}

Note: The median is not quite right when the number of items is even (it picks the upper of the two middle values instead of their mean), but close enough IMHO.
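To make the even-count caveat concrete, this Python sketch compares the floor-index pick with the conventional median. The helper floor_index_median is a made-up name that mimics the sort | .[length/2|floor] expression; it is not jq code.

```python
import statistics

def floor_index_median(values):
    """Element at index len // 2 of the sorted list, as the jq expression selects."""
    return sorted(values)[len(values) // 2]

even = [1, 2, 3, 4]
# For an even count the index pick equals the "high" median, not the mean of the middle pair.
print(floor_index_median(even), statistics.median_high(even), statistics.median(even))
```

For an odd count all three agree.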

score 2 · Answer 15 · answered May 14 '15 at 12:59

2

The below sort/awk tandem does it:

sort -n | awk '{a[i++]=$0;s+=$0}END{print a[0],a[i-1],(a[int(i/2)]+a[int((i-1)/2)])/2,s/i}'

(it calculates median as mean of the two central values if value count is even)
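The two-index trick in that awk command works for both odd and even counts, because the two indices coincide when the count is odd. A small Python sketch of the same arithmetic (median_sorted is a name chosen for this example, and the input is assumed to be sorted and non-empty):

```python
def median_sorted(a):
    """Median of an already sorted, non-empty list via its two middle indices."""
    i = len(a)
    # i // 2 and (i - 1) // 2 are the same index when i is odd.
    return (a[i // 2] + a[(i - 1) // 2]) / 2

print(median_sorted([1, 2, 4]), median_sorted([1, 2, 3, 4]))
```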


mik

1,342

rav · Answer 16 · 2022-12-08T16:14:58.520

2

cat/python only solution - not empty-input proof!

cat data |  python3 -c "import fileinput as FI,statistics as STAT; i = [float(l.strip()) for l in FI.input()]; print('min:', min(i), ' max: ', max(i), ' avg: ', STAT.mean(i), ' median: ', STAT.median(i))"
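Since the one-liner above is flagged as not empty-input proof, here is one possible guarded variant as a sketch. The function name describe and the "no data" message are choices made for this example, not part of the original answer.

```python
import statistics

def describe(lines):
    """Return a summary string, or a notice when there are no numbers."""
    nums = [float(s) for s in (l.strip() for l in lines) if s]
    if not nums:
        return "no data"          # statistics.mean would raise on an empty list
    return "min: {} max: {} avg: {} median: {}".format(
        min(nums), max(nums), statistics.mean(nums), statistics.median(nums))

print(describe([]))
print(describe(["1\n", "2\n", "6\n"]))
```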

edited Dec 08 '22 at 16:14

answered Sep 09 '15 at 19:39

rav

121

The statistics module requires python version >= 3.4 – Peter.O Sep 10 '15 at 13:05
@Peter.O you are correct - is that a problem? – rav Sep 10 '15 at 16:17
It's not a problem unless you don't have the appropriate python version. It just makes it less portable. – Peter.O Sep 10 '15 at 22:54
2

Why do you use int to convert the number. What if the number is float value? Besides, you need to strip the newline from the number in each line. The correct command is: cat data.log | python3 -c "import fileinput as FI,statistics as STAT; i = [float(l.strip()) for l in FI.input()]; print('min:', min(i), ' max: ', max(i), ' avg: ', STAT.mean(i), ' median: ', STAT.median(i))" – jdhao Jul 13 '20 at 08:11

Rahul Agarwal · Answer 17 · 2015-10-11T04:33:56.713

2

Taking cues from Bruce's code, here is a more efficient implementation which does not keep the whole data in memory. As stated in the question, it assumes that the input file has (at most) one number per line. It counts the lines in the input file that contain a qualifying number and passes the count to the awk command along with (preceding) the sorted data. So, for example, if the file contains

6.0
4.2
8.3
9.5
1.7

then the input to awk is actually

5
1.7
4.2
6.0
8.3
9.5

Then the awk script captures the data count in the NR==1 code block and saves the middle value (or the two middle values, which are averaged to yield the median) when it sees them.

FILENAME="Salaries.csv"

(awk 'BEGIN {c=0} $1 ~ /^[-0-9]*(\.[0-9]*)?$/ {c=c+1;} END {print c;}' "$FILENAME"; \
        sort -n "$FILENAME") | awk '
  BEGIN {
    c = 0
    sum = 0
    med1_loc = 0
    med2_loc = 0
    med1_val = 0
    med2_val = 0
    min = 0
    max = 0
  }

  NR==1 {
    LINES = $1
    # We check whether numlines is even or odd so that we keep only
    # the locations in the array where the median might be.
    if (LINES%2==0) {med1_loc = LINES/2-1; med2_loc = med1_loc+1;}
    if (LINES%2!=0) {med1_loc = med2_loc = (LINES-1)/2;}
  }

  $1 ~ /^[-0-9]*(\.[0-9]*)?$/  &&  NR!=1 {
    # setting min value
    if (c==0) {min = $1;}
    # middle two values in array
    if (c==med1_loc) {med1_val = $1;}
    if (c==med2_loc) {med2_val = $1;}
    c++
    sum += $1
    max = $1
  }
  END {
    ave = sum / c
    median = (med1_val + med2_val ) / 2
    print "sum:" sum
    print "count:" c
    print "mean:" ave
    print "median:" median
    print "min:" min
    print "max:" max
  }
'
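The count-first idea can be sketched in Python as well: once the number of values is known, a single pass over the sorted stream only has to remember the one or two middle values. The function name stream_stats is invented here, and the sketch assumes at least one value.

```python
def stream_stats(count, sorted_values):
    """Consume sorted values once, keeping only the middle one or two."""
    lo_idx, hi_idx = (count - 1) // 2, count // 2   # equal when count is odd
    total = 0.0
    seen = 0
    first = last = mid_lo = mid_hi = None
    for x in sorted_values:
        if seen == 0:
            first = x             # smallest value in a sorted stream
        if seen == lo_idx:
            mid_lo = x
        if seen == hi_idx:
            mid_hi = x
        total += x
        last = x                  # ends up holding the largest value
        seen += 1
    return {"sum": total, "count": seen, "mean": total / seen,
            "median": (mid_lo + mid_hi) / 2, "min": first, "max": last}

data = sorted([6.0, 4.2, 8.3, 9.5, 1.7])
print(stream_stats(len(data), iter(data)))
```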

edited Oct 11 '15 at 04:33

answered Oct 10 '15 at 01:44

Rahul Agarwal

121
2

Welcome to Unix & Linux! Good job for a first post. (1) While this may answer the question, it would be a better answer if you could explain how/why it does so. The site’s standards have evolved over the past four years; while code-only answers were acceptable in 2011, we now prefer comprehensive answers that provide more explanation and context. I’m not asking you to explain the entire script; just the parts that you changed (but if you want to explain the entire script, that’s OK too). (BTW, *I* understand it fine; I’m asking on behalf of our less experienced users.) … (Cont’d) – G-Man Says 'Reinstate Monica' Oct 10 '15 at 06:18
(Cont’d) … Please do not respond in comments; [edit] your answer to make it clearer and more complete. (2) Fixing the script so that it does not need to hold the entire array in memory is a good improvement, but I’m not sure whether it’s appropriate to say that your version is “more efficient” when you have three unnecessary cat commands; see UUOC. … (Cont’d) – G-Man Says 'Reinstate Monica' Oct 10 '15 at 06:19
(Cont’d) … (3) Your code is safe, since you set FILENAME and you know what you set it to, but, in general, you should always quote shell variables unless you have a good reason not to, and you’re sure you know what you’re doing. (4) Both your answer and Bruce’s ignore negative input (i.e., numbers beginning with -); there is nothing in the question to suggest that this is correct or desired behavior. Don’t feel bad; it’s been over four years, and, apparently, I’m the first person who noticed. – G-Man Says 'Reinstate Monica' Oct 10 '15 at 06:20
Made edits as per suggestions. Didn't know about the overhead of the cat command. Always used it to stream single files. Thanks for telling me about UUOC..... – Rahul Agarwal Oct 10 '15 at 15:40
Good. I eliminated the third cat and added to the explanation. – G-Man Says 'Reinstate Monica' Oct 10 '15 at 17:10
Hey thanks for removing the third cat. This was my first awk script. And I feel that I learned a lot from this exercise and you. For example I also learned the use of ;(synchronous commands) .........:) – Rahul Agarwal Oct 11 '15 at 04:41
Learning how to shuffle things around to get the same results can be tricky; I’m glad to help in your education. And as I said, it’s a good first effort. Do you want a challenge to try to learn some more? Your answer currently consists of two awks and one sort. It is possible to rearrange the pieces so you can do the same job using *one* awk and a sort (and no other commands). A trivial hint: you would have to merge the two awk scripts into one. Can you figure out how to do it? … (Cont’d) – G-Man Says 'Reinstate Monica' Oct 11 '15 at 08:03
(Cont’d) … BTW, it’s debatable whether this would yield a better answer. We try to minimize the number of processes (commands), but the merged awk script would be more complex and harder to understand, which we try to avoid. – G-Man Says 'Reinstate Monica' Oct 11 '15 at 08:03

score 2 · Answer 18 · answered Jan 06 '21 at 08:32

2

With an R one-liner:

R -q -e 'summary(as.numeric(read.table("your_single_col_file")[,1]))'

For example, for my file, I got such output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  550.4   628.3   733.1   706.5   778.4   832.9


Jing Su

21

David McLaughlin · Answer 19 · 2017-10-09T18:21:59.000

1

function median()
{
    declare -a nums=($(cat))
    printf '%s\n' "${nums[@]}" | sort -n | tail -n $((${#nums[@]} / 2 + 1)) | head -n 1
}

edited Oct 09 '17 at 18:21

answered Oct 09 '17 at 18:16

David McLaughlin

19

1

This answer would be useful if there was an explanation of how the above code answers the question, e.g., you should say that it's using Bash (not sh) as the interpreter. There's also a problem with how the data is read into the array from the file. – Anthony Geoghegan Oct 09 '17 at 19:19

score 0 · Answer 20 · edited Apr 02 '12 at 21:59

If you're more interested in utility than in being cool or clever, then perl is an easier choice than awk. By and large it will be on every *nix with consistent behaviour, and it is easy and free to install on Windows. I think it's also less cryptic than awk, and there will be some stats modules you could use if you wanted a halfway house between writing it yourself and something like R.

My fairly untested perl script (in fact I know it has bugs, but it works for my purposes) took about a minute to write. I'd guess the only cryptic part would be the while(<>), a very useful shorthand meaning: take the file(s) passed as command-line arguments, read a line at a time, and put that line in the special variable $_. So you could put this in a file called count.pl and run it as perl count.pl myfile. Apart from that it should be painfully obvious what's going on.

$max = 0;
while (<>) {
 $sum = $sum + $_;
 $max = $_ if ($_ > $max);
 $count++;
}
$avg=$sum/$count;
print "$count numbers total=$sum max=$max mean=$avg\n";

You haven't shown the median – Peter.O Mar 28 '12 at 14:31

score 0 · Answer 21 · answered Jun 03 '21 at 05:18

I wrote a perl script called 'stats' that does this and more (and you can subselect the bits you want with options like '--sum', '--median', etc.):

$ ls -lR | grep $USER| scut -f=4 | stats 
Sum       1.22435e+08
Number    428
Mean      286064
Median    4135
Mode      0
NModes    4
Min       0
Max       8.47087e+07
Range     8.47087e+07
Variance  1.69384e+13
Std_Dev   4.11563e+06
SEM       198936
95% Conf  -103852 to 675979
          [for a normal distribution (ND) - see skew]
Quantiles (5)
        Index   Value
1       85      659
2       171     2196
3       256     11015
4       342     40210
Skew      20.3201
          [Skew=0 for a symmetric dist]
Std_Skew  171.621
Kurtosis  413.679
          [Kurtosis=3 for a ND]
PopKurt   0.975426
          [Pop'n Kurtosis is normalized to sample size; PK=0 for a ND]

It's bundled with scut (a perlish cut/join thingy) at: https://github.com/hjmangalam/scut

score 0 · Answer 22 · answered Jan 04 '22 at 15:53

0

If your list of numbers is short, and you don't need the result programmatically, it's worth noting that sometimes the best move is to convert the column of numbers into an array:

tr '\n' ',' | awk '{printf("a = [%s]\n", $1)}'

Then paste this into your interpreter of choice, e.g., the Python interpreter, and you can calculate min/max/mean/median/mode/etc. as desired.


shaneb

101

... but if you are using awk anyway, you can just perform the calculations in the awk program. – AdminBee Jan 05 '22 at 13:23

StackzOfZtuff · Answer 23 · 2023-02-08T12:21:38.423

Try `csvstat` or `xsv stats`

The common CSV toolkits csvkit and xsv include some basic statistics features.

So just pretend that your one-record-per-line input data is a single column of a header-less CSV file.

csvkit is older and better known, so you can usually install it easily via your package manager of choice. xsv is newer and much faster for big inputs, but you may have to install it manually.
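The "pretend it is a one-column, header-less CSV" idea can also be sketched with Python's standard csv module. This is only an illustration of the approach, not how csvkit or xsv are implemented; single_column_stats is a made-up name, and the sample data is 1, 2, 9, 9.

```python
import csv
import io
import statistics

def single_column_stats(stream):
    """Read a header-less, one-column CSV stream and summarise it."""
    values = [float(row[0]) for row in csv.reader(stream) if row]  # skip blank rows
    return {"min": min(values), "max": max(values), "sum": sum(values),
            "mean": statistics.mean(values), "median": statistics.median(values)}

print(single_column_stats(io.StringIO("1\n2\n9\n9\n")))
```

Any open text file (or sys.stdin) can be passed in place of the StringIO object.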

Input:

$ echo 1 2 9 9 | tr " " "\n"
1
2
9
9

csvkit's csvstat

csvstat is one of the commands of csvkit.

The default csvstat output is for humans...

$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row
/usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
  1. "a"
    Type of data:          Number
    Contains null values:  False
    Unique values:         3
    Smallest value:        1
    Largest value:         9
    Sum:                   21
    Mean:                  5.25
    Median:                5.5
    StDev:                 4.349
    Most common values:    9 (2x)
                           1 (1x)
                           2 (1x)


Row count: 4

...but you can also get the output as CSV, which is better for further processing:

$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv
/usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"

csvstat will always complain that the lines do not contain any delimiter. To get rid of that warning, redirect stderr to /dev/null like so:

$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null
column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"

And if you want a slightly more human readable version you can pipe the whole thing through csvlook again:

$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csvlook
| column_id | column_name | type   | nulls | unique |  min | max | sum | mean | median | stdev | len | freq    |
| --------- | ----------- | ------ | ----- | ------ | ---- | --- | --- | ---- | ------ | ----- | --- | ------- |
|      True | a           | Number | False |      3 | True |   9 |  21 | 5.25 |    5.5 | 4.349 |     | 9, 1, 2 |

xsv stats

For speed reasons xsv stats does not include median by default...

$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers
field,type,sum,min,max,min_length,max_length,mean,stddev
0,Integer,21,1,9,1,1,5.25,3.766629793329841

$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers | xsv table
field  type     sum  min  max  min_length  max_length  mean  stddev
0      Integer  21   1    9    1           1           5.25  3.766629793329841

...but you can enable it via the --everything switch. This will give you these three extra columns: median,mode,cardinality:

$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything
field,type,sum,min,max,min_length,max_length,mean,stddev,median,mode,cardinality
0,Integer,21,1,9,1,1,5.25,3.766629793329841,5.5,9,3
$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything | xsv table
field  type     sum  min  max  min_length  max_length  mean  stddev             median  mode  cardinality
0      Integer  21   1    9    1           1           5.25  3.766629793329841  5.5     9     3

Note on non-integer numbers

FYI: non-integers seem to be handled differently by csvkit and xsv:

$ echo 1.1 2.2 9.9 9.9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csvcut -c median
median
6.05
$ echo 1.1 2.2 9.9 9.9 | tr " " "\n" | xsv stats --no-headers --everything | xsv select median
median
6.050000000000001
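A difference in the trailing digits like this is what you would expect if one tool works in decimal arithmetic and the other in binary floating point. That is an assumption on my part, not something verified against either tool's source. The Python sketch below only shows the general effect on the same four inputs.

```python
from decimal import Decimal
import statistics

raw = ["1.1", "2.2", "9.9", "9.9"]

float_median = statistics.median([float(s) for s in raw])      # binary floating point
exact_median = statistics.median([Decimal(s) for s in raw])    # decimal arithmetic

# repr() shows every digit Python keeps for the float result.
print(repr(float_median), exact_median)
```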
