Try csvstat or xsv stats.
The common CSV toolkits csvkit and xsv include some basic statistics features. Just treat your one-value-per-line input data as a single column of a header-less CSV file.
csvkit is older and better known, so it is usually easy to install via your package manager of choice. xsv is newer and much faster on big inputs, but you may have to install it manually.
Input:
$ echo 1 2 9 9 | tr " " "\n"
1
2
9
9
csvkit's csvstat
csvstat is one of the commands in csvkit.
The default csvstat output is for humans...
$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row
/usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
1. "a"
Type of data: Number
Contains null values: False
Unique values: 3
Smallest value: 1
Largest value: 9
Sum: 21
Mean: 5.25
Median: 5.5
StDev: 4.349
Most common values: 9 (2x)
1 (1x)
2 (1x)
Row count: 4
...but you can also get the output as CSV, which is better suited to further processing:
$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv
/usr/lib/python2.7/site-packages/agate/table/from_csv.py:74: RuntimeWarning: Error sniffing CSV dialect: Could not determine delimiter
column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"
With single-column input, csvstat always prints this warning (on stderr), because there is no delimiter to detect. To get rid of it, just redirect stderr to /dev/null:
$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null
column_id,column_name,type,nulls,unique,min,max,sum,mean,median,stdev,len,freq
1,a,Number,False,3,1,9,21,5.25,5.5,4.349,,"9, 1, 2"
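And if you want a slightly more human-readable version, you can pipe the CSV through csvlook. Note that csvlook runs its own type inference, so cells containing just 1 (column_id and min here) are shown as True; if your csvlook supports -I/--no-inference, that should keep them as plain values: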
$ echo 1 2 9 9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csvlook
| column_id | column_name | type | nulls | unique | min | max | sum | mean | median | stdev | len | freq |
| --------- | ----------- | ------ | ----- | ------ | ---- | --- | --- | ---- | ------ | ----- | --- | ------- |
| True | a | Number | False | 3 | True | 9 | 21 | 5.25 | 5.5 | 4.349 | | 9, 1, 2 |
xsv stats
For speed reasons, xsv stats does not include the median by default...
$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers
field,type,sum,min,max,min_length,max_length,mean,stddev
0,Integer,21,1,9,1,1,5.25,3.766629793329841
$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers | xsv table
field type sum min max min_length max_length mean stddev
0 Integer 21 1 9 1 1 5.25 3.766629793329841
...but you can enable it via the --everything switch, which adds three extra columns (median, mode and cardinality); --median on its own adds just the median:
$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything
field,type,sum,min,max,min_length,max_length,mean,stddev,median,mode,cardinality
0,Integer,21,1,9,1,1,5.25,3.766629793329841,5.5,9,3
$ echo 1 2 9 9 | tr " " "\n" | xsv stats --no-headers --everything | xsv table
field type sum min max min_length max_length mean stddev median mode cardinality
0 Integer 21 1 9 1 1 5.25 3.766629793329841 5.5 9 3
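The two tools disagree on the standard deviation (csvstat's 4.349 vs xsv's 3.766629793329841) because csvstat reports the sample standard deviation (dividing by n - 1) while xsv reports the population standard deviation (dividing by n). The median is also different in kind from sum and mean: it needs all values kept and sorted rather than a single streaming pass, which is presumably the cost xsv avoids by default. As a tool-free cross-check, here is a minimal sketch using only sort and POSIX awk; `column_stats` is a hypothetical helper name, and it assumes at least two numeric values, one per line.

```shell
#!/bin/sh
# Hypothetical helper: reads one number per line on stdin and prints the
# mean, the median, and the standard deviation computed both ways.
# Assumes at least two numeric values (n - 1 must be non-zero).
column_stats() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            n = NR
            mean = sum / n
            # Values arrive sorted, so the median is the middle value,
            # or the average of the two middle values when n is even.
            if (n % 2) median = v[(n + 1) / 2]
            else       median = (v[n / 2] + v[n / 2 + 1]) / 2
            # Sum of squared deviations from the mean.
            for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
            # Population stddev divides by n, sample stddev by n - 1.
            printf "mean=%g median=%g pop_stddev=%.3f sample_stddev=%.3f\n",
                mean, median, sqrt(ss / n), sqrt(ss / (n - 1))
        }'
}

echo 1 2 9 9 | tr " " "\n" | column_stats
```

For the sample input this prints mean=5.25 median=5.5 pop_stddev=3.767 sample_stddev=4.349, i.e. xsv's stddev and csvstat's stdev respectively, rounded to three decimals.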
Note on non-integer numbers
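FYI: csvkit and xsv handle non-integers differently. csvkit (via agate) does its arithmetic with Python's decimal module, whereas xsv uses binary floating point, so xsv's median carries a tiny rounding error: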
$ echo 1.1 2.2 9.9 9.9 | tr " " "\n" | csvstat --no-header-row --csv 2>/dev/null | csvcut -c median
median
6.05
$ echo 1.1 2.2 9.9 9.9 | tr " " "\n" | xsv stats --no-headers --everything | xsv select median
median
6.050000000000001
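The extra digits in xsv's 6.050000000000001 are what IEEE 754 double arithmetic gives for the mean of the two middle values, 2.2 and 9.9. A minimal sketch reproducing this with awk, which also computes in doubles; `mid_mean` is a hypothetical helper, and the assumption that xsv averages the two middle values this way is inferred from its output rather than from its source code.

```shell
#!/bin/sh
# Hypothetical helper: average two numbers in IEEE 754 double precision
# and print the result with 16 and 17 significant digits.
mid_mean() {
    awk -v a="$1" -v b="$2" 'BEGIN { m = (a + b) / 2; printf "%.16g %.17g\n", m, m }'
}

# The two middle values of 1.1 2.2 9.9 9.9
mid_mean 2.2 9.9
```

This prints 6.050000000000001 6.0500000000000007: neither 2.2 nor 9.9 is exactly representable in binary, and their average lands just above 6.05.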