file 1:
HOGBRM443983 -2522.00 19800826
HOGBRM445985 -2389.00 19801101
HOUSAM1891409 -1153.00 19811228
HOUSAM2004289 -650.00 19860101
HOUSAM2005991 -843.00 19860109
HOCANM388722 -1546.00 19860116
HOUSAM2007297 -1882.00 19860125
HOUSAM2007389 -1074.00 19860128
HOITAM801038516 -691.00 19860128
Columns 2 and 3 include values and birthdate information( year,month,day) of each id from column1, respectively . I want to check how many ids exist within each birth year and what are the average values (from second column) of ids across different years. For example, in file1 there are 2, 1 and 6 ids across years 1980, 1981 and 1986 respectively so the output should be:
output:
1980 2 -2455.5
1981 1 -1153.00
1986 6 -114.33
in which the first column shows the year of birth, the second column shows a number of ids within each year and the third column is the average values of ids across different years.
Considering that the real data is indeed huge, any suggestion would be appreciated.