0

What I need to do is write a shell program called avgs that would read lines from the file with data, where the title line could be at any line within the data.

I must keep a total and count for each of the last 2 columns and must not include data from the first line in the totals and counts.

This is the file with the data:

92876035 SMITZ  S 15 26
95908659 CHIANG R 10 29
SID      LNAME  I T1/20 T2/30
92735481 BRUCE. R 16 28
93276645  YU    C 17 27
91234987 MYRTH  R 15 16

The shell program will write to stdout the line: "The averages are 17 and 24"

This is what I tried but it doesn't work

count_ppl=0
total=0
while read ?? ?!
do
    total=$((sum+b))
    count_ppl=$((count_ppl+1))
done < filename
avg=$(echo "scale=2;$total/$count_ppl" | bc)
echo "The averages are = $avg"

The "??" and "?!" are there beside the "while read" because I do not know what to put there.

I guess this probably computes one averages for one column, but how would I get the data from the columns and compute two averages.

(this is bash btw).

Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232

2 Answers2

3

Not sure what you mean by "and must not include data from the first line in the totals and counts.". Do you mean that the row "92876035 SMITZ S 15 26" must be excluded or just not to 'sum' "SID LNAME I T1/20 T2/30"?

The ?? and ?! needs to be replaced by variable names you need. The last variable name mentioned will keep the remainder of the input. You need the last two columns so in your case there are 5 columns and the while read statement could be:

while read col1 col2 col3 col4 col5

Next you need to check if the line is the title line. In this case I will test for the word SID in the first column:

if [ "$col1" != 'SID' ]

and from here we can start calculating:

totallines=$((totallines+1))
sumcol4=$((sumcol4+col4))
sumcol5=$((sumcol5+col5))

Finally you can calculate the averages using

avgcol4=$(echo "scale=2; $sumcol4/$totallines"|bc)
avgcol5=$(echo "scale=2; $sumcol5/$totallines"|bc)

to wrap this up you can use the following script:

#!/bin/bash
while read col1 col2 col3 col4 col5
do
  if [ "$col1" != 'SID' ]
  then
      totallines=$((totallines+1))
      sumcol4=$((sumcol4+col4))
      sumcol5=$((sumcol5+col5))
  fi
done < /path/to/inputfile
avgcol4=$(echo "scale=2; $sumcol4/$totallines"|bc)
avgcol5=$(echo "scale=2; $sumcol5/$totallines"|bc)
printf "The averages are %s and %s" $avgcol4 $avgcol5

Another way of doing this is to use awk:

awk '{ if ( $1 != "SID" ) { COL4+=$4; COL5+=$5; } } END { LINES=NR-1; printf "The averages are %.2f and %.2f\n", COL4/LINES, COL5/LINES }' < /path/to/inputfile

The above command filters for the title row, otherwise sum column 4 and column 5, and after processing the inputfile it will set the LINES variable to the number of record substracted by 1 (the title row) and prints the output line.

Both the bash and the awk version will output:

The averages are 14.60 and 25.20
Lambert
  • 12,680
0
#!/usr/bin/awk -f

NR == 1     { next }
/^[^0-9]/   { next }

{
    s1 += $(NF - 1)
    s2 += $NF
    ++n;
}

END {
    printf("The averages are %.2f and %.2f\n", s1/n, s2/n)
}

Testing:

$ chmod +x avgs
$ ./avgs file
The averages are 14.50 and 25.00

This uses awk to skip the first line of data (as requested in the question) as well as any line containing a non-digit as the first character.

For all other lines it adds the numbers of the last two fields to the twe sums s1 and s2, and also increments the counter n.

At the end, the result is printed with two decimal places.

As a "one-liner" in the shell:

$ awk 'NR==1||/^[^0-9]/{next} {s1+=$(NF-1);s2+=$NF;++n} END {printf("The averages are %.2f and %.2f\n", s1/n, s2/n)}' file
The averages are 14.50 and 25.00

Related:

Kusalananda
  • 333,661