Calculate column averages over specific row ranges from a separate input file

Question

I'm trying to calculate averages over ranges specified in one file, using numbers from a different file. I am having trouble finding bash examples that use information from two separate files at once. Here's what I am trying to do:

The first file has specified ranges that I want averages for:

ranges.txt

Sc0  1  5
Sc1  69 72

The second file contains the numbers I need to take the averages from (using the 3rd column):

allNumbers.txt

Here's what I'd like to have: averages.txt

34
25

I am trying to do this in the bash loop shown below, but I am fairly new to bash scripting and this code is not working.

#!/bin/bash
count=0;
total=0;
while read rangeName rangeStart rangeStop            #make column variables for range.txt 
    while read name position sum                     #make column variables for allNumbers.txt

        while [$rangeName == $name && $rangeStart < $position <= $rangeStop]; do
            for i in $sum; do
                total=$(echo $total+$i | bc)
                ((count++))
            done
            echo "$total / $count" | bc          #print out averages
        done

    done < allNumbers.txt
done < ranges.txt

Can someone help me out with this? Thanks in advance.

Welcome to the site. As a general rule, please always explain how an attempt you describe in your question didn't work as expected, since this can give contributors valuable input on how to help you solve the problem. — AdminBee, Jun 17 '20 at 12:34

terdon · Accepted Answer · 2020-06-17T12:06:52.253

You really don't want to use the shell for this. First, because the shell doesn't do floating-point math, so you need to call bc or another tool; second, because the syntax is very complicated, as you can see; and third, because it is slow. See Why is using a shell loop to process text considered bad practice? for more details.
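To illustrate the first point, here is a small sketch (the numbers are arbitrary) showing that shell arithmetic truncates to integers, while awk does the same division in floating point:

```shell
# Shell arithmetic expansion is integer-only, so the fraction is dropped.
echo "$(( 7 / 2 ))"            # prints 3

# awk performs the division in floating point.
awk 'BEGIN { print 7 / 2 }'    # prints 3.5
```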

Pretty much any other language would be better, but here's one approach using awk (GNU awk, since it relies on arrays of arrays):

$ awk 'NR==FNR{a[$1]["start"]=$2; a[$1]["end"]=$3; next}
       { 
        if($2>=a[$1]["start"] && $2<=a[$1]["end"]){
            values[$1]+=$3; 
            nums[$1]++;
        }
       }
       END{
        for(range in values){
            print values[range]/nums[range]
        }
       }' ranges.txt allNumbers.txt
34
25

And here's the same thing as an annotated script:

#!/bin/awk -f

## If we are reading the first file
NR==FNR{
  ## $1 is the range name, so this will save the
  ## start position for this range name as a[$1]["start"] and
  ## the end position as a[$1]["end"]
  a[$1]["start"]=$2;
  a[$1]["end"]=$3;
  ## skip to the next line
  next
}
## This will only run for the second file
{
  ## If this value falls in the relevant range
  if($2>=a[$1]["start"] && $2<=a[$1]["end"]){
    ## Sum the values of this range and save
    ## in the values array
    values[$1]+=$3;
    ## Count the number of values for this range and save
    ## in the 'nums' array.
    nums[$1]++;
  }
}
## After we've read both files
END{
  ## For each range in the 'values' array
  for(range in values){
    ## print the average
    print values[range]/nums[range]
  }
}

You can either run the one-liner above, or save the annotated script as foo.awk, make it executable (chmod +x foo.awk) and run:

./foo.awk ranges.txt allNumbers.txt
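
The a[$1]["start"] syntax above is an array of arrays, which is a GNU awk extension. For a different awk, a minimal sketch along the same lines could keep the start and end positions in two flat arrays instead. The sample rows below are hypothetical (the contents of allNumbers.txt are not shown in the question), and the names range_averages, start, stop, sum and n are made up for illustration. This variant also prints the range name and sorts the output, because for (r in sum) visits keys in no guaranteed order.

```shell
#!/bin/sh
# Hypothetical sample data; the real allNumbers.txt is not shown in the question.
workdir=$(mktemp -d)
printf 'Sc0 1 5\nSc1 69 72\n' > "$workdir/ranges.txt"
printf 'Sc0 1 30\nSc0 3 38\nSc0 9 99\nSc1 70 20\nSc1 72 30\n' > "$workdir/allNumbers.txt"

# Average column 3 per range name, using flat arrays only.
# $1 = ranges file, $2 = numbers file.
range_averages() {
    awk '
        # First file: remember the start and stop position of each range.
        NR == FNR { start[$1] = $2 + 0; stop[$1] = $3 + 0; next }
        # Second file: accumulate rows whose position lies inside a known range.
        ($1 in start) && $2 + 0 >= start[$1] && $2 + 0 <= stop[$1] {
            sum[$1] += $3
            n[$1]++
        }
        # After both files: print "name average" for every range that had data.
        END { for (r in sum) print r, sum[r] / n[r] }
    ' "$1" "$2" | sort
}

range_averages "$workdir/ranges.txt" "$workdir/allNumbers.txt"
```

With the sample rows above, the row at position 9 falls outside Sc0's range 1 to 5 and is ignored, so the output is "Sc0 34" followed by "Sc1 25".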

It did cross my mind about the wisdom of attempting this in the shell, so I suppose it's time to upskill on another language. Regardless, thanks for your help! — mdem7705, Jun 17 '20 at 21:22
