
I'm trying to calculate averages over ranges specified in one file and apply them to numbers in a different file. I am having trouble finding examples in bash where you can use information from two separate files at once. Here's what I am trying to do:

The first file has specified ranges that I want averages for:

ranges.txt

Sc0  1  5
Sc1  69 72

The second file contains the numbers I need to take the averages from (using the 3rd column):

allNumbers.txt

Sc0 1   30
Sc0 2   40
Sc0 3   40
Sc0 4   50
Sc0 5   10
Sc0 6   30
Sc1 69  40
Sc1 70  10
Sc1 71  20
Sc1 72  30

Here's what I'd like to have (both ends of each range are inclusive, so 34 = (30+40+40+50+10)/5 and 25 = (40+10+20+30)/4):

averages.txt

34
25

I am trying to do this in the bash loop shown below, but I am fairly new to bash scripting and this code is not working.

#!/bin/bash

count=0; total=0;

while read rangeName rangeStart rangeStop #make column variables for range.txt
    while read name position sum #make column variables for allNumbers.txt
        while [$rangeName == $name && $rangeStart < $position <= $rangeStop]; do
            for i in $sum; do
                total=$(echo $total+$i | bc)
                ((count++))
            done
            echo "$total / $count" | bc #print out averages
        done
    done < allNumbers.txt
done < ranges.txt

Can someone help me out with this? Thanks in advance.

  • Welcome to the site. As a general rule, please always explain how an attempt you describe in your question didn't work as expected, since this can give contributors valuable input on how to help you solve the problem. – AdminBee Jun 17 '20 at 12:34
  • Okay, I will do that next time. Thanks for letting me know. – mdem7705 Jun 17 '20 at 21:22

1 Answer


You really don't want to use the shell for this. First, because it doesn't do floating point math, so you need to call bc or another tool; second, because the syntax is very complicated, as you can see; and third, because it is slow. See "Why is using a shell loop to process text considered bad practice?" for more details.
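
For comparison, here is a minimal sketch of what a working shell version of the question's loop might look like. It fixes several problems:

- Each while read needs its own do.
- Without spaces, [$rangeName is a single word rather than the [ command.
- Inside a test, && ends the command and < is parsed as an input redirection.
- Chained comparisons such as a < b <= c don't exist.
- The check needs an if rather than a while.

It is written for POSIX sh, so it should also run under bash. range_averages is a hypothetical helper name. The sketch assumes that both range ends are inclusive and that truncated whole-number averages are acceptable:

```bash
# range_averages (hypothetical name): $1 = ranges file, $2 = numbers file.
# Assumes inclusive range ends; shell arithmetic is integer-only, so averages
# are truncated to whole numbers.
range_averages() {
    # Outer loop: one iteration per line of the ranges file
    while read -r rangeName rangeStart rangeStop; do
        total=0
        count=0
        # Inner loop: re-reads the whole numbers file for every single range
        while read -r name position value; do
            # Separate [ ] tests joined with &&; the spaces inside [ ] are required
            if [ "$name" = "$rangeName" ] &&
               [ "$position" -ge "$rangeStart" ] &&
               [ "$position" -le "$rangeStop" ]; then
                total=$((total + value))
                count=$((count + 1))
            fi
        done < "$2"
        # Integer division; skip ranges that matched no lines
        if [ "$count" -gt 0 ]; then
            echo "$((total / count))"
        fi
    done < "$1"
}
```

Running range_averages ranges.txt allNumbers.txt > averages.txt on the sample data writes 34 and 25. Two limitations illustrate the points above:

- The inner loop re-reads allNumbers.txt once per range.
- A range such as "Sc0 2 4" gives 43 rather than 43.33.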

Pretty much any other language would be better, but here's one approach using GNU awk (gawk 4.0 or later is needed, since a[$1]["start"] is a gawk-only array of arrays):

$ awk 'NR==FNR{a[$1]["start"]=$2; a[$1]["end"]=$3; next}
       { 
        if($2>=a[$1]["start"] && $2<=a[$1]["end"]){
            values[$1]+=$3; 
            nums[$1]++;
        }
       }
       END{
        for(range in values){
            print values[range]/nums[range]
        }
       }' ranges.txt allNumbers.txt
34
25
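
If your awk is not GNU awk (for example mawk or BusyBox awk), the a[$1]["start"] syntax will fail. Also, for (range in values) visits array elements in an unspecified order, so with more ranges the averages may not come out in the same order as ranges.txt.

Here is a sketch of a portable variant that uses only POSIX awk features. It stores the start and end positions in two plain arrays and prints the averages in the order the ranges were listed. range_averages is a hypothetical wrapper name:

```bash
# range_averages (hypothetical name): $1 = ranges file, $2 = numbers file.
# Uses only POSIX awk features (no arrays of arrays).
range_averages() {
    awk '
        # First file: store each range (forced numeric) and remember its position in the list
        NR == FNR { start[$1] = $2 + 0; stop[$1] = $3 + 0; order[++n] = $1; next }
        # Second file: add up column 3 when column 2 lies inside the range, ends included
        ($1 in start) && $2 >= start[$1] && $2 <= stop[$1] { sum[$1] += $3; cnt[$1]++ }
        # Print one average per range, in ranges-file order, skipping empty ranges
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in cnt)
                    print sum[order[i]] / cnt[order[i]]
        }
    ' "$1" "$2"
}
```

For example, range_averages ranges.txt allNumbers.txt > averages.txt writes 34 and 25 for the sample data. A range like "Sc0 2 4" would print 43.3333 (awk's default output format for non-integers) rather than a truncated integer.

As with the original, the NR == FNR test assumes ranges.txt is not empty. If it were, awk would treat allNumbers.txt as the first file.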

And here's the same thing as an annotated script:

#!/bin/awk -f

## If we are reading the first file
NR==FNR{
    ## $1 is the range name, so this will save the
    ## start position for this range name as a[$1]["start"] and
    ## the end position as a[$1]["end"]
    a[$1]["start"]=$2;
    a[$1]["end"]=$3;
    ## skip to the next line
    next
}
## This will only run for the second file
{
    ## If this value falls in the relevant range
    if($2>=a[$1]["start"] && $2<=a[$1]["end"]){
        ## Sum the values of this range and save
        ## in the values array
        values[$1]+=$3;
        ## Count the number of values for this range and save
        ## in the 'nums' array.
        nums[$1]++;
    }
}
## After we've read both files
END{
    ## For each range in the 'values' array
    for(range in values){
        ## print the average
        print values[range]/nums[range]
    }
}

You can run either the one-liner, or save the annotated script as foo.awk, make it executable and run it:

chmod +x foo.awk
./foo.awk ranges.txt allNumbers.txt
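
The NR==FNR test is what lets awk use two files at once. NR counts input lines across all files, while FNR restarts at 1 for each new file, so the two are only equal while awk is reading the first file. A quick way to see this uses two throwaway files (first.txt and second.txt are hypothetical names used only for the demonstration):

```bash
# Two throwaway input files (hypothetical names, only for this demo)
printf 'a\nb\n' > first.txt
printf 'x\ny\nz\n' > second.txt
# NR keeps counting across files; FNR restarts at 1 for each new file
awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' first.txt second.txt
# first.txt 1 1 first file
# first.txt 2 2 first file
# second.txt 3 1 later file
# second.txt 4 2 later file
# second.txt 5 3 later file
```
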
terdon
  • The wisdom of attempting this in the shell did cross my mind, so I suppose it's time to upskill in another language. Regardless, thanks for your help! – mdem7705 Jun 17 '20 at 21:22