Can you help me, please? I have a task: my input is some text with numbers. For example:

beta     1
score   9
something   2
beta     4
something   1

I need to sum all the numbers that share the same text. My output should look like this (in this format, with ":"):

beta:5
something:3
score:9

There may also be a problem with temp files, where I can save my scores. I think I need to use mktemp to delete them after the script has finished. Help me please, thanks.

  • Which part of this is it that you have issues with? Without knowing what you think is hard, it is difficult to give a helpful answer. I don't quite understand what you mean when you talk about temporary files. mktemp creates a file, it does not delete anything. Also, it seems that there is no need for temporary files to solve this exercise. – Kusalananda Apr 02 '22 at 17:25
  • @Kusalananda, I'm not sure, but I think I need to create temp files to save my current scores, and delete them after my script has finished. – lolilaliaa Apr 02 '22 at 17:32
  • @Kusalananda, my issue is with the algorithm: how to do it correctly. – lolilaliaa Apr 02 '22 at 17:33
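
On the temporary-file point raised in the comments: mktemp only creates the file, and the removal is done separately with rm, commonly from a trap so that it also happens when the script exits early. The following is a minimal sketch under that assumption; the variable name tmpfile and the sample data are made up for illustration and are not part of the original task.

```shell
# mktemp creates an empty temporary file and prints its name.
tmpfile=$(mktemp) || exit 1

# Delete the file when the script exits; rm does the removal, not mktemp.
trap 'rm -f "$tmpfile"' EXIT

# Example use: keep an intermediate, sorted copy of some sample data.
printf '%s\n' 'score 9' 'beta 1' 'beta 4' | sort >"$tmpfile"

cat "$tmpfile"
```

As the comment above notes, the summing task itself does not need a temporary file at all; this pattern is only useful when intermediate data really has to be stored.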

4 Answers

I will be assuming that the input will always contain exactly two fields per line.

You may use the GNU datamash utility to sort the data, group it by the first field, and calculate the sum of the second field for each group:

datamash -s -W --output-delimiter=: groupby 1 sum 2 <file

Here, the -s sorts the input, -W makes the utility treat any run of consecutive whitespace characters as a field delimiter, and --output-delimiter=: sets the output delimiter to the : character. The rest tells datamash to group by the first field and to calculate the sum of the second field for each group.

Given the input in the question in the file called file, this would produce the following output:

beta:5
score:9
something:3

You can solve this in any number of other ways too. The easiest computational solution would be to use awk:

awk '{ sum[$1] += $2 } END { for (key in sum) printf "%s:%d\n", key, sum[key] }' file 

Here, we use an associative array, sum, to hold the sum for each of the strings in the first field. The END block executes at the end of the input and outputs the calculated sums together with the strings.

Note that this solution also assumes that the first field is a single word containing no whitespace characters, as shown in the question.
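
One detail worth knowing about the awk approach: the order in which `for (key in sum)` visits the keys is unspecified, so the output lines may come out in a different order from one awk implementation to another. If a stable order matters, the result can be piped through sort. A small sketch, where the function name sum_by_key is only an illustrative choice:

```shell
# Sum the second field per first field, then sort the "key:sum" lines.
# With no arguments, awk reads standard input; otherwise it reads the named files.
sum_by_key() {
    awk '{ sum[$1] += $2 }
         END { for (key in sum) printf "%s:%d\n", key, sum[key] }' "$@" | sort
}

# Feed the sample data from the question on standard input.
printf '%s\n' 'beta     1' 'score   9' 'something   2' 'beta     4' 'something   1' |
    sum_by_key
```

Calling it as `sum_by_key file` would process the file from the question in the same way.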


Using a shell loop, reading the sorted lines from the original file, printing and resetting the sum of the second field whenever a new first field is encountered:

unset -v prev

sort file | {
    while read -r key value; do
        if [ "$key" != "${prev-$key}" ]; then
            # prev is set and different from $key
            printf '%s:%d\n' "$prev" "$sum"
            sum=0
        fi

        prev=$key
        sum=$(( sum + value ))
    done

    if [ "${prev+set}" = set ]; then
        printf '%s:%d\n' "$prev" "$sum"
    fi
}

Related: Why is using a shell loop to process text considered bad practice?

Kusalananda

If you are dealing with a large file, consider combining sort and awk so that you don't have to hold a huge array of keys and values in RAM.

λ cat input.txt 
beta     1
score   9
something   2
beta     4
something   1
λ sort input.txt |
  awk -v OFS=: 'NR==1{ key=$1 }; NR>1&&$1!=key{ print key, sum; sum=0; key=$1 }; {sum+=$2} END{ print key, sum}'
beta:5
score:9
something:3
Weihang Jian
#!/bin/bash
declare -i SECOND
while read first second; do
        if [ -z $FIRST ] || [ $first = $FIRST ]; then
                SECOND+=second
        else 
                echo $FIRST:$SECOND
                SECOND=second
        fi
        FIRST=$first
done < <(sort file)
echo $FIRST:$SECOND

I usually start from a rough template like this one, and in production I put all the variables in quotes.
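
Quoting matters because unquoted expansions undergo word splitting and filename globbing, so a key such as `*` could break the tests in the loop. As a rough sketch of what a fully quoted variant could look like, here is a version written for plain POSIX sh: it swaps `declare -i` for arithmetic expansion and the process substitution for a pipeline, so it is not the script above verbatim. The names sum_sorted and total are made up for this illustration.

```shell
# Print "key:sum" for each distinct first field of the file named in $1.
sum_sorted() {
    prev=
    total=0
    sort -- "$1" | {
        while read -r key value; do
            # A new key starts: print the finished group and reset the sum.
            if [ -n "$prev" ] && [ "$key" != "$prev" ]; then
                printf '%s:%d\n' "$prev" "$total"
                total=0
            fi
            prev=$key
            total=$(( total + value ))
        done
        # Print the last group, unless the input was empty.
        if [ -n "$prev" ]; then
            printf '%s:%d\n' "$prev" "$total"
        fi
    }
}
```

It would be called as `sum_sorted file`. Unlike the unquoted draft, it prints nothing for an empty input file.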

nezabudka
for k in $(awk '{if(!seen[$1]++)print $1}' file.txt); do awk -v k="$k" 'BEGIN{sum=0}$1 == k {sum=sum+$2}END{print k,sum}' file.txt; done

output

beta 5
score 9
something 3
  • That's an O(n²) solution - for n records it will take around n reads of file.txt, comprising n*n lines of data. For comparison, all the other solutions are O(n). Obviously, for a homework exercise such as this with small values of n (lines in the file) there'll be little difference, but for a larger file, such as one with 1000 lines, your solution would read the entire file of 1000 lines up to 1001 times. – Chris Davies Apr 04 '22 at 08:13