Scoring from input in Bash

Question

Can you help me, please? I have a task: my script receives some text with numbers as input. For example:

beta     1
score   9
something   2
beta     4
something   1

I need to add up all the numbers that share the same text. My output should look like this (with ":" as the separator):

beta:5
something:3
score:9

There may also be a problem with temp files, where I can save my scores. I need to use mktemp to delete them after the script has finished. Help me please, thanks.

Which part of this is it that you have issues with? Without knowing what you think is hard, it is difficult to give a helpful answer. I don't quite understand what you mean when you talk about temporary files. mktemp creates a file, it does not delete anything. Also, it seems that there is no need for temporary files to solve this exercise. — Kusalananda, Apr 02 '22 at 17:25
@Kusalananda, I'm not sure, but I think that I need to create temp files for saving my current scores, and after my script has finished, I need to delete them. — lolilaliaa, Apr 02 '22 at 17:32
@Kusalananda, I have issues with the algorithm, with how to do it correctly. — lolilaliaa, Apr 02 '22 at 17:33
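
No temporary file is needed for this task, as the comment above points out, but if one is wanted anyway, a common pattern is to create it with mktemp and register an EXIT trap that removes it. The following is a minimal sketch under those assumptions, not a definitive implementation: the function name sum_scores is hypothetical, the input is assumed to arrive on standard input with exactly two whitespace-separated fields per line, and the second field is assumed to be an integer.

```shell
#!/bin/bash
# Sketch: hold the sorted input in a temporary file and delete it on exit.
# The function name sum_scores is hypothetical, used only for illustration.
sum_scores() (
    # The function body is a subshell, so the EXIT trap fires when it returns.
    tmpfile=$(mktemp) || exit 1
    trap 'rm -f -- "$tmpfile"' EXIT

    # Save the sorted standard input; equal keys become adjacent lines.
    sort >"$tmpfile"

    prev= sum=0 seen=no
    while read -r key value; do
        # A new key means the previous group is complete: print and reset.
        if [ "$seen" = yes ] && [ "$key" != "$prev" ]; then
            printf '%s:%d\n' "$prev" "$sum"
            sum=0
        fi
        prev=$key
        seen=yes
        sum=$(( sum + value ))
    done <"$tmpfile"

    # Print the last group, unless the input was empty.
    if [ "$seen" = yes ]; then
        printf '%s:%d\n' "$prev" "$sum"
    fi
)
```

It could be called as sum_scores <file. Because the trap is tied to the subshell, the temporary file is removed as soon as the function returns, whether it succeeded or not.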

Kusalananda · Accepted Answer · 2022-04-02T19:39:41.040

2

I will be assuming that the input will always contain exactly two fields per line.

You may use the GNU datamash utility to sort the data, group it by the first field, and calculate the sum of the second field for each group:

datamash -s -W --output-delimiter=: groupby 1 sum 2 <file

Here, the -s sorts the input, -W makes the utility treat any run of consecutive whitespace characters as a field delimiter, and --output-delimiter=: sets the output delimiter to the : character. The rest tells datamash to group by the first field and to calculate the sum of the second field for each group.

Given the input in the question in the file called file, this would produce the following output:

beta:5
score:9
something:3

You can solve this in any number of other ways too. The easiest computational solution would be to use awk:

awk '{ sum[$1] += $2 } END { for (key in sum) printf "%s:%d\n", key, sum[key] }' file

Here, we use an associative array, sum, to hold the sum for each of the strings in the first field. The END block executes at the end of the input and outputs the calculated sums together with the strings.

Note that this solution also assumes that the first field is a single word containing no whitespace characters, as shown in the question.
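
The same associative-array idea can also be sketched in pure Bash, without awk. This is only an illustration under stated assumptions: it needs Bash 4.0 or newer for associative arrays, the function name sum_by_key is hypothetical, every input line is assumed to hold exactly two whitespace-separated fields, and the second field is assumed to be a decimal integer without leading zeros.

```shell
#!/bin/bash
# Sketch: per-key sums with a Bash associative array (Bash 4.0 or newer).
# The function name sum_by_key is hypothetical, used only for illustration.
sum_by_key() {
    local -A sum=()     # maps each key to its running total
    local key value

    # Read "key value" pairs from standard input and accumulate the totals.
    while read -r key value; do
        sum[$key]=$(( ${sum[$key]:-0} + value ))
    done

    # The iteration order of an associative array is unspecified.
    for key in "${!sum[@]}"; do
        printf '%s:%d\n' "$key" "${sum[$key]}"
    done
}
```

It could be called as sum_by_key <file | sort, where the final sort only makes the output order predictable. Like the awk version, it keeps one array entry per distinct key in memory.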

Using a shell loop, reading the sorted lines from the original file, printing and resetting the sum of the second field whenever a new first field is encountered:

unset -v prev
sort file |
{
        while read -r key value; do
                if [ "$key" != "${prev-$key}" ]; then
                        # prev is set and different from $key
                        printf '%s:%d\n' "$prev" "$sum"
                        sum=0
                fi

                prev=$key
                sum=$(( sum + value ))
        done

        if [ "${prev+set}" = set ]; then
                printf '%s:%d\n' "$prev" "$sum"
        fi
}

Related: Why is using a shell loop to process text considered bad practice?

edited Apr 02 '22 at 19:39

answered Apr 02 '22 at 17:56

Kusalananda


Is there also a way to solve it without awk, by writing the whole algorithm? – lolilaliaa Apr 02 '22 at 18:13
But thank u so much! – lolilaliaa Apr 02 '22 at 18:14
@lolilaliaa I'm afraid that I don't understand what "the whole algorithm" is that is not implemented by that awk program (or, for that matter, encapsulated by the datamash command). You may possibly have to update your question if you have further clarifications to it. – Kusalananda Apr 02 '22 at 18:16
What I mean is: does a solution exist written entirely in Bash, with loops, if/else, and so on? I hope you understand me – lolilaliaa Apr 02 '22 at 18:38
@lolilaliaa Using shell loops to parse data is fragile and not usually what you want to do. See for example Why is using a shell loop to process text considered bad practice? – Kusalananda Apr 02 '22 at 18:49
OK, but my task is to do it with shell loops. But thanks :) – lolilaliaa Apr 02 '22 at 18:57
@lolilaliaa In that case, this should be mentioned in the question. Currently you don't say anything about that. – Kusalananda Apr 02 '22 at 19:05
@lolilaliaa See updated answer. – Kusalananda Apr 02 '22 at 19:20
Thank u so much! – lolilaliaa Apr 02 '22 at 20:10

score 2 · Answer 2 · answered Apr 02 '22 at 18:32

If you are dealing with a large file, consider using sort and awk so that we don't have to allocate a huge array in RAM for storing the keys and values.

λ cat input.txt 
beta     1
score   9
something   2
beta     4
something   1

sort input.txt |
  awk -v OFS=: 'NR==1{ key=$1 }; NR>1&&$1!=key{ print key, sum; sum=0; key=$1 }; {sum+=$2} END{ print key, sum}'

beta:5
score:9
something:3

nezabudka · Answer 3 · 2022-04-04T08:51:18.333

0

#!/bin/bash
declare -i SECOND
while read first second; do
        if [ -z $FIRST ] || [ $first = $FIRST ]; then
                SECOND+=second
        else 
                echo $FIRST:$SECOND
                SECOND=second
        fi
        FIRST=$first
done < <(sort file)
echo $FIRST:$SECOND

I usually start from a rough draft like this one, and in production I put all the variables in quotes.

edited Apr 04 '22 at 08:51

answered Apr 04 '22 at 08:36

nezabudka


score -1 · Answer 4 · answered Apr 04 '22 at 06:41


for k in $(awk '{if(!seen[$1]++)print $1}' file.txt); do awk -v k="$k" 'BEGIN{sum=0}$1 == k {sum=sum+$2}END{print k,sum}' file.txt; done

output

beta 5
score 9
something 3

answered Apr 04 '22 at 06:41

Praveen Kumar BS


That's an O(n²) solution - for n records it will take around n reads of file.txt, comprising n*n lines of data. For comparison, all the other solutions are O(n). Obviously, for a homework exercise such as this, with small values of n (lines in the file), there'll be little difference, but for a larger file, such as one with 1000 lines, your solution would read the entire file of 1000 lines up to 1001 times. – Chris Davies Apr 04 '22 at 08:13
