Addition of values in second columns if the first column entry is same in UNIX

Question

I am trying to aggregate a file containing the following data in UNIX. I need to add the amounts if the key is same.

Key,amount,date,Time
abc-xyz-12234,45,15-08-91,23:00
pqr-vgh-5241,15,15-08-91,21:00
abc-xyz-12234,35,15-08-91,23:00
pqr-vgh-5241,24,15-08-91,21:00
abc-xyz-12234,655,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00
pqr-vgh-5241,40,15-08-91,21:00

Output should be as following

abc-xyz-12234,735,15-08-91,23:00
pqr-vgh-5241,79,15-08-91,21:00
lkj-erf-8542,281,15-08-91,10:00

I tried by the following command ,but it just gives me uniq

cat file | grep "abc-xyz-12234" | uniq

score 3 · Answer 1 · edited Nov 06 '15 at 10:38

3

Another possible solution with awk could be:

awk 'BEGIN { FS = OFS = "," }
     NR != 1 { y[$1] += $2; $2 = y[$1]; x[$1] = $0; }
     END { for (i in x) { print x[i]; } }' file

The flaw is it won't preserve your order. So result could be:

pqr-vgh-5241,79,15-08-91,21:00
abc-xyz-12234,735,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00

edited Nov 06 '15 at 10:38

cas

78,579

answered Nov 06 '15 at 09:48

taliezin

9,275

Order doesnt matters – Kshitij Nov 06 '15 at 10:19

score 1 · Answer 2 · answered Nov 06 '15 at 09:46

You can do this with awk:

#!/bin/sh
sort | awk -F, '
function result() {
    if ( key != "" ) {
            printf "%s,%d,%s\n", key, value, datetime;
    }
}
BEGIN { key = ""; value = 0; datetime = ""; }
$2 ~ /^[0-9]+/ {
    if ( $1 == key ) {
            value += $2;
    } else {
            result();
            key = $1;
            value = $2;
            datetime = $3 "," $4;
    }
}
END { result(); }
'

giving

./foo <input
abc-xyz-12234,735,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00
pqr-vgh-5241,79,15-08-91,21:00

score 0 · Answer 3 · answered Nov 06 '15 at 15:14

Here's a way in Perl. Call it as such ./script file.ext:

use warnings;
use strict;

my %data;
my @order;

while (<>){
    next if $. == 1;
    my @line = split /,/;
    if (defined $data{$line[0]}){
        $data{$line[0]}->[1] += $line[1];
    }
    else {
        $data{$line[0]} = \@line;
        push @order, $line[0];
    }
}

for (@order){
    print join(',', @{$data{$_}});
}

score 0 · Answer 4 · answered Jan 12 '23 at 13:33

This uses Miller (mlr) to compute the sum of the amount field over each set of records grouped by the Key, date, and Time fields. Miller will add the sum as a new field, amount_sum, at the end of the list of fields, so we also reorder the fields and relabel them.

mlr --csv \
    stats1 -a sum -f amount -g Key,date,Time then \
    reorder -f Key,amount_sum then \
    label Key,amount file

Output given the data in the question:

Key,amount,date,Time
abc-xyz-12234,735,15-08-91,23:00
pqr-vgh-5241,79,15-08-91,21:00
lkj-erf-8542,281,15-08-91,10:00

If you don't want the header in the output, use mlr with its --headerless-csv-output option.

Addition of values in second columns if the first column entry is same in UNIX

4 Answers4