2

I am trying to aggregate a file containing the following data in UNIX. I need to add the amounts if the key is same.

Key,amount,date,Time
abc-xyz-12234,45,15-08-91,23:00
pqr-vgh-5241,15,15-08-91,21:00
abc-xyz-12234,35,15-08-91,23:00
pqr-vgh-5241,24,15-08-91,21:00
abc-xyz-12234,655,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00
pqr-vgh-5241,40,15-08-91,21:00

Output should be as following

abc-xyz-12234,735,15-08-91,23:00
pqr-vgh-5241,79,15-08-91,21:00
lkj-erf-8542,281,15-08-91,10:00

I tried by the following command ,but it just gives me uniq

cat file | grep "abc-xyz-12234" | uniq
Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232
Kshitij
  • 21
  • 1

4 Answers4

3

Another possible solution with awk could be:

awk 'BEGIN { FS = OFS = "," }
     NR != 1 { y[$1] += $2; $2 = y[$1]; x[$1] = $0; }
     END { for (i in x) { print x[i]; } }' file

The flaw is it won't preserve your order. So result could be:

pqr-vgh-5241,79,15-08-91,21:00
abc-xyz-12234,735,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00
cas
  • 78,579
taliezin
  • 9,275
1

You can do this with awk:

#!/bin/sh
sort | awk -F, '
function result() {
    if ( key != "" ) {
            printf "%s,%d,%s\n", key, value, datetime;
    }
}
BEGIN { key = ""; value = 0; datetime = ""; }
$2 ~ /^[0-9]+/ {
    if ( $1 == key ) {
            value += $2;
    } else {
            result();
            key = $1;
            value = $2;
            datetime = $3 "," $4;
    }
}
END { result(); }
'

giving

./foo <input
abc-xyz-12234,735,15-08-91,23:00
lkj-erf-8542,281,15-08-91,10:00
pqr-vgh-5241,79,15-08-91,21:00
Thomas Dickey
  • 76,765
0

Here's a way in Perl. Call it as such ./script file.ext:

use warnings;
use strict;

my %data;
my @order;

while (<>){
    next if $. == 1;
    my @line = split /,/;
    if (defined $data{$line[0]}){
        $data{$line[0]}->[1] += $line[1];
    }
    else {
        $data{$line[0]} = \@line;
        push @order, $line[0];
    }
}

for (@order){
    print join(',', @{$data{$_}});
}
stevieb
  • 886
0

This uses Miller (mlr) to compute the sum of the amount field over each set of records grouped by the Key, date, and Time fields. Miller will add the sum as a new field, amount_sum, at the end of the list of fields, so we also reorder the fields and relabel them.

mlr --csv \
    stats1 -a sum -f amount -g Key,date,Time then \
    reorder -f Key,amount_sum then \
    label Key,amount file

Output given the data in the question:

Key,amount,date,Time
abc-xyz-12234,735,15-08-91,23:00
pqr-vgh-5241,79,15-08-91,21:00
lkj-erf-8542,281,15-08-91,10:00

If you don't want the header in the output, use mlr with its --headerless-csv-output option.

Kusalananda
  • 333,661