How to create a function to sum the required multiple column

Question

I have a scenario where i want to sum the multiple column

The data in the file is :

ID|NAME|SAL|COST|PER|TAG

1|A|10|10|20|10|

1|B|10|15|20|10|

1|C|10|17|25|80|

1|D|115|110|20|100|

1|E|10|10|10|10|

i want sum of COLUMN - SAL | COST | PER | TAG

one i did with simple command but how to do by creating function

awk '{FS="|"}{s+=$3}END{print s}' file.txt

The function should be parameterised so that when i pass column name it should calculate sum for that column

The sum column's may differ. their may be requirement like only two column sum needed then it should take the two column names and process the sum for that

markgraf · Answer 1 · 2019-12-21T16:13:28.890

2

Taking inspiration from https://stackoverflow.com/a/32616101 :

$ col=SAL
$ colnum=$(awk -v RS='|' '/'$col'/{ print NR; exit}' testfile)
$ awk '{FS="|"}{s+='$colnum'}END{print s}' testfile 
18

The trick here is the quoting: The vars are outside the single-quotes.

That's easily wrapped into a script sumcols.sh:

#!/bin/bash
FILE="$1"
COLUMNS="${@:2}"
for col in $COLUMNS; do
  colnum=$(awk -v RS='|' '/'$col'/{ print NR; exit}' $FILE)
  awk '{FS="|"}{s+='$colnum'}END{print "'$col' ", s}' $FILE
done | column -t

Call it with the file to process as the first positional parameter and follow with the columns to process. E.g.:

$ ./sumcols.sh testfile SAL COST
SAL   18
COST  24

edited Dec 21 '19 at 16:13

answered Dec 21 '19 at 13:34

markgraf

2,860

... its good but how to pass multiple column names like col SAL COST PRE TAG – kotiyi3001 Dec 21 '19 at 14:31
Updated my answer with a short example script. – markgraf Dec 21 '19 at 16:07

steeldriver · Answer 2 · 2019-12-21T16:09:38.907

2

Using a sufficiently recent version of Miller

$ mlr --csvlite --allow-ragged-csv-input --fs '|' stats1 -a sum -f SAL file.txt
SAL_sum
155

(recent version only required because your input is ragged i.e. has a trailing empty column with no corresponding name in the header). You can easily sum multiple columns in Miller by passing their names as a comma separated list to the -f option:

... stats1 -a sum -f SAL,COST,PER,TAG ...

Similarly with GNU Datamash

$ datamash -Ht '|' sum SAL,COST,PER,TAG < file.txt
sum(SAL)|sum(COST)|sum(PER)|sum(TAG)
155|162|95|210

edited Dec 21 '19 at 16:09

answered Dec 21 '19 at 13:51

steeldriver

81,074

how to pass multiple column name like array list Array={SAL,COST,PRE,TAG} – kotiyi3001 Dec 21 '19 at 14:33

bu5hman · Answer 3 · 2019-12-21T16:08:32.557

Using awk (and losing the spaces in your input string)

myv='SAL|COST|PER|TAG'
awk -v ar="$myv" '
  BEGIN{FS="|"; getline; for (i=1;i<=NF;i++) {if ($i ~ ar) head[i]=0;title[i]=$i}}
  NF>1{for (h in head) head[h]+=$h}
  END{for (h in head) print title[h]":\t"head[h]}
' file

This assumes the regex matching is unique. If not then ...

myv='SAL|COST|PER|TAG'
awk -v ar="$myv" '
  BEGIN{FS="|"; getline; for (i=1;i<=NF;i++) head[$i]=i; split(ar,titles,"|")}
  NF>1{for (i=1; i<=NF; i++) val[i]+=$i}
  END{for (t in titles) print titles[t]":\t"val[head[titles[t]]]}
' file

Output

SAL:    155
COST:   162
PER:    95
TAG:    210

How to create a function to sum the required multiple column

3 Answers3

Linked

Related