Start by not doing this in the shell, but in something like awk. The shell does not know how to do floating-point arithmetic.
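To illustrate the limitation (a minimal sketch, unrelated to your data): arithmetic expansion in a POSIX shell truncates to integers, while awk computes in floating point.

```shell
# $(( ... )) in a POSIX shell does integer arithmetic only.
echo $(( 7 / 2 ))             # prints 3; the fraction is lost

# awk does the same division in floating point.
awk 'BEGIN { print 7 / 2 }'   # prints 3.5
```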
Your script seems to be getting a set of parameters corresponding to the filename to parse and the names of the fields to consider. You can do this a bit more efficiently than calling awk twice for each column.
BEGIN { OFS = FS = "|" }

FNR == 1 {
    # Handle the header row.  Assuming "cols" is a FS-separated
    # string of column names that we want to sum, pick out the
    # column numbers that we want to process and put them in the
    # array "col".  This also converts the "cols" string into a
    # corresponding "header" array.

    nf = split(cols, header)
    for (i = 1; i <= NF; ++i)
        for (j = 1; j <= nf; ++j)
            if ($i == header[j])
                col[j] = i
    next
}

{
    # Sum each column that we have enumerated in the "col" array.
    for (i = 1; i <= nf; ++i)
        sum[i] += $(col[i])
}

END {
    # Output a two-row table containing only the headers that we
    # have summed over and the accumulated sums.

    $0 = ""
    for (i = 1; i <= nf; ++i)
        $i = header[i]
    print

    for (i = 1; i <= nf; ++i)
        $i = sum[i]
    print
}
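One detail worth calling out: split(cols, header) is called without a third argument, so it splits on the current value of FS, meaning the same | that delimits the data also delimits the cols string. A quick way to convince yourself:

```shell
awk 'BEGIN { FS = "|"; n = split("SAL|COST|PER", a); print n, a[2] }'
# prints: 3 COST
```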
Running this on your first file with:
$ awk -v cols="SAL|COST|PER|TAG" -f script.awk file1
SAL|COST|PER|TAG
155|162|95|210
Listing the same column twice gives the sum twice:
$ awk -v cols="SAL|SAL" -f script.awk file1
SAL|SAL
155|155
On the second file:
$ awk -v cols="SAL|COST|PER|TAG" -f script.awk file2
SAL|COST|PER|TAG
55.018|314.686|222.12|359.303
With post-processing using column -t:
$ awk -v cols="SAL|COST|PER|TAG" -f script.awk file2 | column -s '|' -t
SAL COST PER TAG
55.018 314.686 222.12 359.303
Putting this into a simple script:
#!/bin/sh
infile=$1
shift
IFS='|'
awk -v cols="$*" -f script.awk "$infile" |
column -s '|' -t
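The IFS='|' assignment in the wrapper is what makes "$*" produce a |-delimited string: expanding "$*" joins the positional parameters with the first character of IFS. A standalone demonstration:

```shell
set -- PER TAG SAL    # simulate the script's remaining arguments
IFS='|'
printf '%s\n' "$*"    # prints PER|TAG|SAL
```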
You would use this as
$ ./script.sh file1 PER TAG
PER TAG
95 210
You could also use a CSV-aware tool, like the tools in CSVkit, to do the work for you. csvstat is able to compute some basic "stats" of a CSV file. Assuming your files are properly formatted CSV files (they are not, as each data row has a trailing |-delimiter), the sum for each column can be had with
$ csvstat --sum file1
1. ID: None
2. NAME: None
3. SAL: 155
4. COST: 162
5. PER: 95
6. TAG: 210
$ csvstat --sum file2
1. ID: None
2. NAME: None
3. SAL: 55.018
4. COST: 314.686
5. PER: 222.12
6. TAG: 359.30278
Or, for just a single column:
$ csvstat --sum -c 'SAL' file2
55.018