Instead of using the pythonic pandas way to produce such thing from two .csv files I am trying to see if I can do this with awk
.
What I have is the following:
file1.csv | file2.csv
name,value_a,value_b | name,value_a,value_b
A,2,3 | A,3,5
B,1,5 | B,5,7
C,5,1 | C,9,4
D,9,2 | D,10,20
The column $1
is the same in both .csv
files. But the columns $2
and $3
are different. What I wish to do is to produce a new file that has the same column $1
but for the columns $2
and $3
, namely value_a
and value_b
it has the difference of those two.
The column value_a
should have the difference produced when subtracting the second column of file1.csv from file2.csv where value_b
should have the difference produced when subtracting the third column of file1.csv from file2.csv
So, the result should look like this.
diff.csv
name,value_a,value_b
A,1,2
B,4,2
C,4,3
D,1,18
Can this be done by awk
? Or should I stick with the python3
way ?
Thanks in advance
My attempt:
awk -F, '(FNR==NR){hl[$1]=$2;lh[$1]=$3;next}
{hl[$1]=$2-hl[$1];lh[$1]=$3-lh[$1];next} #update the arrays with the differences
END{ for (i in hl){print i,":",hl[i],lh[i]}}' file_1.csv file_2.csv #print them
Is there a better way to do it?
next
statement on my code. Is there a better way to do it? – ex1led Mar 09 '21 at 13:54next
is not necessary, but the first one is (see https://stackoverflow.com/q/32481877). If you set theOFS=,
, your answer will be perfect. – Quasímodo Mar 09 '21 at 14:00