0

Instead of using the pythonic pandas way to produce such thing from two .csv files I am trying to see if I can do this with awk.

What I have is the following:

file1.csv              | file2.csv
name,value_a,value_b   | name,value_a,value_b
A,2,3                  | A,3,5
B,1,5                  | B,5,7
C,5,1                  | C,9,4
D,9,2                  | D,10,20

The column $1 is the same in both .csv files. But the columns $2 and $3 are different. What I wish to do is to produce a new file that has the same column $1 but for the columns $2 and $3, namely value_a and value_b it has the difference of those two.

The column value_a should have the difference produced when subtracting the second column of file1.csv from file2.csv where value_b should have the difference produced when subtracting the third column of file1.csv from file2.csv

So, the result should look like this.

diff.csv
name,value_a,value_b
A,1,2
B,4,2
C,4,3
D,1,18

Can this be done by awk ? Or should I stick with the python3 way ? Thanks in advance

My attempt:

awk -F, '(FNR==NR){hl[$1]=$2;lh[$1]=$3;next} 
{hl[$1]=$2-hl[$1];lh[$1]=$3-lh[$1];next} #update the arrays with the differences
END{ for (i in hl){print i,":",hl[i],lh[i]}}' file_1.csv file_2.csv  #print them

Is there a better way to do it?

ex1led
  • 286
  • 1
    It is totally possible. A good start would be searching how to handle multiple files with Awk. Some shortcuts for you: https://unix.stackexchange.com/q/106645 and https://unix.stackexchange.com/q/182758. – Quasímodo Mar 09 '21 at 13:33
  • Thanks @Quasímodo !! I have updated my question with what I've written. It seemed to work. Also it seemed to work if i did not include the next statement on my code. Is there a better way to do it? – ex1led Mar 09 '21 at 13:54
  • 1
    That is great, you have clearly made progress there. The second next is not necessary, but the first one is (see https://stackoverflow.com/q/32481877). If you set the OFS=,, your answer will be perfect. – Quasímodo Mar 09 '21 at 14:00

1 Answers1

4
paste -d, file1 file2 |
awk 'BEGIN{ FS=OFS="," }
     NR==1{ print $1, $2, $3; next }
          { print $1, $5-$2, $6-$3 }'

or with awk itself:

awk 'BEGIN   { FS=OFS="," }
     NR==FNR { val_a[$1]=$2; val_b[$1]=$3; next }
     FNR>1   { print $1, val_a[$1]-$2, val_b[$1]-$3; next }1' file2 file1
αғsнιη
  • 41,407