compare two column of two different CSV file & merge the output

Question

TABLE1.csv

 DATE,      TIMESTAMP, ID,  START TIME, END TIME, DURATION
2019-04-05, 13:57:19,  1607,13:06:42,   13:07:12, 00:00:30
2019-04-05, 13:58:00,  2327,13:57:26,   13:57:43, 00:00:17
2019-04-24, 12:30:00,  1836,11:20:01,   12:30:00, 01:09:59
2019-04-24, 12:30:00,  1836,11:20:01,   12:30:00, 01:09:59
2019-04-24, 15:30:01,  1836,14:50:01,   15:30:01, 00:40:00
2019-04-24, 15:30:01,  1836,14:50:01,   15:30:01, 00:40:00

TABLE2.csv

KEY, ID,   NAME
407, 1607, RECORD1
1127,2327, RECORD2
636, 1836, RECORD3
664, 1864, RECORD4
703, 1903, RECORD5

Match Coloumn 3 from TABLE1.csv and Coloumn 2 from TABLE2.csv & output required as given below:

 DATE,      TIMESTAMP, ID,   NAME,    START TIME, END TIME, DURATION
2019-04-05, 13:57:19,  1607, RECORD1, 13:06:42,   13:07:12, 00:00:30
2019-04-05, 13:58:00,  2327, RECORD2, 13:57:26,   13:57:43, 00:00:17
2019-04-24, 12:30:00,  1836, RECORD3, 11:20:01,   12:30:00, 01:09:59
2019-04-24, 12:30:00,  1836, RECORD3, 11:20:01,   12:30:00, 01:09:59
2019-04-24, 15:30:01,  1836, RECORD3, 14:50:01,   15:30:01, 00:40:00
2019-04-24, 15:30:01,  1836, RECORD3, 14:50:01,   15:30:01, 00:40:00

I have tried every example of

awk -F',' 'FNR==NR.....

Philippos · Accepted Answer · 2019-05-08T11:09:33.110

This sed one collects the replacements of TABLE2.csv in hold space and then performs the adding of the column:

sed 's/^[^,]*, *\([0-9]*,[^,]*\)$/\1/
T2
H;d
:2
G
s/\([^,]*,[^,]*, *\)\([0-9]*,\)\([^[:cntrl:]]*\)\n.*\n\2\([^[:cntrl:]]*\)/\1\2\4, \3/
P;d' TABLE2.csv TABLE1.csv

You have to add something for the changed headline, if neccessary.

s/^[^,]*, *$[0-9]*,[^,]*$$/\1/ does extract the second and third column from TABLE2.csv lines
T2 jumps to :2, if no replacement was made (thus, for TABLE2.csv lines)
Changed TABLE2.csv lines are appended to the Hold space and then deleted (stop execution for this line)
:2 is the jump mark. Everything that follows is done for TABLE1.csv lines only
G appends the lookup table from the hold space to the pattern space
s/$[^,]*,[^,]*, *$$[0-9]*,$$[^[:cntrl:]]*$\n.*\n\2$[^[:cntrl:]]*$/\1\2\4, \3/ looks for occurences of the ID ([0-9]*,) in the lookup table and adds the matching field
P prints the line (without the added lookup table) and d stops execution for the line.

For more detailed explanation refer to How to perform replacements defined in one file on another file

score 0 · Answer 2 · answered May 08 '19 at 14:19

I'd use awk for this

awk -F', *' -v OFS=', ' '
    NR == FNR { name[$2] = $3; next }
    { $3 = $3 OFS name[$3]; print }
' TABLE{2,1}.csv

outputs

 DATE, TIMESTAMP, ID, NAME, START TIME, END TIME, DURATION
2019-04-05, 13:57:19, 1607, RECORD1, 13:06:42, 13:07:12, 00:00:30
2019-04-05, 13:58:00, 2327, RECORD2, 13:57:26, 13:57:43, 00:00:17
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 12:30:00, 1836, RECORD3, 11:20:01, 12:30:00, 01:09:59
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00
2019-04-24, 15:30:01, 1836, RECORD3, 14:50:01, 15:30:01, 00:40:00

That reads TABLE2 first and maps the ID to the NAME.
Then reads TABLE1 and adds the NAME to the ID field.

Formatting the spacing of your columns is left up to you.

OUTPUT is: , 13:06:42, 13:07:12, 00:00:30RECORD1 , 13:57:26, 13:57:43, 00:00:17RECORD2 , 11:20:01, 12:30:00, 01:09:59RECORD3 , 11:20:01, 12:30:00, 01:09:59RECORD3 , 14:50:01, 15:30:01, 00:40:00lRECORD3 — Vivek Nigam, May 09 '19 at 07:53

compare two column of two different CSV file & merge the output

2 Answers2