Generating the intersection and union of two CSV files

Question

I have two CSV files that share some overlapping columns. Call one file A.csv and the other B.csv, and call the intersection of A and B "C".

I would like to generate three new CSV files: C itself; D, the part of B that remains after subtracting C from it; and E, the union of A and D.
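If each file is treated as a set of rows (an assumption: the question mentions overlapping columns, but the answer below compares whole rows), the three outputs can be written out explicitly. The derivation shows that D is just "B minus A" and that E collapses to the plain union of A and B:

```latex
\begin{aligned}
C &= A \cap B \\
D &= B \setminus C = B \setminus (A \cap B) = B \setminus A \\
E &= A \cup D = A \cup (B \setminus A) = A \cup B
\end{aligned}
```

So only an intersection, a difference and a union are needed, and E does not depend on computing D first.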

Is there a way to do this on Linux/Unix with command-line tools, without resorting to a heavyweight programming language?

Do some of the fields have quotes around them or embedded commas? If so, you need a “heavyweight” programming language. (Well, ok, you can get away with awk, but it's not pretty.) — Gilles 'SO- stop being evil', Sep 07 '12 at 22:11
No, they are just normal csv files. Every entry is either a word or a numerical value. — user785099, Sep 08 '12 at 22:02
I would have to see the file in question for a solution but do not underestimate the power of UNIX join and sort. — terdon, Nov 30 '12 at 12:52
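On Gilles' point about quotes and embedded commas: comparing raw lines breaks as soon as the same record can be written two ways, e.g. `"apple",1` versus `apple,1`. A minimal sketch using Python's standard `csv` module, assuming two rows match when every field is equal; the function names and sample data are made up for illustration:

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text into a list of row tuples, skipping blank lines.

    csv.reader handles quoting, so '"x",1' and 'x,1' both become ('x', '1').
    """
    return [tuple(row) for row in csv.reader(io.StringIO(text)) if row]


def intersect_csv(text_a, text_b):
    """Rows of B that also appear in A, compared field by field (B's order kept)."""
    rows_a = set(parse_rows(text_a))
    return [row for row in parse_rows(text_b) if row in rows_a]


def rows_to_csv(rows):
    """Serialize rows back to CSV text; csv.writer re-adds quotes only where needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample data: the first row is quoted differently in each file.
a = 'apple,1\n"pear, green",2\n'
b = '"apple",1\n"pear, green",2\nplum,3\n'
print(rows_to_csv(intersect_csv(a, b)), end="")
```

For the plain word/number files described in the reply above, whole-line comparison gives the same result; the parser only matters once quoting appears.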

jsj · Answer 1 · 2012-09-09T10:07:55.663

I would use Python for this; don't be intimidated by it, it's great at this kind of thing. My (rough and untested) solution would be:

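# Read both files. splitlines() drops the trailing "\n", so a last line
# without a newline still matches the same row in the other file.
with open("csv1.csv") as f_csv_1:
    csv_1 = set(f_csv_1.read().splitlines())
with open("csv2.csv") as f_csv_2:
    csv_2 = set(f_csv_2.read().splitlines())

# Rows are compared as whole lines. Sets do not keep the original row
# order, so sort the results to get reproducible output.
intersection = sorted(csv_1 & csv_2)
union = sorted(csv_1 | csv_2)

# "with" closes each output file automatically.
with open("intersection.csv", "w") as out_1:
    for row in intersection:
        out_1.write(row + "\n")

with open("union.csv", "w") as out_2:
    for row in union:
        out_2.write(row + "\n")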
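The script above produces C and E (because A ∪ (B − (A ∩ B)) is simply A ∪ B, its union.csv already is E), but not D, and the set operations lose the original row order. A minimal sketch that writes all three files while keeping file order, assuming rows are compared as whole lines and duplicate rows should be dropped; the output names C.csv, D.csv, E.csv and the helper names are hypothetical:

```python
import os
import tempfile


def read_lines(path):
    """Return the file's lines without trailing newlines, in file order."""
    with open(path) as f:
        return f.read().splitlines()


def write_lines(path, lines):
    """Write one line per entry, each terminated by a newline."""
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)


def make_c_d_e(path_a, path_b, out_dir):
    """Write C.csv (A ∩ B), D.csv (B − C) and E.csv (A ∪ D) into out_dir.

    dict.fromkeys drops duplicate lines while keeping first-seen order.
    """
    a = list(dict.fromkeys(read_lines(path_a)))
    b = list(dict.fromkeys(read_lines(path_b)))
    in_a = set(a)
    c = [row for row in b if row in in_a]      # rows present in both files
    d = [row for row in b if row not in in_a]  # B with C removed
    e = a + d                                  # A followed by B's extra rows
    for name, rows in (("C.csv", c), ("D.csv", d), ("E.csv", e)):
        write_lines(os.path.join(out_dir, name), rows)
    return c, d, e


if __name__ == "__main__":
    # Small self-check on throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        write_lines(os.path.join(tmp, "A.csv"), ["x,1", "y,2"])
        write_lines(os.path.join(tmp, "B.csv"), ["y,2", "z,3"])
        print(make_c_d_e(os.path.join(tmp, "A.csv"),
                         os.path.join(tmp, "B.csv"), tmp))
```

If both files start with an identical header line, it ends up in C and appears once in E; skip it (for example by slicing off the first line) if that is not wanted.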

answered Sep 09 '12 at 06:24 by jsj, edited Sep 09 '12 at 10:07

Remember to close the files. – Paweł Rumian Sep 09 '12 at 07:10
@gorkypl fixed – jsj Sep 09 '12 at 10:08
