
I have two CSV files with some overlapping columns. Call one A.csv and the other B.csv, and call the intersection of A and B C.

I would like to generate three new CSV files: C; D, the subset of B left after subtracting C from it; and E, the union of A and D.

Is there a way to do this on Linux/Unix from the command line, without resorting to a heavyweight programming language?

1 Answer


I would use Python for this. Don't be intimidated by Python; it's great at this kind of thing. My (rough and untested) solution for your problem would be:

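# Read both files as lists of lines, with line endings stripped so that
# a final line lacking a trailing newline still compares equal.
with open("csv1.csv") as f_csv_1:
    csv_1 = f_csv_1.read().splitlines()
with open("csv2.csv") as f_csv_2:
    csv_2 = f_csv_2.read().splitlines()

# Treat each line as one record. Sets drop duplicates and row order.
intersection = set(csv_1) & set(csv_2)
union = set(csv_1) | set(csv_2)

# Write each result with one record per line.
with open("intersection.csv", "w") as out_1:
    for row in intersection:
        out_1.write(row + "\n")

with open("union.csv", "w") as out_2:
    for row in union:
        out_2.write(row + "\n")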
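
The snippet above writes only the intersection and the union, in arbitrary set order. A minimal sketch that also produces D (B minus C) and keeps rows in their original order might look like the following. It assumes that rows are compared as whole lines of text, and that the files are named A.csv and B.csv as in the question. The function names `split_rows`, `read_rows`, and `make_csv_sets` are made up for illustration.

```python
def split_rows(a_rows, b_rows):
    """Return (C, D, E) for two lists of row strings, keeping input order."""
    a_set = set(a_rows)
    # dict.fromkeys drops duplicate rows while preserving first-seen order.
    b_unique = list(dict.fromkeys(b_rows))
    c = [row for row in b_unique if row in a_set]       # C = A intersect B
    d = [row for row in b_unique if row not in a_set]   # D = B minus C
    e = list(dict.fromkeys(a_rows)) + d                 # E = A union D
    return c, d, e


def read_rows(path):
    """Read a file as a list of lines without their line endings."""
    with open(path, newline="") as f:
        return f.read().splitlines()


def make_csv_sets(a_path="A.csv", b_path="B.csv"):
    """Write C.csv, D.csv and E.csv next to the current directory (assumed names)."""
    c, d, e = split_rows(read_rows(a_path), read_rows(b_path))
    for name, rows in (("C.csv", c), ("D.csv", d), ("E.csv", e)):
        with open(name, "w", newline="") as out:
            out.writelines(row + "\n" for row in rows)
```

Calling `make_csv_sets()` in a directory containing A.csv and B.csv would write the three output files. Note that a header line shared by both files counts as a common row, so it lands in C and E but not in D. If the two files only share some columns rather than whole rows, the comparison would need to be done on those columns instead, for example with the `csv` module.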
jsj
  • 1,410