1

I have a similar problem as Merging contents of multiple .csv files into single .csv file but for different length of csv files.

What should I add to the perl script in this answer, in order to account for different length of csv files?

If you know how to do this in any other way, please feel free to suggest it.

αғsнιη
  • 41,407
Anthony
  • 11

3 Answers3

2

paste -d, 1.csv 2.csv | sed 's/^,//; s/,$//' > out.csv should do the trick

paste will merge by column in the order of files specified. To maintain csv format -d, is used. However paste treats an empty line as an entry and will insert commas. This is removed with the sed command.

This will also perform the same task as the linked question.

  • Hi, thanks for the input. When I execute this with different length csv files, I get data entries that should not exist. For example, I Have the columns [Date, Value], for the first file I have daily data for 2017, and for the second file I have data for 2017 and 2018. What happens is that the corresponding columns of the first csv file (with data only from 2017) gets extra values (I don't know exactly from where) for 2018 dates, which only exist in the second csv – Anthony Feb 08 '18 at 11:36
  • The snippet I provided will merge columns (while pushing columns to the left if there are no entires) in the order of the files provided while keeping rows in tact. Your problem seems unclear without examples / seems different from your description above. – imbuedHope Feb 08 '18 at 14:54
0

Here's a hack using awk to ensure all lines get the same number of fields.

I assume that the first csv file has the maximum number of lines: won't work if that's not the case.

$ cat 1.csv
a,b,c
d,e,f
g,h,i

$ cat 2.csv
foo,bar
baz,qux

$ paste -d, {1,2}.csv
a,b,c,foo,bar
d,e,f,baz,qux
g,h,i,

$ paste -d, {1,2}.csv | awk -F, -vOFS=, 'NR == 1 {n = NF} NF < n {NF = n} 1'
a,b,c,foo,bar
d,e,f,baz,qux
g,h,i,,

This may require GNU awk for setting the NF var.

glenn jackman
  • 85,964
0

Couldn't resist creating a small Python script for this. Save to a file, e.g. mergecols.py, make it executable and the output of

./mergecols.py 2.csv 1.csv 1.csv 2.csv 2.csv 1.csv

will be

foo,bar,a,b,c,a,b,c,foo,bar,foo,bar,a,b,c
baz,qux,d,e,f,d,e,f,baz,qux,baz,qux,d,e,f
,,g,h,i,g,h,i,,,,,g,h,i

(with 1.csv and 2.csv of another answer). Here's the script:

#!/usr/bin/env python
import sys

fileNames = sys.argv[1:]
files = [ open(n, "rt") for n in fileNames ]
numberOfColumns = [ None ] * len(fileNames)
while True:
    newParts = [ ]
    foundEntries = False
    for idx in range(len(files)):
        line = files[idx].readline()
        if line:
            parts = line.strip().split(",")
            if numberOfColumns[idx] is None:
                numberOfColumns[idx] = len(parts)
            else:
                if numberOfColumns[idx] != len(parts):
                    raise Exception("Number of columns in '{}' changed".format(fileNames[idx]))
            newParts += parts
            foundEntries = True
        else:
            if numberOfColumns[idx] is None:
                raise Exception("{} does not appear to contain lines".format(fileNames[idx]))
            newParts += [ "" ] * numberOfColumns[idx]

    if not foundEntries:
        break
    sys.stdout.write(",".join(newParts) + "\n")
brm
  • 1,021
  • Hi, thanks for the input. Happens the same as in the previous comment. =/ – Anthony Feb 08 '18 at 11:54
  • @Anthony Is it possible to make these csv files available somewhere? Would be easier to see what's going wrong, and to find a solution. – brm Feb 08 '18 at 14:05