Combine multiple .csv files with different lengths in a single one

Question

I have a similar problem as Merging contents of multiple .csv files into single .csv file but for different length of csv files.

What should I add to the perl script in this answer, in order to account for different length of csv files?

If you know how to do this in any other way, please feel free to suggest it.

Some sample input & output would go a long way towards getting a useful answer. — Jeff Schaller, Feb 07 '18 at 17:12
Possible duplicate of Merging contents of multiple .csv files into single .csv file — Wouter Verhelst, Feb 07 '18 at 17:49
@WouterVerhelst the accepted answer there says "WARNING: This script assumes that all input files have the same number of lines. Output will likely be unusable if any file has a different number of lines from any of the others.", which I think invalidates it as a duplicate here. The other answer there does not merge columns. — Jeff Schaller, Feb 07 '18 at 18:00

imbuedHope · Answer 1 · 2018-02-07T18:12:09.897

2

paste -d, 1.csv 2.csv | sed 's/^,//; s/,$//' > out.csv should do the trick

paste will merge by column in the order of files specified. To maintain csv format -d, is used. However paste treats an empty line as an entry and will insert commas. This is removed with the sed command.

This will also perform the same task as the linked question.

edited Feb 07 '18 at 18:12

answered Feb 07 '18 at 17:51

imbuedHope

431

Hi, thanks for the input. When I execute this with different length csv files, I get data entries that should not exist. For example, I Have the columns [Date, Value], for the first file I have daily data for 2017, and for the second file I have data for 2017 and 2018. What happens is that the corresponding columns of the first csv file (with data only from 2017) gets extra values (I don't know exactly from where) for 2018 dates, which only exist in the second csv – Anthony Feb 08 '18 at 11:36
The snippet I provided will merge columns (while pushing columns to the left if there are no entires) in the order of the files provided while keeping rows in tact. Your problem seems unclear without examples / seems different from your description above. – imbuedHope Feb 08 '18 at 14:54

score 0 · Answer 2 · answered Feb 07 '18 at 18:45

Here's a hack using awk to ensure all lines get the same number of fields.

I assume that the first csv file has the maximum number of lines: won't work if that's not the case.

$ cat 1.csv
a,b,c
d,e,f
g,h,i

$ cat 2.csv
foo,bar
baz,qux

$ paste -d, {1,2}.csv
a,b,c,foo,bar
d,e,f,baz,qux
g,h,i,

$ paste -d, {1,2}.csv | awk -F, -vOFS=, 'NR == 1 {n = NF} NF < n {NF = n} 1'
a,b,c,foo,bar
d,e,f,baz,qux
g,h,i,,

This may require GNU awk for setting the NF var.

score 0 · Answer 3 · answered Feb 07 '18 at 19:29

Couldn't resist creating a small Python script for this. Save to a file, e.g. mergecols.py, make it executable and the output of

./mergecols.py 2.csv 1.csv 1.csv 2.csv 2.csv 1.csv

will be

foo,bar,a,b,c,a,b,c,foo,bar,foo,bar,a,b,c
baz,qux,d,e,f,d,e,f,baz,qux,baz,qux,d,e,f
,,g,h,i,g,h,i,,,,,g,h,i

(with 1.csv and 2.csv of another answer). Here's the script:

#!/usr/bin/env python
import sys

fileNames = sys.argv[1:]
files = [ open(n, "rt") for n in fileNames ]
numberOfColumns = [ None ] * len(fileNames)
while True:
    newParts = [ ]
    foundEntries = False
    for idx in range(len(files)):
        line = files[idx].readline()
        if line:
            parts = line.strip().split(",")
            if numberOfColumns[idx] is None:
                numberOfColumns[idx] = len(parts)
            else:
                if numberOfColumns[idx] != len(parts):
                    raise Exception("Number of columns in '{}' changed".format(fileNames[idx]))
            newParts += parts
            foundEntries = True
        else:
            if numberOfColumns[idx] is None:
                raise Exception("{} does not appear to contain lines".format(fileNames[idx]))
            newParts += [ "" ] * numberOfColumns[idx]

    if not foundEntries:
        break
    sys.stdout.write(",".join(newParts) + "\n")

Hi, thanks for the input. Happens the same as in the previous comment. =/ — Anthony, Feb 08 '18 at 11:54
@Anthony Is it possible to make these csv files available somewhere? Would be easier to see what's going wrong, and to find a solution. — brm, Feb 08 '18 at 14:05

Combine multiple .csv files with different lengths in a single one

3 Answers3