I'm using the command line to combine hundreds of CSV files (with identical columns) in the manner described in these questions.

The problem is that some of my CSVs have an NA value in cell A1. In these cases, the values from the first row of the file are added to the last row of the previous file.

Here is a simple example.

csv1

col1,col2

csv2

,16
17,18

Running cat *.csv >merged.csv yields this output:

col1,col2,16
17,18

My desired output:

col1,col2
,16
17,18

One option is to modify the source files so that the first column never contains missing values. Due to the quantity of data involved, I'd like to avoid that if possible. Is it possible to fix this behavior within the cat command?

John J.
  • Welcome to Unix & Linux! Please don't post images of text. Instead, copy/paste the text into your question and use the formatting tools to format it as code. That way, we can actually copy it and test any solutions we give you. We can't help you parse data that you don't show. – terdon Jan 27 '21 at 18:17
  • 1) Please do not use images; use proper text file data instead. 2) Without seeing what a CSV looks like in your case, one cannot tell what is happening. 3) cat just concatenates files; the behaviour you want seems out of its scope, so you would need other tools. 4) Filling in a placeholder is usually quite easy with other tools, (almost) no matter the number of files. – FelixJN Jan 27 '21 at 18:18
  • Could it be that your problematic files are missing a newline? – Freddy Jan 27 '21 at 18:23
  • Thanks, @Freddy. That was my problem, it seems. The question you linked to is very helpful. – John J. Jan 27 '21 at 19:11
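
For anyone landing here with the same symptom: if the cause is a missing newline at the end of some files, as Freddy's comment suggests, the files can be merged without editing them. The following is only a minimal sketch, assuming a POSIX awk is available and that no input filename contains an = sign (awk would treat such an operand as a variable assignment). The function name merge_csv is a hypothetical helper chosen for illustration, not something from the question.

```shell
# merge_csv OUTPUT INPUT...   (hypothetical helper name)
# Concatenate the INPUT files into OUTPUT, supplying a final newline
# for any input whose last line is not terminated by one.
merge_csv() {
    out=$1
    shift
    # The awk pattern "1" is always true, so every record is printed
    # followed by the output record separator (a newline by default),
    # including an unterminated last line of a file.
    awk 1 "$@" > "$out"
}

# Example call; assumes merged.csv does not already exist in the
# directory, otherwise the glob would pick it up as an input:
#   merge_csv merged.csv *.csv
```

With the two sample files from the question, where csv1 ends without a newline, this should produce the desired three-line output rather than col1,col2,16. Lines that already end in a newline pass through unchanged.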