0

I have a CSV file which I need to load to a MySQL table. I rely of identifying the columns end by the , character. That's why it is important that the , does not appear elsewhere other than as a column separator.

I found some rows which contains a column with , inside double quotations. for example a line like this one:

12,"name, brand - something, something",age,sex,,,,"name, brand - something, something, something",,,,,

Needs to be converted to:

12,name; brand - something; something,age,sex,,,,name; brand - something; something; something,,,,,

As you see, I replaced the , inside double quotations with ; so that when I load the file in MySQL the , inside double quotations are not considered as a separator as they are not , anymore. I also removed the double quotations " as they are not needed.

I tried to automate this for every row in my CSV file using sed as follows:

sed -e 's/"\*,\*"/"\*;\*"/g' -e 's/"//g' input.csv > output.csv

But the result did not replaced the , that come inside the double quotations with ;. It only removed the double quotations:

12,name, brand - something, something,age,sex,,,,name, brand - something, something, something,,,,,

3 Answers3

0

As already mentioned by @steeldriver, mysql probably knows know to handle that if used the right options, but FWIW you can do that with awk:

awk -v RS='"' -v ORS= 'NR % 2 || gsub(/,/,";") || 1'

12,name; brand - something; something,age,sex,,,,name; brand - something; something; something,,,,,

Or, while keeping the enclosing quotes:

awk -v RS='"' -v ORS= '{if(NR % 2) print; else{gsub(/,/,";");print RS $0 RS}}'

12,"name; brand - something; something",age,sex,,,,"name; brand - something; something; something",,,,,

This is using the same trick as here, only reverted: instead of modifying the part outside the quotes, I'm modifying the part inside the quotes.

0

csv files can be very tricky. You might end up having an escaped quote somewhere in the line and the regular expression to handle that would be unreadable and error prone.

I would suggest either a tool like cvskit or a small script in perl or python. This quickly crafted program in python should do it:

import csv

with open('input.csv',mode='r') as csv_file:
   csv_reader = csv.reader(csv_file)
   for row in csv_reader:
       print (',').join([f.replace(',',';') for f in row])
0

The best answer I found is using MySQL itself by adding thie line:

OPTIONALLY ENCLOSED BY '"'

So for example, the load query looks like this:

LOAD DATA INFILE 'filename.csv' INTO TABLE table_name 
  FIELDS TERMINATED BY ',' 
  OPTIONALLY ENCLOSED BY '"'
  IGNORE 1 LINES;