I have a CSV file from which I'm trying to strip the double quotes (details are in my original post), using the following sed one-liner:
sed -i.bak 's/^"//g;s/","/,/g;s/"$//g' "$1"
Although this now works in most cases, one of my exports has a field that spans multiple lines, apparently broken up by carriage returns. Here is a sample of the data:
$ cat -v Quote.csv
"Id","Blob","Employee","Etc"^M
"0Q01N000001MxPbSAK","Job to happen late day/ evening", "Employee 1", "more stuff"^M
"0Q01N000001N4klSAC","Daytime work during normal businesses hours ^M
some details ^M
some more details ^M
conclusion","Employee 2", "more stuff"%
When I attempt this on the full file I get the following error:
CSV error: record 2 (line: 4, byte: 101): found record with 2 fields, but the previous record has 4 fields
I believe this is because the alignment of columns and rows is distorted, even though it "appears" fine in Excel.
Any ideas on how to properly parse this so I can get around the issue? I need the double quotes removed so that the values are typed properly when I import the CSV into ArangoDB.
I found another forum thread with essentially the same issue, and one proposed solution was this:
sed 's/$/~/' Quote.csv |tr '\n' ' ' |sed 's/~ "KEY-/\n"KEY-/g'
I believe that if I could reverse engineer it to work with my ID field, it might work. I also noticed that I have <br> characters, and I'm not sure whether they would need to be tr'd out as well (it seems that would then mess up the data, which is expected to keep its line breaks).
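As a rough sketch only, the forum pipeline might be adapted to the ID field along these lines. It assumes that every record ID starts with `"0Q0` (as in the sample, which may not hold for the full export), that the `~` character never occurs in the data, and that GNU sed is in use (for `\n` in the replacement). The function name `join_records` is hypothetical.

```shell
#!/bin/sh
# Sketch: rejoin records whose quoted fields contain line breaks.
# Assumptions (not confirmed by the export): every record ID starts with
# "0Q0, the data never contains ~, and sed is GNU sed.
join_records() {
    tr -d '\r' < "$1" |
        # mark every physical line end with ~
        sed 's/$/~/' |
        # fold the whole file onto a single line
        tr '\n' ' ' |
        # start a new line before each record ID, then drop the final marker
        sed -e 's/~ "0Q0/\n"0Q0/g' -e 's/~ *$//'
}
```

Calling `join_records Quote.csv` prints one record per line. Line breaks that were inside a quoted field are left as `~ ` markers, so they would still need to be mapped to whatever the import expects.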
sed call. The sed call is there because when I get to the import I need Arango to maintain the type and not assume they are all strings. – Xtremefaith Dec 07 '18 at 22:50