Removal of double quotes and substitution of comma inside it

Question

I'm looking for a command which can remove one/two double quotes from a CSV format file and comma's inside them to be substituted with space so that the whole field can look like one.

Ex: ""Wembley,London"" to be shown as Wembley London under one column in the csv.

Also, the command for "Wembley,London" as Wembley London. Irrespective of the field position the script would be reading the file.

I tried the below commands but its not useful.

sed 's/\"//g' $fname >  $Target/sample_UPS1.csv
sed 's/\,/ /g' $Target/sample_UPS1.csv >  $Target/sample_UPS1.csv

awk -F'""' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' $fname | sed 's/\"//g'   > $Target/sample_UPS.txt ##For removal of two double quotes and substitution of comma with a space##

awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' $Target/sample_UPS.txt | sed 's/\"//g' > $Target/sample_UPS1.txt ##For removal of double quotes and substitution of comma with a space##

Both the cases, sample_ups1 being my output file.

a,b,c,1234,23,"Wembley,London",267,agty being csv fields, I need the output as a,b,c,1234,23,Wembley London,267,agty. Similarly, if one of the fields has two double quotes, the output should be as above — Abhishek, Aug 28 '18 at 13:18
can you edit the question and add the same you mentioned in comment. — Siva, Aug 28 '18 at 13:20
Related https://unix.stackexchange.com/questions/48672/remove-comma-between-the-quotes-only-in-a-comma-delimited-file — Siva, Aug 28 '18 at 13:25
Try, awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file — Siva, Aug 28 '18 at 13:27
@SivaPrasath: wouldn't work with the double double quotes. Remove those first: awk -F'"' -v OFS='' '{gsub (/""/, "\""); for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file — RudiC, Aug 28 '18 at 13:37
I already tried the above command. But then, what about two double quoted field? — Abhishek, Aug 28 '18 at 13:37
@Abhishek i got the output as a,b,c,1234,23,Wembley London,267,agty for the code awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file — Siva, Aug 28 '18 at 13:54
Yes, Siva. But what about when we have two instances of double quotes? Like below a,b,c,1234,23,""Wembley,London"",267,agty — Abhishek, Aug 29 '18 at 09:16

score 0 · Answer 1 · answered Aug 28 '18 at 14:47

If you have valid CSV, a proper CSV parser should be used. ruby's is easy to use: if

cat file.csv

"""Wembley,London""",a,"b","c, ""d e"", ""f,g"",h"

then

ruby -e '
    require "csv"
    data = CSV.read ARGV.shift
    data.each {|row| 
        row.each {|field| field.delete! "\""; field.tr! ",", " "}
        puts CSV.generate_line row
    }
' file.csv

Wembley London,a,b,c  d e  f g h

Note that embedded quotes in a CSV field require the quotes to be doubled. If you have

""Wembley,London"","other stuff",...

then you have malformed CSV.

score 0 · Answer 2 · answered Aug 29 '18 at 00:35

Perl's Text::CSV module can handle malformed CSV of this type:

If there is really bad CSV data, like

1,"foo "bar" baz",42

or

1,""foo bar baz"",42

there is a way to get this data-line parsed and leave the quotes inside the quoted field as-is. This can be achieved by setting allow_loose_quotes AND making sure that the escape_char is not equal to quote_char

So, for example

echo 'a,b,c,1234,23,""Wembley,London"",267,agty' | 
  perl -MText::CSV -lne '
    BEGIN{$p = Text::CSV->new( {allow_loose_quotes => 1, escape_char => "\\"} )} 
    print join ",", map { s/,/ /g; s/"//g; $_ } $p->fields() if $p->parse($_)
'
a,b,c,1234,23,Wembley London,267,agty

Removal of double quotes and substitution of comma inside it

2 Answers2