1

I'm looking for a command which can remove one/two double quotes from a CSV format file and comma's inside them to be substituted with space so that the whole field can look like one.

Ex: ""Wembley,London"" to be shown as Wembley London under one column in the csv.

Also, the command for "Wembley,London" as Wembley London. Irrespective of the field position the script would be reading the file.

I tried the below commands but its not useful.

sed 's/\"//g' $fname >  $Target/sample_UPS1.csv
sed 's/\,/ /g' $Target/sample_UPS1.csv >  $Target/sample_UPS1.csv

awk -F'""' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' $fname | sed 's/\"//g'   > $Target/sample_UPS.txt ##For removal of two double quotes and substitution of comma with a space##

awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' $Target/sample_UPS.txt | sed 's/\"//g' > $Target/sample_UPS1.txt ##For removal of double quotes and substitution of comma with a space##

Both the cases, sample_ups1 being my output file.

Rui F Ribeiro
  • 56,709
  • 26
  • 150
  • 232
  • a,b,c,1234,23,"Wembley,London",267,agty being csv fields, I need the output as a,b,c,1234,23,Wembley London,267,agty. Similarly, if one of the fields has two double quotes, the output should be as above – Abhishek Aug 28 '18 at 13:18
  • 1
    can you edit the question and add the same you mentioned in comment. – Siva Aug 28 '18 at 13:20
  • Related https://unix.stackexchange.com/questions/48672/remove-comma-between-the-quotes-only-in-a-comma-delimited-file – Siva Aug 28 '18 at 13:25
  • Try, awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file – Siva Aug 28 '18 at 13:27
  • @SivaPrasath: wouldn't work with the double double quotes. Remove those first: awk -F'"' -v OFS='' '{gsub (/""/, "\""); for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file – RudiC Aug 28 '18 at 13:37
  • I already tried the above command. But then, what about two double quoted field? – Abhishek Aug 28 '18 at 13:37
  • @Abhishek i got the output as a,b,c,1234,23,Wembley London,267,agty for the code awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", " ", $i) } 1' file – Siva Aug 28 '18 at 13:54
  • Yes, Siva. But what about when we have two instances of double quotes? Like below a,b,c,1234,23,""Wembley,London"",267,agty – Abhishek Aug 29 '18 at 09:16

2 Answers2

0

If you have valid CSV, a proper CSV parser should be used. ruby's is easy to use: if

cat file.csv
"""Wembley,London""",a,"b","c, ""d e"", ""f,g"",h"

then

ruby -e '
    require "csv"
    data = CSV.read ARGV.shift
    data.each {|row| 
        row.each {|field| field.delete! "\""; field.tr! ",", " "}
        puts CSV.generate_line row
    }
' file.csv
Wembley London,a,b,c  d e  f g h

Note that embedded quotes in a CSV field require the quotes to be doubled. If you have

""Wembley,London"","other stuff",...

then you have malformed CSV.

glenn jackman
  • 85,964
0

Perl's Text::CSV module can handle malformed CSV of this type:

If there is really bad CSV data, like

1,"foo "bar" baz",42

or

1,""foo bar baz"",42

there is a way to get this data-line parsed and leave the quotes inside the quoted field as-is. This can be achieved by setting allow_loose_quotes AND making sure that the escape_char is not equal to quote_char

So, for example

echo 'a,b,c,1234,23,""Wembley,London"",267,agty' | 
  perl -MText::CSV -lne '
    BEGIN{$p = Text::CSV->new( {allow_loose_quotes => 1, escape_char => "\\"} )} 
    print join ",", map { s/,/ /g; s/"//g; $_ } $p->fields() if $p->parse($_)
'
a,b,c,1234,23,Wembley London,267,agty
steeldriver
  • 81,074