Using sed to remove digit-grouping commas enclosed in quotes, and the quotes themselves, from CSV?

Question

How can I use sed to remove the digit-grouping commas, and the surrounding quotes themselves, from the second-to-last column?

Please note that in the sample below the target column is not always enclosed in double quotes.

0,1,,,"10,815,197",
6,7,010202,,"5,589",
6,7,010202,,589,

The expected result would be:

0,1,,,10815197,
6,7,010202,,5589,
6,7,010202,,589,

Could you post your expected output? – jherran Feb 08 '15 at 09:04
Does your input always contain only digits and commas? – Avinash Raj Feb 08 '15 at 12:20

score 2 · Answer 1 · answered Feb 08 '15 at 11:51

Awk is a better fit for this scenario.

$ awk -F'"' '{gsub(",", "", $2);print}' file.txt 
0,1,,, 10815197 ,
6,7,010202,, 5589 ,
6,7,010202,,589,

How it works

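-F'"' - makes awk use the double quote ( " ) as the field separator, so the quoted number becomes field $2.

gsub(",", "", $2) - the gsub function replaces every comma in the second field with an empty string.

print - prints the modified record. Because changing $2 makes awk rebuild the record with the default output field separator (a single space), each double quote shows up as a space in the output above.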
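
The spaces in the output above come from awk rejoining the fields with its default output separator. As a rough model of that split-and-rejoin step, here is a small Python sketch. The function name awk_like and its parameters are made up for illustration, and the model is simplified: it rebuilds the line only when a comma was actually removed, which may not match every awk implementation in edge cases.

```python
def awk_like(line, fs='"', ofs=" "):
    """Rough model of: awk -F'"' '{gsub(",", "", $2); print}'.

    fs and ofs are hypothetical stand-ins for awk's FS and OFS.
    """
    fields = line.split(fs)
    # Only touch the line if there is a second field containing a comma.
    if len(fields) > 1 and "," in fields[1]:
        fields[1] = fields[1].replace(",", "")
        # Rejoining with ofs is what puts spaces where the quotes were.
        return ofs.join(fields)
    return line


print(awk_like('0,1,,,"10,815,197",'))          # 0,1,,, 10815197 ,
print(awk_like('0,1,,,"10,815,197",', ofs=""))  # 0,1,,,10815197,
print(awk_like('6,7,010202,,589,'))             # 6,7,010202,,589,
```

In awk terms, the second call corresponds to setting OFS to an empty string, which gives the output the question asks for.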

jherran · Answer 2 · 2015-02-08T09:52:19.547

I think it's easier with awk. You can try something like this:

$ awk -v v='"' 'BEGIN{FS=OFS=v}{gsub(",","",$2);gsub("\"","",$0);print }' file.txt
0,1,,,10815197,
6,7,010202,,5589,
6,7,010202,,589,

-v v='"' defines an awk variable v that holds a double quote; it is a plain string, not a regular expression.
With FS=OFS=v, both the input field separator and the output field separator are set to ".
gsub(",","",$2) removes every , from the second field $2, which is the text between the first pair of ".
gsub("\"","",$0) then removes every " from the whole line before it is printed.
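
The command above only cleans $2, that is, the first quoted field on each line. If a line could hold several quoted numbers, the same split-on-quote idea can be extended, as in this Python sketch. The function name unquote_numbers is hypothetical, and the sketch assumes quotes are balanced and never escaped inside a field.

```python
def unquote_numbers(line):
    """Remove commas inside every quoted field, then drop the quotes.

    Assumes balanced, unescaped double quotes (an assumption, not
    something the original question guarantees).
    """
    parts = line.split('"')
    # With balanced quotes, odd-indexed parts are the text inside quotes.
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(",", "")
    # Joining with an empty string discards the quotes themselves.
    return "".join(parts)


print(unquote_numbers('0,1,,,"10,815,197",'))  # 0,1,,,10815197,
print(unquote_numbers('"1,2",x,"3,456"'))      # 12,x,3456
```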

Avinash Raj · Answer 3 · 2015-02-08T12:35:17.867

sed is not the right tool for this.

$ perl -pe 's|"([\d,]+)"(?=[^"]*$)|$1=~y/,//dr|eg' file
0,1,,,10815197,
6,7,010202,,5589,
6,7,010202,,589,

The same approach in Python:

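#!/usr/bin/python3
import re
import sys

# A quoted run of digits and commas with no further quote after it on the line.
pattern = re.compile(r'"([\d,]+)"(?=[^"]*$)')

with open(sys.argv[1]) as f:
    for line in f:
        # Group 1 excludes the quotes; the commas inside it are removed.
        print(pattern.sub(lambda m: m.group(1).replace(',', ''), line), end='')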

Save the script above to a file, say script.py, then run it from the terminal with the command below.

$ python3 script.py inputfile
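
To make the regular expression used in this answer easier to read, here is the same pattern written in verbose mode and wrapped in a function that can be tried without an input file. The names QUOTED_NUMBER and strip_grouping_commas are illustrative only.

```python
import re

QUOTED_NUMBER = re.compile(r'''
    "            # opening double quote
    ([\d,]+)     # group 1: digits and grouping commas
    "            # closing double quote
    (?=[^"]*$)   # lookahead: no other double quote before the end of the line
''', re.VERBOSE)


def strip_grouping_commas(line):
    # Replace the whole quoted match with group 1 minus its commas.
    return QUOTED_NUMBER.sub(lambda m: m.group(1).replace(",", ""), line)


print(strip_grouping_commas('0,1,,,"10,815,197",'))  # 0,1,,,10815197,
print(strip_grouping_commas('6,7,010202,,589,'))     # 6,7,010202,,589,
print(strip_grouping_commas('"1,2","3,4",'))         # "1,2",34,
```

The last call shows the effect of the lookahead: only the final quoted field on a line is rewritten, which matches the question's requirement that only the second-to-last column be changed.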
