Replace a character only between the double-quotes with sed or awk

Question

I have such a scenario inside a CSV file:

ID,PRICE,QUANTITY,ARRIVAL
01299,"41,5",1,0
26528,"412,03",0,0
38080,"2,35",0,0
38081,"2,35",0,0
..
..

The question I ask myself is: How do I replace , with the ., but only in the prices, enclosed within the double quotes "...", in the PRICE column?

I tried with

sed -i 's/\(,\)[^ ]*\( .*\)/\1"."\2/'

but without success, can you give me a tip?

score 4 · Accepted Answer · answered Nov 04 '21 at 13:10

I would recommend using the "-E" option for sed, for extended regular expression. If you only have numbers between the quotes you could do something like this:

~ $ cat foo
lorem, ipsum
"5,25", foobar
"foo,bar", foobar
~ $ cat foo | sed -E 's/"([0-9]+)\,([0-9]+)"/"\1.\2"/g'
lorem, ipsum
"5.25", foobar
"foo,bar", foobar

If you have upper- and lowercase letters in the quotes you can extend the regexp like this:

~ $ cat foo | sed -E 's/"([0-9]+|[a-z]+|[A-Z]+)\,([0-9]+|[a-z]+|[A-Z]+)"/"\1.\2"/g'
lorem, ipsum
"5.25", foobar
"foo.bar", foobar

You can easily test your regular expressions at regex101.com: See also: https://regex101.com/r/mlfSMp/1

Kusalananda · Answer 2 · 2021-11-05T11:37:07.173

Using csvformat from csvkit:

$ csvformat -D ';' file | tr , . | csvformat -d ';'
ID,PRICE,QUANTITY,ARRIVAL
01299,41.5,1,0
26528,412.03,0,0
38080,2.35,0,0
38081,2.35,0,0

The above command changes the CSV delimiter from the default comma to semi-colon. It then changes all remaining commas into dots before changing the field delimiter back to comma.

Alternatively, convert the CSV to JSON using csvjson and modify the data with jq:

$ csvjson -I file | jq -r '(.[0] | keys_unsorted | @csv), (map(.PRICE |= sub(",";".") | [.[]] | @csv)[])'
"ID","PRICE","QUANTITY","ARRIVAL"
"01299","41.5","1","0"
"26528","412.03","0","0"
"38080","2.35","0","0"
"38081","2.35","0","0"

Pass this through csvformat (with no arguments) if you want to remove any unneeded quotes.

The jq expression first outputs the header by taking the keys from the first object in the input array. It then modifies the PRICE field of each element (changes the comma into a dot) and formats the values as CSV for output.

The difference in effect between this second variation and the first is that we only ever affect the PRICE column. Any embedded commas in any other field will be untouched.

For reference, the JSON document that jq gets as input from csvjson, given the CSV data in the question, would be

[
  {"ID":"01299","PRICE":"41,5","QUANTITY":"1","ARRIVAL":"0"},
  {"ID":"26528","PRICE":"412,03","QUANTITY":"0","ARRIVAL":"0"},
  {"ID":"38080","PRICE":"2,35","QUANTITY":"0","ARRIVAL":"0"},
  {"ID":"38081","PRICE":"2,35","QUANTITY":"0","ARRIVAL":"0"}
]

score 0 · Answer 3 · answered Nov 05 '21 at 10:53

Tweaking this answer to Remove comma between the quotes only in a comma delimited file for an awk solution:

awk -F'"' -v OFS='"' '{ for (i=2; i<=NF; i+=2) gsub(",", ".", $i) } 1' infile

where infile is

ID,PRICE,QUANTITY,ARRIVAL
01299,"41,5",1,0
26528,"412,03",0,0
38080,"2,35",0,0
38081,"2,35",0,0

and the result is

ID,PRICE,QUANTITY,ARRIVAL
01299,"41.5",1,0
26528,"412.03",0,0
38080,"2.35",0,0
38081,"2.35",0,0

The explanation of the code is the same as the referenced answer so I won't repeat it, except that:

The output field separator is now " (i.e. OFS='"'), instead of nothing (i.e. OFS=''), and;
The gsub now swaps the comma (",") for a period ("."), instead of nothing ("")

Replace a character only between the double-quotes with sed or awk

3 Answers3