
Below are the first 5 lines of my file. I want to replace the "10,00,000.0" in the 5th column with "10,000,000.0".

DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,00,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"
Paulo Tomé
  • Welcome to the site. Do I understand you correctly that you want to replace the erroneous n,00,000.0 with n,000,000.0 in all occurrences in the 5th column? Do you also want to do that in the 6th column? – AdminBee Mar 13 '20 at 11:32
  • Surely it should be multiplied by 10, instead of just changing the commas to mark thousands and millions? –  Mar 13 '20 at 11:36

7 Answers

1

By temporarily changing the CSV delimiter to @ (or any other character that does not already occur in the data) using csvformat from csvkit, then changing the relevant strings in the 5th field with awk, and finally restoring the original comma delimiter:

csvformat -D '@' data.csv |
awk 'BEGIN { OFS=FS="@" } $5 == "10,00,000.0" { $5 = "10,000,000.0" }; 1' |
csvformat -d '@'

With your data in data.csv, this produces:

DE000A2200V7,09:30:00,8.5,8.509,"10,000,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,000,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"
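
If you are unsure whether @ already occurs in the data, a quick pre-check (a sketch; plain grep on the raw file) confirms that the replacement delimiter is safe:

grep -q '@' data.csv && echo 'pick a different delimiter'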
Kusalananda
0

With GNU awk

awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '$5 == "\"10,00,000.0\"" \
{ $5="\"10,000,000.0\""}; {print}' file

Tests

$ cat file
DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,00,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '$5 == "\"10,00,000.0\"" { $5="\"10,000,000.0\""}; {print}' file
DE000A2200V7,09:30:00,8.5,8.509,"10,000,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,000,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

Explanation

-vFPAT='([^,]*)|("[^"]+")' 

separates the fields by commas, handling the cases where fields contain embedded commas (see the GNU awk manual, Defining Fields by Content).

-vOFS=,

states that the output field separator is the comma ,.

'$5 == "\"10,00,000.0\"" { $5="\"10,000,000.0\""}; {print}'

if the fifth field equals the string "10,00,000.0" (quotes included, since FPAT keeps the quotes as part of the field), replace it with "10,000,000.0"; then print the line.
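
To see the FPAT splitting on its own, here is a minimal sketch on the first sample line; note that the field keeps its surrounding quotes:

$ echo 'DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"' | gawk -vFPAT='([^,]*)|("[^"]+")' '{print $5}'
"10,00,000.0"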

Paulo Tomé
0

You could do this with the following sed command:

sed -i 's/^\(\([^,]*,\)\{4\}\)"10,00,000\.0"/\1"10,000,000.0"/' data.csv
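
This skips the first four comma-separated fields and then substitutes the literal "10,00,000.0" only. The same substitution with extended regular expressions (a sketch; -E is supported by GNU sed and modern BSD sed) is a bit easier to read:

sed -E 's/^(([^,]*,){4})"10,00,000\.0"/\1"10,000,000.0"/' data.csv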
noAnton
0

Are you sure that "20,00,000.0" is correct? If not, try

sed 's/,00,/,000,/' file

Add the s command's g flag if all wrong numbers should be cured...
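
That is, a sketch that cures every occurrence on each line (note that this also rewrites the 6th column):

sed 's/,00,/,000,/g' file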

RudiC
0

Just break and reconstruct the lines on " by setting FS=OFS="\"" and testing the 2nd field:

awk 'BEGIN{FS=OFS="\""} $2=="10,00,000.0"{$2="10,000,000.0"};1' file1
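
Splitting on the double quote puts the 5th CSV column into awk's $2, the first quoted field, as a quick sketch on the sample data shows:

$ awk 'BEGIN{FS="\""} NR==1 {print $2}' file1
10,00,000.0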
bu5hman
0

Just in case the real question is:

How to change the commas in the fields of a CSV file from grouping by thousand, lakh, and crore to grouping by thousands, millions, etc.?

maybe this will do, extending the same trick used in another answer:

LC_ALL=en_US.UTF-8 awk -v RS='"' -v ORS= '{
  if(NR % 2){print}
  else if(/[^0-9.,]/){print RS $0 RS}
  else { gsub(/,/,""); printf RS "%'\''.1f" RS, $0}
}' file

DE000A2200V7,09:30:00,8.5,8.509,"1,000,000.0","1,000,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"2,000,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"3,000,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"1,000,000.0","1,000,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

This assumes an awk with locale support, or one that doesn't implement printf itself (gawk, bwk, Thomas Dickey's mawk). The default awk from Debian (an old version of mawk) will NOT do.

Only LC_NUMERIC is needed; LC_ALL is used as a cure against another LC_ALL that may be present in the environment ;-)
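
To check whether a given awk honours the ' grouping flag, a quick sketch (assuming the en_US.UTF-8 locale is installed; it should print 1,234,567.0):

$ LC_ALL=en_US.UTF-8 awk 'BEGIN{printf "%'\''.1f\n", 1234567}'
1,234,567.0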

-1

sed -i.backup 's/,"10,00,000.0"/,"10,000,000.0"/' data.csv

  1. -i.backup writes the result to the original file and backs up the original data to data.csv.backup
  2. s/ searches for ,"10,00,000.0" and replaces only the first match on each line with ,"10,000,000.0"
  3. data.csv is the file to process
Oxyd
    It would be easy to change the 6th column by mistake, for example if the second to last line had said 20,00,000.0 in the 5th column. – Kusalananda Mar 13 '20 at 11:46
  • Welcome to the site. Please, don't attach screenshots of console commands/console output; they can be difficult to read and (in particular when used in a question) make it difficult for contributors to reproduce the commands you typed. Instead, copy-and-paste it into your answer with appropriate formatting. – AdminBee Mar 13 '20 at 12:10