
Below are the first 5 lines of my file. I want to replace the "10,00,000.0" in the 5th column with "10,000,000.0".

DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,00,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"
Paulo Tomé
  • Welcome to the site. Do I understand you correctly that you want to replace the erroneous n,00,000.0 with n,000,000.0 in all occurrences in the 5th column? Do you also want to do that in the 6th column? – AdminBee Mar 13 '20 at 11:32
  • Surely it should be multiplied by 10, instead of just changing the commas to mark thousands and millions? –  Mar 13 '20 at 11:36

7 Answers

1

By temporarily changing the CSV delimiter to @ (or any other character that does not already occur in the data) using csvformat from csvkit, then changing the relevant strings in the 5th field with awk, and finally restoring the original comma delimiter:

csvformat -D '@' data.csv |
awk 'BEGIN { OFS=FS="@" } $5 == "10,00,000.0" { $5 = "10,000,000.0" }; 1' |
csvformat -d '@'

With your data in data.csv, this produces:

DE000A2200V7,09:30:00,8.5,8.509,"10,000,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,000,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"
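
If you are unsure whether @ already occurs in the data, a quick pre-check (a sketch; plain grep on the raw file) confirms that the replacement delimiter is safe:

grep -q '@' data.csv && echo 'pick a different delimiter'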
Kusalananda
0

With GNU awk

awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '$5 == "\"10,00,000.0\"" \
{ $5="\"10,000,000.0\""}; {print}' file

Tests

$ cat file
DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,00,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '$5 == "\"10,00,000.0\"" { $5="\"10,000,000.0\""}; {print}' file
DE000A2200V7,09:30:00,8.5,8.509,"10,000,000.0","10,00,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"20,00,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"30,00,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"10,000,000.0","10,00,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

Explanation

-vFPAT='([^,]*)|("[^"]+")' 

separates the fields by commas, handling the cases where fields contain embedded commas (see the GNU awk manual, Defining Fields by Content).

-vOFS=,

states that the output field separator is the comma ,.

'$5 == "\"10,00,000.0\"" { $5="\"10,000,000.0\""}; {print}'

if the fifth field equals the string "10,00,000.0" (quotes included, since FPAT keeps the quotes as part of the field), replace it with "10,000,000.0"; then print the line.
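
To see the FPAT splitting on its own, here is a minimal sketch on the first sample line; note that the field keeps its surrounding quotes:

$ echo 'DE000A2200V7,09:30:00,8.5,8.509,"10,00,000.0","10,00,000.0","850,450.0"' | gawk -vFPAT='([^,]*)|("[^"]+")' '{print $5}'
"10,00,000.0"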

Paulo Tomé
0

You could do this with the following sed command:

sed -i 's/^\(\([^,]*,\)\{4\}\)"10,00,000\.0"/\1"10,000,000.0"/' data.csv
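
This skips the first four comma-separated fields and then substitutes the literal "10,00,000.0" only. The same substitution with extended regular expressions (a sketch; -E is supported by GNU sed and modern BSD sed) is a bit easier to read:

sed -E 's/^(([^,]*,){4})"10,00,000\.0"/\1"10,000,000.0"/' data.csv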
noAnton
0

Are you sure that "20,00,000.0" is correct? If not, try

sed 's/,00,/,000,/' file

Add the s command's g flag if all wrong numbers should be cured...
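
That is, a sketch that cures every occurrence on each line (note that this also rewrites the 6th column):

sed 's/,00,/,000,/g' file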

RudiC
0

Just break and reconstruct the lines on " by setting FS=OFS="\"" and testing the 2nd field:

awk 'BEGIN{FS=OFS="\""} $2=="10,00,000.0"{$2="10,000,000.0"};1' file1
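
Splitting on the double quote puts the 5th CSV column into awk's $2, the first quoted field, as a quick sketch on the sample data shows:

$ awk 'BEGIN{FS="\""} NR==1 {print $2}' file1
10,00,000.0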
bu5hman
0

Just in case the real question is:

How to change the commas in the fields of a CSV file from grouping by thousand, lakh, and crore to grouping by thousands, millions, etc.?

maybe this will do, extending the same trick used in another answer:

LC_ALL=en_US.UTF-8 awk -v RS='"' -v ORS= '{
  if(NR % 2){print}
  else if(/[^0-9.,]/){print RS $0 RS}
  else { gsub(/,/,""); printf RS "%'\''.1f" RS, $0}
}' file

DE000A2200V7,09:30:00,8.5,8.509,"1,000,000.0","1,000,000.0","850,450.0"
DE000A2200V7,11:30:00,8.7,8.709,"2,000,000.0","20,000.0","870,450.0"
DE000A2200V7,13:30:00,8.763,8.883,"3,000,000.0","20,000.0","882,300.0"
DE000A2200V7,15:30:00,8.481,8.501,"1,000,000.0","1,000,000.0","849,100.0"
DE000A2200W5,09:30:00,15.826,15.835,"20,000.0","20,000.0","1,583,050.0"

This assumes an awk with locale support, or one that doesn't implement printf itself (gawk, bwk, Thomas Dickey's mawk). The default awk from Debian (an old version of mawk) will NOT do.

Only LC_NUMERIC is needed; LC_ALL is used as a cure against another LC_ALL that may be present in the environment ;-)
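
To check whether a given awk honours the ' grouping flag, a quick sketch (assuming the en_US.UTF-8 locale is installed; it should print 1,234,567.0):

$ LC_ALL=en_US.UTF-8 awk 'BEGIN{printf "%'\''.1f\n", 1234567}'
1,234,567.0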

-1

sed -i.backup 's/,"10,00,000.0"/,"10,000,000.0"/' data.csv

  1. -i.backup writes the result to the original file and backs up the original data to data.csv.backup
  2. s/ searches for ,"10,00,000.0" and replaces only the first match on each line with ,"10,000,000.0"
  3. data.csv is the file to process
Oxyd
    It would be easy to change the 6th column by mistake, for example if the second to last line had said 20,00,000.0 in the 5th column. – Kusalananda Mar 13 '20 at 11:46
  • Welcome to the site. Please, don't attach screenshots of console commands/console output; they can be difficult to read and (in particular when used in a question) make it difficult for contributors to reproduce the commands you typed. Instead, copy-and-paste it into your answer with appropriate formatting. – AdminBee Mar 13 '20 at 12:10