6

I have a a .csv file as follows:

"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH","MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"
"ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN","","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"

I have a script that assigns each of the comma separated value to a unique variable using , as the delimiter.

The portion of the script is as under :

IFS=,

[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }

while read Key  Name    Address1        Address2        City    State   Country SwiftCode       Nid     Chips   Aba     IsSwitching
do
          echo "-------------------------------------------------------------------"

     echo "From Key : $Key"

    echo "-------------------------------------------------------------------"
          echo "-------------------------------------------------------------------"

     echo "From Name : $Name"

What it does is separate the values having comma inside the quotes against my desired output of uniquely separating each value to their respective variables.

I tried replacing the comma like IFS=[","] but no luck. Any suggestion/ help is really appreciated.

Ashish K
  • 332
  • can you post how should look the final expected output? – RomanPerekhrest May 06 '17 at 10:00
  • Ron Manis (RDB: Unix Relational Database Managment) was adamant about the TAB. "There will never be a TAB character in a data field". So, the TSV format is the guaranteed safe version: "sed 's/^"//; s/"$//; s/","//g' file.csv | tee file.tsv where is typed [CTRL]-V{tab} – Marty McGowan May 06 '17 at 15:50

1 Answers1

5

You are doing a couple of things wrong here:

  1. You are using the shell to parse text.

    While this is possible, it is very inefficient. It is slow, hard to write, hard to read and very hard to do properly. The shell just isn't designed for this sort of thing.

  2. You are trying to parse a csv file without a csv parser.

    CSV is not a simple format. You can have fields that contain the delimiter as you do here. You can also have fields spanning multiple lines. Attempting to parse arbitrary CSV data with simple pattern matching is very, very complicated and extremely hard to get right.

The bad, hacky solution is to do something like this:

$ sed 's/","/"|"/g' file.csv | 
    while IFS='|' read -r Key Name Address1 Address2 City \
     State Country SwiftCode Nid Chips Aba IsSwitching; do 
        echo "From Key : $Key"; echo "From Name : $Name"; 
    done
From Key : "ID0054XX"
From Name : "PT. SUMUT"
From Key : "ID00037687"
From Name : "PAN INDONESIA, PT."

That will replace all "," with "|" and then use | as the delimiter. Of course, that will fail if any of your fields can contain |.

The good, clean approach is to use a proper scripting language, not the shell, and a csv parser. For example, in Perl1:

$ cat file.csv | perl -MText::CSV -le '
    $csv = Text::CSV->new({binary=>1}); 
    while ($row = $csv->getline(STDIN)){ my ($Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching) = @$row;
print "From Key: $Key\nFrom Name: $Name";}' 
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.

Or, as a script:

#!/usr/bin/perl -l
use strict;
use warnings;
use Text::CSV;

open(my $fh, "file.csv"); my $csv = Text::CSV->new({binary=>1}); while (my $row = $csv->getline($fh)){ my ( $Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching ) = @$row; print "From Key: $Key\nFrom Name: $Name"; }

Note that you'll have to install the Text::CSV module first (cpanm Text::CSV) and you might want to install cpanm (package cpanminus on most distributions)

Alternatively, in Python 3:

#!/usr/bin/env python3

import csv with open('file.csv', newline='') as csvfile: linereader = csv.reader(csvfile, delimiter=',', quotechar='"') for row in linereader: print("From Key: %s\nFrom Name: %s" % (row[0], row[1]))

Saving the Python code above as a script and executing it on your file will print:

$ foo.py
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.


1Yes, I am aware that that is a UUoC, but it's simpler to write as a one liner this way.

terdon
  • 242,166
  • I had to use sed with -i flag to make changes inline. @terdon thankyou so much for this. – Ashish K May 06 '17 at 10:52
  • @AshishK why make them inline? My (really bad suggestion which you shouldn't use) doesn't require it to be inline and making it inline might break the file for other applications. – terdon May 06 '17 at 11:05
  • The reason why I had to do it using sed is because i was not supposed to use perl and python here in the problem. I had already thought about this solution of replacing , outside quotes with some other character and then adding that character as the delimiter. However when using sed directly, it was opening the file and deleting everything in it. So had to use inline – Ashish K May 06 '17 at 11:49
  • Yes, because you must have tried to do sed 'sed command' file > file instead of sed 'sed command' file | while. – terdon May 06 '17 at 11:51
  • Yes! That was it.. I didn't notice that :) Thankyou. – Ashish K May 06 '17 at 12:14