
Sample File (test.csv):

"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"

Desired output file:

PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!

My script (which does not work) is below:

while IFS="," read f1 f2 f3 f4 f5 f6; 
do  
    echo $f1|$f2|$f3|$f4|$f5|$f6;  
done < test.csv
Kusalananda
Shanthi
    You should probably use a tool that more correctly parses rather than just trying to tokenize. Are you open to solutions that use other languages like perl or python? – Eric Renouf Dec 07 '16 at 17:42
  • Thanks for fixing the format. Unfortunately no perl or python. It can only be a shell script using basic unix/linux commands – Shanthi Dec 07 '16 at 17:43
  • @user204362 perl isn't as universal as awk? – RonJohn Dec 07 '16 at 18:38

4 Answers


(generate output) | sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//'

or

sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' "$file"

For the 3 expressions:

  • -e 's/","/|/g' = replace all the delimiters "," with the new delimiter |

  • -e 's/^"//' = remove the leading " mark

  • -e 's/"$//' = remove the trailing " mark at the end of the line

This preserves any quote marks that happen to appear in the title, as long as they don't form the delimiter pattern ","
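A minimal sketch that wraps the three expressions above in a shell function and runs it on the sample rows. The function name csv_to_pipe is hypothetical, and, like the sed command itself, it assumes every field is wrapped in double quotes as in test.csv.

```shell
# Hypothetical helper: convert fully double-quoted CSV lines to pipe-delimited.
# Assumes every field is wrapped in double quotes, as in test.csv.
# With no file arguments, sed reads standard input.
csv_to_pipe() {
    sed -e 's/","/|/g' \
        -e 's/^"//' \
        -e 's/"$//' "$@"
}

# Usage: feed the two sample rows on standard input.
printf '%s\n' \
    '"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"' \
    '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"' |
    csv_to_pipe
```

The same function works on a file directly, e.g. csv_to_pipe test.csv.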

Tim Kennedy

How about:

sed -e 's/","/|/g' -e 's/"//g' test.csv

This assumes the data in your file looks like the sample shown above (I am not taking corner cases into consideration), but it worked for me.
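A quick check of one corner case, using the same two substitutions wrapped in a hypothetical function named strip_to_pipe: because the second expression deletes every double quote, a quote that is part of a field's data is removed as well.

```shell
# Hypothetical wrapper around this answer's two substitutions:
# first turn "," into |, then delete every remaining double quote.
strip_to_pipe() {
    sed -e 's/","/|/g' -e 's/"//g' "$@"
}

# A field containing its own quotes: the quotes around b disappear too.
printf '%s\n' '"a "b" c","d"' | strip_to_pipe
```

This prints a b c|d. For data like the sample in the question, which has no embedded quotes, the result is the same as with the three-expression version.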


This one handles embedded string delimiters:

$ cat /tmp/bla
"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","embedded\",delimiters\",","Y!##!"

$ sed -E 's/"(([^"]*(\\")?)*)",/\1|/g;s/"|(([^"]*(\\")?)*)"/\1/g' /tmp/bla

PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|embedded\",delimiters\",|Y!##!

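Your script does not work because it does not attempt to parse the quoted fields the way a CSV parser would, so it treats the commas inside the quoted fields as delimiters. In addition, the unquoted | characters in your echo command are interpreted by the shell as pipes rather than printed as text.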
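To make "parse the quoted fields the way a CSV parser would" concrete, here is a minimal sketch in awk, which stays within the basic-Unix-tools constraint from the comments. It walks each line one character at a time and tracks whether it is inside a quoted field, so only commas outside quotes become |. The function name csv_to_pipe_awk is hypothetical, and the sketch assumes RFC 4180-style quoting (a doubled "" inside a quoted field stands for one literal quote), no line breaks inside fields, and no literal | characters in the data.

```shell
# Hypothetical helper: CSV to pipe-delimited, using a small awk state machine.
# Assumptions: RFC 4180-style quoting, no newlines inside fields,
# and no | characters in the data (they would not be escaped).
csv_to_pipe_awk() {
    awk '
    {
        out = ""; inq = 0          # inq = 1 while inside a quoted field
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr($0, i + 1, 1) == "\"") {
                        out = out "\""   # doubled quote -> one literal quote
                        i++
                    } else {
                        inq = 0          # closing quote of the field
                    }
                } else {
                    out = out c          # data inside quotes, commas included
                }
            } else if (c == "\"") {
                inq = 1                  # opening quote of a field
            } else if (c == ",") {
                out = out "|"            # unquoted comma separates fields
            } else {
                out = out c              # unquoted field data
            }
        }
        print out
    }' "$@"
}

# Usage: csv_to_pipe_awk test.csv
```

Unlike the sed approaches, this also copes with unquoted fields and with quotes written as "" inside a field, at the cost of being slower on large files.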


Using the two CSV-aware tools csvformat (from csvkit) and Miller (mlr):

$ csvformat -D '|' file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
$ mlr --csv --ofs pipe cat file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
Kusalananda