0

Imagine input is:

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8
",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

I would like to have the 3 lines(with newline as #### for example):

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Thanks, Emanuel

3 Answers3

1

awk solution:

awk -F',' '{ printf "%s%s", $0, ($NF ~ /^".+[^"]$/? "####" : ORS) }' file

The output:

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code
0
sed -r ':x /$/ { N; s/\r?\n\s*"/####"/; bx}' inputfile

\r? make it work for both Linux and Windows text files

0

The data is appropriately quoted and should be readable by any CSV-aware parser.

To remove the newline that may occur in the 5th field of your header-less CSV file, you may use Miller (mlr) like so:

$ mlr --csv -N put '$5 = sub($5,"\n","")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,1GH8,KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

This rewrites the 5th field by using sub() to replace the first newline character with nothing (i.e. it removes it).

Replacing the newline with #### is also possible:

$ mlr --csv -N put '$5 = sub($5,"\n","####")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,1GH8####,KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Note that the resulting field does not need quoting, which is why Miller does not add quotes by default. If you want to retain the original quotes, use --quote-original:

$ mlr --csv -N --quote-original put '$5 = sub($5,"\n","####")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Use -I to perform an "in-place" edit.

Kusalananda
  • 333,661