How to merge lines broken by newlines inside a double quoted field?

Question

Suppose the input is:

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8
",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

I would like to get these 3 lines, with the embedded newline replaced by a marker such as ####:

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Thanks, Emanuel

Do you want the lines joined with 4 hash marks, or do those just indicate where the lines were joined (by removing the newline)? – Jeff Schaller, Mar 09 '18 at 18:50
Yes, I need some special sequence so that, when storing the data in a database, I can convert #### back into a newline. Thanks. – Emanuel Oliveira, Mar 09 '18 at 23:17

score 1 · Answer 1 · answered Mar 09 '18 at 18:53

awk solution:

awk -F',' '{ printf "%s%s", $0, ($NF ~ /^".+[^"]$/? "####" : ORS) }' file

The output:

KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code
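
The one-liner above relies on the opening quote being in the last comma-separated field of the broken line. As a hedged alternative (not part of the original answer), a sketch that counts double quotes may cope with a quoted field anywhere in the record and with several embedded newlines. The function name join_quoted and the sep variable are made up for illustration, and it assumes LF line endings and CSV that escapes a literal quote by doubling it:

```shell
#!/bin/sh
# Hypothetical helper: join physical lines until the running count of
# double quotes is even, i.e. until every quoted field has been closed.
# Usage: join_quoted file   (or pipe data into it)
join_quoted() {
    awk -v sep='####' '
    {
        # gsub returns the number of replacements made; replacing each
        # quote with itself ("&") counts quotes on a copy of the line.
        line = $0
        quotes += gsub(/"/, "&", line)
        if (quotes % 2) {
            printf "%s%s", $0, sep   # still inside a quoted field: keep joining
        } else {
            print                    # record is complete
            quotes = 0
        }
    }' "$@"
}
```

Doubled quotes ("") inside a field add two to the count, so they do not disturb the odd/even check.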

RomanPerekhrest

Fantastic, thanks Roman. I will test it on Tuesday when I'm back in the office. – Emanuel Oliveira Mar 09 '18 at 23:19

score 0 · Answer 2 · answered Mar 10 '18 at 23:56

sed -r ':x /$/ { N; s/\r?\n\s*"/####"/; bx}' inputfile

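The \r? makes the substitution work for both Unix (LF) and Windows (CRLF) text files. The command is written for GNU sed (-r and \s are extensions), and it only joins a line break that is followed, after optional whitespace, by a double quote, as in the sample input.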

Kusalananda · Answer 3 · 2023-01-12T12:59:28.103

The data is appropriately quoted and should be readable by any CSV-aware parser.

To remove the newline that may occur in the 5th field of your header-less CSV file, you may use Miller (mlr) like so:

$ mlr --csv -N put '$5 = sub($5,"\n","")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,1GH8,KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

This rewrites the 5th field by using sub() to replace the first newline character with nothing (i.e. it removes it).

Replacing the newline with #### is also possible:

$ mlr --csv -N put '$5 = sub($5,"\n","####")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,1GH8####,KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Note that the resulting field does not need quoting, which is why Miller does not add quotes by default. If you want to retain the original quotes, use --quote-original:

$ mlr --csv -N --quote-original put '$5 = sub($5,"\n","####")' file
KY,On,Ind ,Yes,1J5Z,KYEEI9,1/1/2016 Contract Code
KY,On,Ind ,Yes,"1GH8####",KYEEID,1/1/2016 Contract Code
KY,On,Ind ,Yes,1J5Y,KYEEIJ,1/1/2016 Contract Code

Use -I to perform an "in-place" edit.
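
The question mentions turning #### back into a newline when the data is loaded. That reverse step could be sketched with plain awk as below. This is an illustration only: the function name restore_newlines and the sep variable are made up here, and it assumes the marker never occurs in the real data.

```shell
#!/bin/sh
# Hypothetical helper: replace every occurrence of the marker with a newline.
# index() does a literal search, so the marker is never treated as a regex.
# Usage: restore_newlines file   (or pipe data into it)
restore_newlines() {
    awk -v sep='####' '
    BEGIN { n = length(sep) }
    {
        out = ""
        rest = $0
        while ((i = index(rest, sep)) > 0) {
            out = out substr(rest, 1, i - 1) "\n"   # text before the marker
            rest = substr(rest, i + n)              # continue after the marker
        }
        print out rest
    }' "$@"
}
```

If the restored field is written back to a CSV file, it has to stay inside double quotes, otherwise the embedded newline will split the record again.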
