how to convert multiple txt to CSV with field data separated by blank lines

Question

I have some data in multiple text files where fields are separated by blank lines. There are only 4 fields, but the third field contains a varying number of subfields, often three or more. The first field is always a number, 0 or 1.

0
name_surname
1 yellow 1 brown 2 green
 short description

Every file is structured the same way. The problem is that the third field may contain more or fewer colours.

Every txt file should become one row in a CSV file:

0 [tab] name_surname [tab] 1 yellow ; 1 brown; 2 green [tab] "description"
1 [tab] name2_surname [tab] 1 brown; 1 blue [tab] "description"

After some reading I've found I should use awk in some way, but this is beyond what I can do.

Please post at least two records from your input. It's not clear from what you posted how the records are arranged. Do all records have exactly four lines? Is the indentation of the short description significant? — Gilles 'SO- stop being evil', Jul 25 '13 at 23:33

score 0 · Answer 1 · answered Jun 23 '14 at 06:16

Here is another sed solution:

sed '/./!d;/[^0-9]/{
        /^[0-9]/s/ [0-9] / ;&/g
        H;$!d
    };x;y/\n/<tab>/
' <<\DATA
0
name_surname
1 yellow 1 brown 2 green
 short description

1
name2_surname
2 paisley 4 yellow 1 brown 2 green
 short description
2
name3_surname
1 blue
 short description
DATA

Note that the <tab> in y/\n/<tab>/ must be typed as a literal tab character.

###OUTPUT###

0       name_surname    1 yellow ; 1 brown ; 2 green     short description
1       name2_surname   2 paisley ; 4 yellow ; 1 brown ; 2 green         short description
2       name3_surname   1 blue   short description
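
If typing a literal tab is awkward, one option is to let the shell produce it and splice it into the script. The sketch below wraps the same sed script in a function; the name join_records and the file name data.txt are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical wrapper around the sed script above; the function name is illustrative.
join_records() {
    tab=$(printf '\t')   # a literal tab, produced by the shell
    # The single-quoted script is closed around "$tab" so the tab lands inside y/\n/.../
    sed '/./!d;/[^0-9]/{
        /^[0-9]/s/ [0-9] / ;&/g
        H;$!d
    };x;y/\n/'"$tab"'/
' "$@"
}
```

Called as join_records data.txt, this should give the same rows as the here-document version. As written, the script also emits one empty line before the first record, because the hold space is still empty when the first number is swapped in; piping the result through sed 1d drops it.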

score 0 · Answer 2 · answered Jul 25 '13 at 10:18

I know how to do this using sed:

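#!/bin/sed -nf
# Assumes GNU sed (\+ and \t) and records of exactly four lines with no blank lines between them
# Append the second line, then save the first two lines to hold space
N; h
# Replace pattern space with the third line and put "; " after each "count colour" pair
n; s/\([0-9]\+ [a-zA-Z]\+\) /\1; /g
# Append it to hold space
H
# Copy hold space back to pattern space
g
# Append the fourth line
N
# Turn newlines into tabs
s/\n/\t/g
# Print the result
p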

Or in one line:

sed -n 'N;h;n;s/\([0-9]\+ [a-zA-Z]\+\) /\1; /g;H;g;N;s/\n/\t/g;p' data.txt
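
Since the question asks for one row per file, here is a minimal awk sketch that takes several files at once. It assumes each file holds exactly one record of four non-blank lines (flag, name, colours, description) and that every colour count is a plain integer; the function name txt_to_tsv and the output name all.tsv are hypothetical.

```shell
# Hypothetical helper: print one tab-separated row per input file.
txt_to_tsv() {
    awk '
        # First line of each file: flush the previous file row, then reset
        FNR == 1 { if (n > 0) print row; n = 0; row = "" }
        # Skip blank (or whitespace-only) lines
        NF == 0 { next }
        # Trim surrounding whitespace and count the non-blank line
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); n++ }
        # Third line: put "; " before every count after the first
        n == 3 {
            out = $1
            for (i = 2; i <= NF; i++)
                out = out ($i ~ /^[0-9]+$/ ? "; " : " ") $i
            $0 = out
        }
        # Fourth line: quote the description, doubling embedded quotes
        n == 4 { gsub(/"/, "\"\""); $0 = "\"" $0 "\"" }
        # Join the fields of the current file with tabs
        { row = (n == 1 ? $0 : row "\t" $0) }
        END { if (n > 0) print row }
    ' "$@"
}
```

A call such as txt_to_tsv *.txt > all.tsv would then write one row per file, in the order the shell expands the glob. Unlike the sed versions, this ignores blank lines wherever they occur, so it should work whether or not the fields are separated by empty lines.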
