How to break comma-separated fields into separate lines

Question

Input file:

A    B       C    D
1   2,3,4,5 6,7 8,9,10,11

Output file:

A       B     C     D
1       2     6     8
        3     7     9
        4           10
        5           11

If you convert spaces to newlines you may then be able to use an answer from something like Transposing rows and columns — Chris Davies, Feb 07 '22 at 16:55

score 2 · Answer 1 · answered Feb 07 '22 at 18:31

2

perl -MList::Util=max -lane '
  @F = map {[split /,/]} @F;
  $n = max map {scalar @$_} @F;
  foreach $i (0..$n-1) {print join "\t", map {$_->[$i]} @F};
' file

answered Feb 07 '22 at 18:31

glenn jackman

85,964

score 1 · Answer 2 · answered Feb 07 '22 at 18:33

Using GNU datamash, transposing the data twice. The first traspose is done with whitespace as input delimiter and with comma as output delimiter. This gives us the following data given your example in the question:

A,1
B,2,3,4,5
C,6,7
D,8,9,10,11

Then transposing this again with comma as both input and output delimiter while turning off strict mode (records do not have to have the same number of fields) and adding spaces as missing fields. This results in

A,B,C,D
1,2,6,8
 ,3,7,9
 ,4, ,10
 ,5, ,11

Then running the result through column to format as a nice looking table, similar to what you have in your question:

Full command pipeline:

datamash -W --output-delimiter=, transpose <file |
datamash -t , --filler=' ' --no-strict transpose |
column -s , -t

αғsнιη · Answer 3 · 2022-02-07T19:23:23.087

0

With awk +paste:

awk '{
       for(i=1; i<=NF; i++) {
           gsub(",", "\n", $i); close("col_"i); print $i >>"col_"i
       }
}' infile  && paste col_*

Notes:

you can delete the temporary files generated by the awk later rm -f col_*.\
you might also want to pass the paste command output to |column -s $'\t' -tn to justify the columns correctly, see Print two files in two columns for details.

edited Feb 07 '22 at 19:23

answered Feb 07 '22 at 19:16

αғsнιη

41,407

score 0 · Answer 4 · answered Feb 08 '22 at 08:04

Try also this pure awk approach:

awk '
NR == 1         {$1 = $1                                                        # recreate header line using OFS as the field separator
                 print
                }
NR > 1          {for (i=1; i<=NF; i++)                                          # handle each field in turn
                   for (j=split($i, T, ","); j; j--)    {A[i,j] = T[j]          # by separating field's elements into an array
                                                         if (j > MX) MX = j     # keep max element count
                                                        }
                 for (j=1; j<=MX; j++)                                          # for max element count, print all
                   for (i=1; i<=NF; i++) printf "%s%s", A[i,j], i==NF?ORS:OFS   # newly created lines
                 split("", A)                                                   # clear A array
            }

' OFS="\t" file
A       B       C       D
1       2       6       8
        3       7       9
        4               10
        5               11

jubilatious1 · Answer 5 · 2022-02-15T03:21:34.167

Using Raku (formerly known as Perl_6)

raku -e 'lines[0].put; my @a; @a.push( $_.split(",")) for lines.split("\t"); \
      my $i = @a>>.elems.max; my @b; for @a -> $a { for ^$i { \
      @b[$++].push($a[$_] // "␀".Str)}};   \
      $_.put for @b>>.join("\t");'   file

OR

raku -e 'lines[0].put; my @a; @a.push: $_.split(",") for lines.split("\t"); \
      my $i = @a>>.elems.max; my @b; for @a -> $a { \
      @b[$++].push($a[$_] // "␀".Str) for ^$i }; \
      put($_, "\t") for [Z] @b.rotor($i);'  file

Sample Input:

A   B   C   D
1   2,3,4,5 6,7 8,9,10,11,12

Sample Output (columns are \t separated):

A   B   C   D
1   2   6   8
␀   3   7   9
␀   4   ␀   10
␀   5   ␀   11
␀   ␀   ␀   12

Above is a solution coded in Raku, a member of the Perl-family of programming languages. The input text file is assumed to be a two-line file, each line separated by \t tabs between (potentially hundreds of) columns, with , commas separating elements present within each column.

Briefly, the first line is put as a header line. An array @a is populated with the subsequent line, first split on \t tabs (denoting columns), and then split on , commas. The longest element of @a array is then computed using @a>>.elems.max;. Then elements of @a are copied to the newly declared @b array, with the caveat that Raku's // "defined-OR" operator is used to insert ␀ at undefined positions. (The code also works with "".Str empty-string). Finally, the @b array data is put (printed).

[Columns in the output are properly aligned. Any misalignment is due to the width of the ␀ placeholder character].

Note, the header line can be handled first (and separately), for example, if it is whitespace-delimited:

my $header=lines[0].split("\s"); $header.join("\t").put;

OR (more simply)

lines[0].split("\s").join("\t").put;

https://raku.org

How to break comma-separated fields into separate lines

5 Answers5