Input file:
A B C D
1 2,3,4,5 6,7 8,9,10,11
Output file:
A B C D
1 2 6 8
3 7 9
4 10
5 11
Input file:
A B C D
1 2,3,4,5 6,7 8,9,10,11
Output file:
A B C D
1 2 6 8
3 7 9
4 10
5 11
perl -MList::Util=max -lane '
@F = map {[split /,/]} @F;
$n = max map {scalar @$_} @F;
foreach $i (0..$n-1) {print join "\t", map {$_->[$i]} @F};
' file
Using GNU datamash
, transposing the data twice. The first traspose is done with whitespace as input delimiter and with comma as output delimiter. This gives us the following data given your example in the question:
A,1
B,2,3,4,5
C,6,7
D,8,9,10,11
Then transposing this again with comma as both input and output delimiter while turning off strict mode (records do not have to have the same number of fields) and adding spaces as missing fields. This results in
A,B,C,D
1,2,6,8
,3,7,9
,4, ,10
,5, ,11
Then running the result through column
to format as a nice looking table, similar to what you have in your question:
A B C D
1 2 6 8
3 7 9
4 10
5 11
Full command pipeline:
datamash -W --output-delimiter=, transpose <file |
datamash -t , --filler=' ' --no-strict transpose |
column -s , -t
With awk
+paste
:
awk '{
for(i=1; i<=NF; i++) {
gsub(",", "\n", $i); close("col_"i); print $i >>"col_"i
}
}' infile && paste col_*
Notes:
rm -f col_*
.\paste
command output to |column -s $'\t' -tn
to justify the columns correctly, see Print two files in two columns for details.Try also this pure awk approach:
awk '
NR == 1 {$1 = $1 # recreate header line using OFS as the field separator
print
}
NR > 1 {for (i=1; i<=NF; i++) # handle each field in turn
for (j=split($i, T, ","); j; j--) {A[i,j] = T[j] # by separating field's elements into an array
if (j > MX) MX = j # keep max element count
}
for (j=1; j<=MX; j++) # for max element count, print all
for (i=1; i<=NF; i++) printf "%s%s", A[i,j], i==NF?ORS:OFS # newly created lines
split("", A) # clear A array
}
' OFS="\t" file
A B C D
1 2 6 8
3 7 9
4 10
5 11
Using Raku (formerly known as Perl_6)
raku -e 'lines[0].put; my @a; @a.push( $_.split(",")) for lines.split("\t"); \
my $i = @a>>.elems.max; my @b; for @a -> $a { for ^$i { \
@b[$++].push($a[$_] // "␀".Str)}}; \
$_.put for @b>>.join("\t");' file
OR
raku -e 'lines[0].put; my @a; @a.push: $_.split(",") for lines.split("\t"); \
my $i = @a>>.elems.max; my @b; for @a -> $a { \
@b[$++].push($a[$_] // "␀".Str) for ^$i }; \
put($_, "\t") for [Z] @b.rotor($i);' file
Sample Input:
A B C D
1 2,3,4,5 6,7 8,9,10,11,12
Sample Output (columns are \t
separated):
A B C D
1 2 6 8
␀ 3 7 9
␀ 4 ␀ 10
␀ 5 ␀ 11
␀ ␀ ␀ 12
Above is a solution coded in Raku, a member of the Perl-family of programming languages. The input text file is assumed to be a two-line file, each line separated by \t
tabs between (potentially hundreds of) columns, with ,
commas separating elements present within each column.
Briefly, the first line is put
as a header line. An array @a
is populated with the subsequent line, first split on \t
tabs (denoting columns), and then split on ,
commas. The longest element of @a
array is then computed using @a>>.elems.max;
. Then elements of @a
are copied to the newly declared @b
array, with the caveat that Raku's //
"defined-OR" operator is used to insert ␀
at undefined positions. (The code also works with "".Str
empty-string). Finally, the @b
array data is put
(printed).
[Columns in the output are properly aligned. Any misalignment is due to the width of the ␀
placeholder character].
Note, the header line can be handled first (and separately), for example, if it is whitespace-delimited:
my $header=lines[0].split("\s"); $header.join("\t").put;
OR (more simply)
lines[0].split("\s").join("\t").put;