I have this data in multiple rows, which I want to transpose into multiple tab-separated columns, i.e., turn

ABC 0.98 0.58 5.87 0.01
DEF 0.88 5.85 6.89 0.25
GHI 8.99 5.66 4.78 6.22

into

ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22

Could you please help me with this so that I can get the output in the above format?


4 Answers


Using GNU datamash:

$ datamash -W transpose <file
ABC     DEF     GHI
0.98    0.88    8.99
0.58    5.85    5.66
5.87    6.89    4.78
0.01    0.25    6.22

This uses datamash to transpose the whitespace-delimited rows into tab-delimited columns.
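
If you want to confirm that the delimiter in that output really is a tab, one way is to make it visible, e.g. with sed's l command (output shown as GNU sed renders it; the escaping may differ slightly with other sed implementations):

$ datamash -W transpose <file | sed -n l
ABC\tDEF\tGHI$
0.98\t0.88\t8.99$
0.58\t5.85\t5.66$
5.87\t6.89\t4.78$
0.01\t0.25\t6.22$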

You may set another output delimiter using the --output-delimiter option. Here I've used a space, which seems to re-create your expected output:

$ datamash -W --output-delimiter=' ' transpose <file
ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22
Kusalananda

Using any awk in any shell on every Unix box:

$ cat tst.awk
{
    # Save every field: the field number in the input becomes the row
    # number in the output, and the input line number (NR) becomes the
    # output column number.
    for ( rowNr=1; rowNr<=NF; rowNr++ ) {
        vals[rowNr,NR] = $rowNr
    }
}
END {
    # Print the saved values; NF still holds the field count of the last
    # input line, so this assumes every line has the same number of fields.
    for ( rowNr=1; rowNr<=NF; rowNr++ ) {
        for ( colNr=1; colNr<=NR; colNr++ ) {
            printf "%s%s", vals[rowNr,colNr], (colNr<NR ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22

You COULD do the above more briefly with:

$ cat tst.awk
{
    for ( rowNr=1; rowNr<=NF; rowNr++ ) {
        vals[rowNr] = (rowNr in vals ? vals[rowNr] OFS : "") $rowNr
    }
}
END {
    for ( rowNr=1; rowNr<=NF; rowNr++ ) {
        print vals[rowNr]
    }
}

$ awk -f tst.awk file
ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22

but note that in that case you're constantly re-assigning new values to the same variable, vals[rowNr]. That is a relatively slow operation in awk compared to assigning to each new variable once: for every append, awk has to work out how much memory the new value needs, copy the old value to that new location, append the new value, and then free the old location. You also end up needing a few large blocks of memory instead of many small ones, so it gets harder for awk to find available memory as the variable grows. In addition, it mixes formatting of the output with reading of the input, so the resulting code is more tightly coupled than the first script. It's not terrible for this task, though, and at least the code is concise.
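
If you want to see the relative cost for yourself, one rough approach (the file name "big", the sizes, and the script names below are made up for illustration; exact timings depend on your awk and your machine) is to generate a wide input and time each version, assuming the first script is saved as tst_2d.awk and the second as tst_append.awk:

$ awk 'BEGIN{for (i=1;i<=2000;i++){for (j=1;j<=1000;j++) printf "%s%s", i"."j, (j<1000 ? " " : "\n")}}' > big
$ time awk -f tst_2d.awk big > /dev/null
$ time awk -f tst_append.awk big > /dev/null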

Ed Morton

Using Raku (formerly known as Perl_6)

raku -e '.put for [Z] lines.map(*.words);'

OR (two steps):

raku -e 'my @a = lines.map(*.words); .put for [Z] @a;' 

Sample Input:

ABC 0.98 0.58 5.87 0.01
DEF 0.88 5.85 6.89 0.25
GHI 8.99 5.66 4.78 6.22

Sample Output:

ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22

Briefly explaining the second example: lines are read into the @a array, and each line is broken into whitespace-separated words (i.e. columns). The second statement outputs the data; the [Z] zip reduction operator takes the first word of the first row together with the first word of the second row and the first word of the third row, then does the same for each subsequent word position, and .put prints each resulting group on its own line.
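
As a small stand-alone illustration of what [Z] does (the two literal lists below are made up and not taken from the question's data):

$ raku -e '.put for [Z] <a b c>, <1 2 3>;'
a 1
b 2
c 3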

Alternatively, if your CSV/TSV demands are more stringent, you can use Raku's Text::CSV module at the command line:

~$ raku -MText::CSV -e '.put for [Z] csv(in => $*IN, sep => " ");'  < file
ABC DEF GHI
0.98 0.88 8.99
0.58 5.85 5.66
5.87 6.89 4.78
0.01 0.25 6.22

https://docs.raku.org/language/operators#index-entry-[]_(reduction_metaoperators)
https://unix.stackexchange.com/a/670344/227738
https://raku.org

jubilatious1

Sounds like a job for BSD rs (included in BSDs since 1983 but not often found installed by default on other systems):

$ cat file
ABC 0.98 0.58 5.87 0.01
DEF 0.88 5.85 6.89 0.25
GHI 8.99 5.66 4.78 6.22
$ rs -T < file
ABC   DEF   GHI
0.98  0.88  8.99
0.58  5.85  5.66
5.87  6.89  4.78
0.01  0.25  6.22