Convert rows to columns

Question

I have a file that includes details about VMs running in a hypervisor. We run a command and redirect its output to a file, and the data is available in the format below.

Virtual Machine : OL6U5
        ID     : 0004fb00000600003da8ce6948c441bb
        Status : Running
        Memory : 65536
        Uptime : 17835 Minutes
        Server : MyOVS1.vmorld.com
        Pool   : HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6
Virtual Machine : OL6U6
        ID     : 0004fb00000600003da8ce6948c441bc
        Status : Running
        Memory : 65536
        Uptime : 17565 Minutes
        Server : MyOVS2.vmorld.com
        Pool   : NON-HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6
Virtual Machine : OL6U7
        ID     : 0004fb00000600003da8ce6948c441bd
        Status : Running
        Memory : 65536
        Uptime : 17835 Minutes
        Server : MyOVS1.vmorld.com
        Pool   : HA-POOL
        HA Mode: false
        VCPU   : 16
        Type   : Xen PVM
        OS     : Oracle Linux 6

This output differs from hypervisor to hypervisor, since some hypervisors have 50+ VMs running. The file above is just an example from a hypervisor with only 3 VMs; in general, the redirected file will contain information about N VMs.

We need to get these details into the format below, using awk/sed or a shell script:

Virtual_Machine  ID                                Status   Memory  Uptime  Server             Pool         HA     VCPU  Type     OS
OL6U5            0004fb00000600003da8ce6948c441bb  Running  65536   17835   MyOVS1.vmorld.com  HA-POOL      false  16    Xen PVM  Oracle Linux 6
OL6U6            0004fb00000600003da8ce6948c441bc  Running  65536   17565   MyOVS2.vmorld.com  NON-HA-POOL  false  16    Xen PVM  Oracle Linux 6
OL6U7            0004fb00000600003da8ce6948c441bd  Running  65536   17835   MyOVS1.vmorld.com  HA-POOL      false  16    Xen PVM  Oracle Linux 6

Possible duplicate of Rows to column conversion of file – αғsнιη Jul 20 '17 at 18:53

Digital Trauma · Answer 1 · 2016-01-29T05:55:29.780

If you have the rs (reshape) utility available, you can do the following:

rs -Tzc: < input.txt

This gives the output format exactly as specified in the question, even down to the dynamic column widths.

-T Transposes the input data
-z sizes the columns appropriately from the max in each column
-c: uses colon as the input field separator

This works for arbitrarily sized tables, e.g.:

$ echo "Name:Alice:Bob:Carol
Age:12:34:56
Eyecolour:Brown:Black:Blue" | rs -Tzc: 
Name   Age  Eyecolour
Alice  12   Brown
Bob    34   Black
Carol  56   Blue
$

rs is available by default on OS X (and likely other BSD systems). It can be installed on Ubuntu (and the rest of the Debian family) with:

sudo apt-get install rs
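Where neither rs nor column is installed, the -z idea (size each column from its widest entry) can be approximated in awk. This is a sketch of our own, not part of rs: the function name pad_columns and its single optional argument (the input delimiter, tab by default) are hypothetical, and it only pads; it does not transpose, so it would follow a transposing step.

```shell
# pad_columns [DELIM]: hypothetical stand-in for the width sizing done by
# rs -z or column -t. Reads DELIM-separated rows (default: tab) on stdin and
# pads every column except the last to its widest entry plus two spaces.
pad_columns() {
    awk -F "${1:-\t}" '
        {
            rows[NR] = $0                          # keep the row for the second pass
            for (i = 1; i <= NF; i++)
                if (length($i) > w[i]) w[i] = length($i)
        }
        END {
            for (r = 1; r <= NR; r++) {
                n = split(rows[r], f, FS)
                line = ""
                for (i = 1; i < n; i++)            # last column is left unpadded
                    line = line sprintf("%-" (w[i] + 2) "s", f[i])
                print line f[n]
            }
        }'
}
```

For example, printf 'Name:Age\nAlice:12\n' | pad_columns : lines up the two columns. It keeps every row in memory, as rs and column also must.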

Wildcard · Answer 2 · 2016-01-28T23:25:52.320

EDIT: Extensible to any number of output rows, in a simple one-liner for loop:

for ((i=1;i<=2;i++)); do cut -d: -f "$i" input | paste -sd: ; done | column -t -s:

Original answer:

You can do this as a one-liner using bash process substitution:

paste -sd: <(cut -d: -f1 input) <(cut -d: -f2 input) | column -t -s:

The -s option makes paste join all lines of each input into a single output line, handling the inputs one at a time. The : delimiter set with paste -d is then picked up by the -s: option to column at the end, which pretties up the format by making the fields line up.

The cut commands in the two process substitutions pull out the first field and the second field, respectively.

Whether there are blank lines in the input or not doesn't matter, as column -t -s: will clean up the output regardless. (There were blank lines in the original input specified in the question, but they've since been removed. The above command works regardless of blank lines.)

Input - contents of file named "input" in above command:

Virtual_Machine:OL6U7

ID:0004fb00000600003da8ce6948c441bd

Status:Running

Memory:65536

Uptime:17103

Server:MyOVS1.vmworld.com

Pool:HA-POOL

HA:false

VCPU:16

Type:Xen PVM

OS:Oracle Linux 6

Output:

Virtual_Machine  ID                                Status   Memory  Uptime  Server              Pool     HA     VCPU  Type     OS
OL6U7            0004fb00000600003da8ce6948c441bd  Running  65536   17103   MyOVS1.vmworld.com  HA-POOL  false  16    Xen PVM  Oracle Linux 6
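The loop bound 2 in the EDIT above is the number of colon-separated fields per line. A minimal POSIX sh sketch that reads that count from the file instead of hardcoding it (the function name transpose_colon is hypothetical; it assumes the same Key:Value layout):

```shell
# transpose_colon FILE: hypothetical wrapper around the cut | paste loop,
# with the loop bound taken from the widest line instead of hardcoded.
transpose_colon() (
    file=$1
    # Highest field count on any line (a blank line has 0 fields).
    nf=$(awk -F: 'NF > max { max = NF } END { print max + 0 }' "$file")
    i=1
    while [ "$i" -le "$nf" ]; do
        # Field i of every line, joined with ':' into one output line.
        cut -d: -f "$i" "$file" | paste -sd: -
        i=$((i + 1))
    done
)
```

Usage: transpose_colon input | column -t -s: . Like the original loop, it reads the file once per field (plus once for the count).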

This works for two output rows, but for more rows it becomes unwieldy. – Jan 28 '16 at 22:54

score 3 · Accepted Answer · 2016-01-29T17:10:24.830

If reading the file twice is not a (big) problem (this stores only one line in memory at a time):

awk -F : '{printf("%s\t ", $1)}' infile
echo
awk -F : '{printf("%s\t ", $2)}' infile

For a general number of fields, this becomes the following (at the cost of one pass over the file per field):

#!/bin/bash
rowcount=2
for (( i=1; i<=rowcount; i++ )); do
    awk -v i="$i" -F : '{printf("%s\t ", $i)}' infile
    echo
done

But for a really general transpose, this will work:

awk '$0 !~ /^$/ {  i++
                   n = split($0, arr, ":")
                   for (j = 1; j <= n; j++) {   # numeric loop: "for (j in arr)" yields string keys
                       out[i,j] = arr[j]
                   }
                   if (maxr < n) maxr = n       # max number of output rows.
             }
    END {
        maxc = i                             # max number of output columns.
        for (j = 1; j <= maxr; j++) {
            for (i = 1; i <= maxc; i++) {
                printf("%s\t", out[i,j])     # out field separator.
            }
            print ""
        }
    }' infile

And to make it pretty (using tab \t as the output field separator):

./script | column -t -s $'\t'

Virtual_Machine  ID                                Status   Memory  Uptime  Server              Pool     HA     VCPU  Type     OS
OL6U7            0004fb00000600003da8ce6948c441bd  Running  65536   17103   MyOVS1.vmworld.com  HA-POOL  false  16    Xen PVM  Oracle Linux 6

The code above for a general transpose will store the whole matrix in memory.
That could be a problem for really big files.

Update for new text.

To process the new text posted in the question, two awk passes seem to me the best answer. The first pass prints the header field titles, reading only as far as the fields of the first VM extend. The second pass prints only field 2. In both cases, I added a way to remove leading and trailing spaces (for better formatting).

#!/bin/bash
{
awk -F: 'BEGIN{ sl="Virtual Machine"}
         $1~sl && head == 1 { head=0; exit 0}
         $1~sl && head == 0 { head=1; }
         head == 1 {
             gsub(/^[ \t]+/,"",$1);   # remove leading  spaces
             gsub(/[ \t]+$/,"",$1);   # remove trailing spaces
             printf( "%s\t", $1)
         }
         ' infile
#echo
awk -F: 'BEGIN { sl="Virtual Machine"}
         $1~sl { printf( "%s\n", "") }
         {
             gsub(/^[ \t]+/,"",$2);   # remove leading  spaces
             gsub(/[ \t]+$/,"",$2);   # remove trailing spaces
             printf( "%s\t", $2)
         }
         ' infile
echo
} | column -t -s "$(printf '%b' '\t')"

The surrounding { ... } | column -t -s "$(printf '%b' '\t')" is to format the whole table in a pretty way.
Please note that the "$(printf '%b' '\t')" could be replaced with $'\t' in ksh, bash, or zsh.
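If reading the file twice is a concern, the same trimming can be done in a single awk pass by buffering the header from the first VM and flushing a row whenever the next "Virtual Machine" line appears. This is a swapped-in one-pass variant, not the script above; the function name vm_table is hypothetical, and it assumes every VM block lists the same keys in the same order and that the first colon on each line separates key from value.

```shell
# vm_table FILE: hypothetical one-pass variant of the two-pass script.
vm_table() {
    awk '
        # Strip leading and trailing blanks.
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        # Print the header once (before the first row), then the current row.
        function flush() {
            if (!printed_hdr) { print hdr; printed_hdr = 1 }
            print row
        }
        {
            c = index($0, ":")
            if (c == 0) next                        # blank or key-less line
            key = trim(substr($0, 1, c - 1))
            val = trim(substr($0, c + 1))
            if (key == "Virtual Machine") {         # a new VM block starts here
                if (nvm++) flush()                  # emit the previous VM first
                row = ""; rsep = ""
            }
            if (nvm == 1) { hdr = hdr hsep key; hsep = "\t" }  # keys of the first VM only
            row = row rsep val; rsep = "\t"
        }
        END { if (nvm) flush() }
    ' "$1"
}
```

Usage: vm_table infile | column -t -s "$(printf '%b' '\t')". Keys and values come through as they appear in the input (for example "HA Mode" and "17835 Minutes"); renaming them to match the question exactly is left out.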

jecxjo · Answer 4 · 2016-01-28T20:57:28.847

Using awk, store the keys and values and print them out at the end.

#!/usr/bin/awk -f
BEGIN {
  CNT=0
  FS=":"
}

NF {                      # skip blank lines
  HDR[CNT]=$1;
  ENTRY[CNT]=$2;
  CNT++;
}

END {
  for (x = 0; x < CNT; x++)
    printf "%s\t",HDR[x]

  print ""

  for (x = 0; x < CNT; x++)
    printf "%s\t",ENTRY[x]

  print ""                # terminate the data row as well
}

Then just run awk -f ./script.awk ./input.txt

answered Jan 28 '16 at 20:09 by jecxjo, edited Jan 28 '16 at 20:57

Changed the answer to be dynamic. It just requires that there be only one VM's worth of data per file. – jecxjo Jan 28 '16 at 20:58
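If the data is spread over several files with one VM each, the same store-and-print idea can emit one row per file by watching FNR. A sketch under that assumption (function name files_table is hypothetical; it also assumes the same keys, in the same order, in every file):

```shell
# files_table FILE...: hypothetical multi-file variant of the awk script above.
files_table() {
    awk -F: '
        function flush() {
            if (!printed_hdr) { print hdr; printed_hdr = 1 }
            print row
            row = ""; rsep = ""
        }
        FNR == 1 && NR > 1 { flush() }     # a new file begins: emit the previous one
        NF == 0 { next }                   # skip blank lines
        {
            if (NR == FNR) { hdr = hdr hsep $1; hsep = "\t" }  # header from the first file only
            row = row rsep $2; rsep = "\t"
        }
        END { if (NR) flush() }
    ' "$@"
}
```

Usage: files_table vm1.txt vm2.txt | column -t -s "$(printf '\t')", where the file names are placeholders.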

score 2 · Answer 5 · answered Jan 28 '16 at 20:14

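declare -a COLS                       # first field of each line: the column names
declare -a VALS                       # second field of each line: the values
while IFS=':' read -ra fields; do     # split each line on ':' into the array "fields"
   (( ${#fields[@]} )) || continue    # skip blank lines
   COLS+=("${fields[0]}")
   VALS+=("${fields[1]}")
done < /path/to/input.txt

HEADER=""
DATA=""
for (( i = 0; i < ${#COLS[@]}; i++ )); do   # walk the collected pairs in order
    HEADER="${HEADER}${COLS[$i]} "
    DATA="${DATA}${VALS[$i]} "
done
echo "$HEADER"
echo "$DATA"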
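The same read-loop idea can be extended to the multi-VM layout in the question. A bash sketch (the function name vm_table_bash is hypothetical) that splits each line at its first colon, trims both halves with parameter expansion, and starts a new row at every "Virtual Machine" key; it assumes each VM lists the same keys in the same order:

```shell
# vm_table_bash FILE: hypothetical bash sketch; prints a tab-separated table.
vm_table_bash() {
    local line key val tab=$'\t' header="" row="" nvm=0
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == *:* ]] || continue          # skip blank or key-less lines
        key=${line%%:*}                         # text before the first colon
        val=${line#*:}                          # text after the first colon
        # Trim leading, then trailing, whitespace from both halves.
        key=${key#"${key%%[![:space:]]*}"}; key=${key%"${key##*[![:space:]]}"}
        val=${val#"${val%%[![:space:]]*}"}; val=${val%"${val##*[![:space:]]}"}
        if [[ $key == "Virtual Machine" ]]; then
            # A new VM starts: print the header once, then the previous row.
            if (( nvm == 1 )); then printf '%s\n' "$header"; fi
            if (( nvm >= 1 )); then printf '%s\n' "$row"; fi
            nvm=$((nvm + 1))
            row=$val
        else
            row+=$tab$val
        fi
        # Collect column names from the first VM only.
        if (( nvm == 1 )); then header+=${header:+$tab}$key; fi
    done < "$1"
    # Flush the last VM (and the header, if there was only one VM).
    if (( nvm == 1 )); then printf '%s\n' "$header"; fi
    if (( nvm >= 1 )); then printf '%s\n' "$row"; fi
}
```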

answered by DopeGhoti

This looks interesting; could you explain a little what the lines are doing? Could it be applied to this? – not2qubit Apr 29 '21 at 06:23

don_crissti · Answer 6 · 2016-01-28T21:50:23.777

With gnu datamash and column from util-linux:

datamash -t: transpose <infile | column -t -s:

This works with more than two columns but assumes there are no empty lines in your input file; with empty lines in between (like in your initial input sample) you would get an error like:

datamash: transpose input error: line 2 has 0 fields (previous lines had 2);

so to avoid that you'll have to squeeze them before processing with datamash:

tr -s \\n <infile | datamash -t: transpose | column -t -s:

Otherwise, in this particular case (only two columns), with zsh and the same column:

list=(${(f)"$(<infile)"})
printf %s\\n ${(j;:;)list[@]%:*} ${(j;:;)list[@]#*:} | column -t -s:

(${(f)"$(<infile)"}) reads the lines into an array; ${(j;:;)list[@]%:*} joins (with :) the first field of each element and ${(j;:;)list[@]#*:} joins (again with :) the second field of each element; both are printed, so the output is

Virtual_Machine:ID:Status:Memory:Uptime:Server:Pool:HA:VCPU:Type:OS
OL6U7:0004fb00000600003da8ce6948c441bd:Running:65536:17103:MyOVS1.vmworld.com:HA-POOL:false:16:Xen PVM:Oracle Linux 6

which is then piped to column -t -s:
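For bash, a rough port of the zsh two-liner (the function name join_fields is hypothetical) uses ${line%:*} and ${line#*:} in place of the array flags, and "${arr[*]}" with IFS=: for the joining. Empty lines are skipped here, so this sketch does not need the tr -s step:

```shell
# join_fields FILE: hypothetical bash port of the zsh version above.
join_fields() {
    local line
    local -a keys=() vals=()
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -n $line ]] || continue        # skip empty lines
        keys+=("${line%:*}")              # like ${list[@]%:*}: drop from the last colon on
        vals+=("${line#*:}")              # like ${list[@]#*:}: drop up to the first colon
    done < "$1"
    local IFS=:                           # "${arr[*]}" joins with the first IFS character
    printf '%s\n' "${keys[*]}" "${vals[*]}"
}
```

Usage: join_fields infile | column -t -s: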

MiniMax · Answer 7 · 2017-07-20T20:55:39.423

cat <(head -n 11 virtual.txt | cut -d: -f1) <(sed 's/.*: //' virtual.txt) | xargs -d '\n' -n 11 | column -t

The number of lines per Virtual Machine is hardcoded in this case (11). It would be better to count it beforehand, store it in a variable, and then use that variable in the code.

Explanation

cat <(command 1) <(command 2) - the <() construction makes the command output appear as a temporary file, so cat concatenates the two "files" and pipes the result onward.
- command 1: head -n 11 virtual.txt | cut -d: -f1 gives us the future column headers. One Virtual Machine entry is the first eleven lines, so head is used to get it; cut splits this entry into two columns and prints only the first one.
- command 2: sed 's/.*: //' virtual.txt gives us the future column values. sed removes all unneeded text and leaves only the values.
xargs -d '\n' -n 11 - each input item is terminated by a newline. This command reads the items and prints them 11 per line.
column -t - is needed for pretty-printing. It displays our lines in table form; otherwise, the columns would not line up.

Output

Virtual  Machine                           ID       Status  Memory  Uptime   Server             Pool         HA     Mode  VCPU  Type  OS
OL6U5    0004fb00000600003da8ce6948c441bb  Running  65536   17835   Minutes  MyOVS1.vmorld.com  HA-POOL      false  16    Xen   PVM   Oracle  Linux  6
OL6U6    0004fb00000600003da8ce6948c441bc  Running  65536   17565   Minutes  MyOVS2.vmorld.com  NON-HA-POOL  false  16    Xen   PVM   Oracle  Linux  6
OL6U7    0004fb00000600003da8ce6948c441bd  Running  65536   17835   Minutes  MyOVS1.vmorld.com  HA-POOL      false  16    Xen   PVM   Oracle  Linux  6
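As the output shows, column -t splits on every blank, so "Virtual Machine", "HA Mode", "17835 Minutes", "Xen PVM" and "Oracle Linux 6" are spread over several columns. A POSIX sh sketch of the same pipeline that computes the block length instead of hardcoding 11 and joins with tabs via paste, so multi-word values stay together. The name vm_rows is hypothetical; it assumes no blank lines and that every VM block has the same number of lines.

```shell
# vm_rows FILE: hypothetical variant of the pipeline above.
vm_rows() (
    file=$1
    # Lines per VM block: line number of the second "Virtual Machine" line
    # minus one, or the whole file if there is only one VM.
    n=$(awk 'NR > 1 && /^Virtual Machine/ { print NR - 1; found = 1; exit }
             END { if (!found) print NR }' "$file")
    # paste takes one stdin line per "-" operand, so n operands give n columns.
    dashes=
    i=0
    while [ "$i" -lt "$n" ]; do dashes="$dashes -"; i=$((i + 1)); done
    {
        head -n "$n" "$file" | cut -d: -f1          # header names from the first block
        sed 's/^[^:]*:[[:space:]]*//' "$file"       # every value, after the first colon
    } | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | paste $dashes   # $dashes unquoted on purpose
)
```

Usage: vm_rows virtual.txt | column -t -s "$(printf '\t')"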

score 0 · Answer 8 · answered Aug 12 '17 at 08:16

Use datamash and its transpose option to swap rows and columns in a file.

datamash -t: transpose < infile.txt

By default, transpose verifies that the input has the same number of fields in each line and fails with an error otherwise. You can disable this strict mode, allowing missing values, with --no-strict:

datamash -t: --no-strict transpose < infile.txt

Also you can use --filler to set the missing-field filler value:

datamash -t: --no-strict --filler " " transpose < infile.txt

(Derived from the datamash manual.)
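For comparison, the padding behaviour of --no-strict --filler can be approximated in awk where datamash is not installed. This is a sketch, not datamash itself: the function name transpose_fill is hypothetical, it reads ':'-separated rows on stdin, and its filler defaults to an empty string (we make no claim about datamash's own default).

```shell
# transpose_fill [FILLER]: hypothetical awk approximation of
# datamash -t: --no-strict --filler FILLER transpose.
transpose_fill() {
    awk -F: -v filler="${1-}" '
        {
            for (j = 1; j <= NF; j++) cell[NR, j] = $j
            nf[NR] = NF                            # remember each row length
            if (NF > maxnf) maxnf = NF
        }
        END {
            for (j = 1; j <= maxnf; j++) {         # input field j becomes output row j
                line = ""
                for (i = 1; i <= NR; i++) {
                    v = (j <= nf[i]) ? cell[i, j] : filler   # pad short rows
                    line = (i == 1) ? v : (line ":" v)
                }
                print line
            }
        }'
}
```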

score -5 · Answer 9 · answered Jan 28 '16 at 20:04

If your data is in separate files in a directory, you can use:

for file in "$DIRECTORY"/*
do
  while IFS= read -r line
  do
    value=${line#*:}              # like cut -d: -f2-: everything after the first colon
    printf "%s\t" "${value}" >> bigfile
  done < "$file"
  echo >> bigfile                 # end this file's row
done

You may need to adjust the number of \t (tab) characters on the printf line if your values have different lengths.
