How to process an x-column text file to get a y-column one?

Question

I have a text file:

a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj

How can I process it to get a two-column file like this:

a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Or a three-column file like this:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

I would prefer an awk solution, but other solutions are welcome too.

Thor · Accepted Answer · 2016-10-23T14:43:23.580

20

Put each field on a line and post-columnate.

Each field on one line

tr

tr -s ' ' '\n' < infile

grep

grep -o '[[:alnum:]]*' infile

sed

sed 's/\s\+/\n/g' infile

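or more portable (POSIX sed has neither \s nor \+, and needs a literal newline in the replacement):

sed 's/[[:space:]][[:space:]]*/\
/g' infile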

awk

awk 'NF { $1 = $1; print }' OFS='\n' infile

or

awk -v OFS='\n' 'NF { $1 = $1; print }' infile

Columnate

paste

For 2 columns:

... | paste - -

For 3 columns:

... | paste - - -

etc.

sed

For 2 columns:

... | sed 'N; s/\n/\t/g'

For 3 columns:

... | sed 'N; N; s/\n/\t/g'

etc.

xargs

... | xargs -n number-of-desired-columns

As xargs uses /bin/echo to print, beware that data that looks like options to echo will be interpreted as such.

awk

... | awk '{ printf "%s", $0 (NR%n==0?ORS:OFS) }' n=number-of-desired-columns OFS='\t'

pr

... | pr -at -number-of-desired-columns

or

... | pr -at -s$'\t' -number-of-desired-columns

columns (from the autogen package)

... | columns -c number-of-desired-columns

Typical output:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
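
For comparison, the same two stages can be sketched in Python (the language another answer in this thread uses). This is only an illustration, not one of the tools above: the names fields and columnate are made up for this sketch, and passing the same iterator n times is meant to mimic how paste - - - reads one standard input several times.

```python
from itertools import zip_longest


def fields(text):
    # Stage 1: split on any run of whitespace, one field per list item.
    return text.split()


def columnate(items, n, sep="\t"):
    # Stage 2: join every n consecutive items into one row.
    # The same iterator is passed n times, so each row pulls n items
    # from a single stream. A short final row is padded with empty
    # strings, which leaves trailing separators on that row.
    stream = iter(items)
    rows = zip_longest(*[stream] * n, fillvalue="")
    return [sep.join(row) for row in rows]


sample = "a aa aaa b bb bbb\nc cc ccc d\n"
for row in columnate(fields(sample), 3):
    print(row)
```

Whether an incomplete last row should be padded, kept short, or dropped is a design choice; the sketch pads it.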

edited Oct 23 '16 at 14:43

answered Sep 08 '16 at 13:05

Thor

Slam dunk. +1 sir – Zombo Sep 09 '16 at 13:35
Shouldn't the xargs line call echo or printf? – Wildcard Sep 09 '16 at 15:59
@Wildcard: xargs calls /bin/echo by default – Thor Sep 09 '16 at 16:07
Wow, I had no idea! It's even specified by POSIX. Thanks! – Wildcard Sep 09 '16 at 16:16
@Wildcard: Sending data to xargs that looks like options to /bin/echo causes problems ... I added a warning. – Thor Sep 12 '16 at 02:46

Eric Renouf · Answer 2 · 2016-09-09T16:58:33.640

9

As Wildcard pointed out, this will only work if your file is nicely formatted, in that there aren't any special characters that the shell will interpret as globs and you are happy with the default word splitting rules. If there's any question about whether your files will "pass" that test, do not use this approach.

One possibility is to let printf do the pairing:

printf '%s\t%s\n' $(cat your_file)

The unquoted command substitution makes the shell word-split the contents of your_file, and printf reuses its format string for every two words, printing each pair with a tab in between. Add more %s conversions to the format string to get more columns.
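
To make the reuse of the format string concrete, here is a rough Python imitation of what printf does with the words it receives. It is a sketch under assumptions: printf_like and words_of are hypothetical helper names, and the code models only two facts (the format is applied again for each group of arguments, and missing arguments count as empty strings).

```python
def words_of(text):
    # str.split() splits on runs of whitespace, much like the shell's
    # default word splitting, but it never expands globs such as '*'.
    return text.split()


def printf_like(args, slots=2):
    # Hypothetical stand-in for printf with `slots` %s conversions
    # separated by tabs and ended by a newline.
    # The format is applied again for every `slots` arguments; a short
    # final group is padded with empty strings. (Assumes at least one
    # argument; real printf prints the format once even with none.)
    out = []
    for start in range(0, len(args), slots):
        group = args[start:start + slots]
        group += [""] * (slots - len(group))
        out.append("\t".join(group) + "\n")
    return "".join(out)


print(printf_like(words_of("a aa aaa\nb * bb\n")), end="")
```

Because the splitting happens inside Python here, the asterisk in the sample stays a literal asterisk, which is exactly what the shell version cannot guarantee.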

edited Sep 09 '16 at 16:58

answered Sep 08 '16 at 11:26

Eric Renouf

This depends on the file containing no special characters. If it has, for instance, any asterisks (*), you will get very unexpected results. – Wildcard Sep 09 '16 at 06:34

score 9 · Answer 3 · answered Sep 08 '16 at 11:51

$ sed -E 's/\s+/\n/g' ip.txt | paste - -
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

$ sed -E 's/\s+/\n/g' ip.txt | paste - - -
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

score 4 · Answer 4 · answered Sep 09 '16 at 15:25

perl -n0E 'say s/\s+/ ++$n % 4 ?"\t":"\n"/gre' file

(replace 4 by the number of columns)
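
The trick is a running counter that decides, for each run of whitespace, whether it becomes a tab or a newline. A rough Python equivalent, assuming a hypothetical function name reshape, might look like this. Unlike the Perl one-liner, it strips leading and trailing whitespace first, so it does not emit an extra separator at the end.

```python
import re
from itertools import count


def reshape(text, columns):
    # Replace each whitespace run by a tab, or by a newline after
    # every `columns`-th field.
    counter = count(1)

    def separator(_match):
        return "\n" if next(counter) % columns == 0 else "\t"

    return re.sub(r"\s+", separator, text.strip()) + "\n"


print(reshape("a aa aaa b\nbb bbb\n", 2), end="")
```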

JJoao

score 4 · Answer 5 · answered Sep 12 '16 at 02:54

BSD rs (reshape) utility:

$ rs 0 2
a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj
[Ctrl-D][Enter]
a    aa
aaa  b
bb   bbb
c    cc
ccc  d
dd   ddd
e    ee
eee  f
ff   fff
g    gg
ggg  h
hh   hhh
i    ii
iii  j
jj   jjj

0 2 is rows and columns. Specifying 0 means "calculate rows automatically from columns".
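
To illustrate what "calculate rows automatically from columns" amounts to, here is a small Python sketch. It is not a reimplementation of rs: reshape_aligned is a hypothetical name, and the padding rule (widest word plus a two-space gutter) is an assumption chosen because it reproduces the spacing seen in the output above.

```python
import math


def reshape_aligned(words, cols, rows=0, gutter=2):
    # Lay the words out row by row in `cols` columns.
    # rows=0 means: derive the row count from the number of words.
    if rows == 0:
        rows = math.ceil(len(words) / cols)
    # Assumed padding rule: widest word plus `gutter` spaces.
    width = max((len(w) for w in words), default=0) + gutter
    lines = []
    for r in range(rows):
        row = words[r * cols:(r + 1) * cols]
        # Pad every cell, then trim the padding after the last one.
        lines.append("".join(w.ljust(width) for w in row).rstrip())
    return lines


for line in reshape_aligned("a aa aaa b bb bbb".split(), 2):
    print(line)
```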

score 3 · Answer 6 · answered Sep 09 '16 at 08:16

Python script approach.

The basic idea is to flatten all the words in the text into one list, then print a newline after every second item (which gives two columns). For three columns, change index%2 to index%3.

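#!/usr/bin/env python3
import sys

# Flatten every whitespace-separated word on stdin into one list
items = [i for l in sys.stdin
           for i in l.split()]
line = []
for index,item in enumerate(items,1):
    line.append(item)
    if index%2 == 0:
        print("\t".join(line))
        line = []
# Print the last row even if it has fewer than two words
if line:
    print("\t".join(line))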

Sample output:

$ python recolumnate.py < input.txt                                            
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Three-column version (as noted above, only the test changes, to index%3 == 0):

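$ cat recolumnate.py
#!/usr/bin/env python3
import sys

# Flatten every whitespace-separated word on stdin into one list
items = [i for l in sys.stdin
           for i in l.split()]
line = []
for index,item in enumerate(items,1):
    line.append(item)
    if index%3 == 0:
        print("\t".join(line))
        line = []
# Print the last row even if it has fewer than three words
if line:
    print("\t".join(line))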

$ python recolumnate.py < input.txt                                            
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

Thor · Answer 7 · 2016-10-23T14:38:50.690

1

You can also do it with a single invocation of GNU awk:

reshape.awk

# Set awk to split input at whitespace characters and
# use tab as the output field separator 
BEGIN {
  RS="[ \t\n]+"
  OFS="\t"
}

# Print using OFS or ORS based on the element index
{
  printf "%s", $1 (NR%n == 0 ? ORS : OFS)
}

# Append a missing new-line when last row is not full
END { 
  if( NR%n != 0) 
    printf "\n"
}

Run it like this:

awk -f reshape.awk n=2 infile

Or as a one-liner:

awk -v n=2 'BEGIN { RS="[ \t\n]+"; OFS="\t" } { printf "%s", $1 (NR%n == 0 ? ORS : OFS) } END { if( NR%n != 0) printf "\n" }' infile

Output:

a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Or with n=3:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
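
The point of this version is that it streams: it never holds more than the current row. For readers more at home in Python, the same idea can be sketched as a generator. This is an illustration under assumptions (rows is a hypothetical name, and an in-memory io.StringIO stands in for the input file), not a translation of the awk script.

```python
import io


def rows(lines, n):
    # Yield lists of n fields, reading the input one line at a time.
    # Only the current, partly filled row is kept in memory.
    row = []
    for line in lines:
        for field in line.split():
            row.append(field)
            if len(row) == n:
                yield row
                row = []
    # Like the END block above: emit a final row that is not full.
    if row:
        yield row


sample = io.StringIO("a aa aaa b\nbb bbb c\n")
for row in rows(sample, 3):
    print("\t".join(row))
```

Any iterable of lines works as input, so sys.stdin or an open file object could be passed in place of the sample.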

edited Oct 23 '16 at 14:38

answered Sep 09 '16 at 15:06

Thor

Doesn't this use $1 as the format string to printf? – Wildcard Sep 09 '16 at 16:02
@Wildcard: Right, it is safer to use "%s", .... Updated – Thor Sep 09 '16 at 16:10
Thanks for confirming. :) The same applies to the awk command in your other answer to this question, by the way. – Wildcard Sep 09 '16 at 16:16

score 1 · Answer 8 · answered Oct 17 '21 at 16:52

Two columns

perl -pe 's/ /\n/g' filename | sed '/^$/d' | sed 'N;s/\n/ /g'

output

a aa
aaa b
bb bbb
c cc
ccc d
dd ddd
e ee
eee f
ff fff
g gg
ggg h
hh hhh
i ii
iii j
jj jjj

Three columns

perl -pe 's/ /\n/g' filename | sed '/^$/d' | sed '1~3N;s/\n/ /g' | sed 'N;s/\n/ /g'

output

a aa aaa
b bb bbb
c cc ccc
d dd ddd
e ee eee
f ff fff
g gg ggg
h hh hhh
i ii iii
j jj jjj

score 0 · Answer 9 · answered Oct 17 '23 at 15:36

Using Raku (formerly known as Perl_6)

Converting to 2-column output:

~$ raku -e '.put for words.batch(2);' file

Converting to 3-column output:

~$ raku -e '.put for words.batch(3);' file

Raku has a words function that splits on whitespace. Once split, the elements can be batched back together. Use batch if the final group may be incomplete and you want to keep it (equivalent to passing partial => True to rotor). To drop an incomplete final group instead, use rotor with its defaults.
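
For readers who do not know Raku, the difference between keeping and dropping the final incomplete group can be sketched in Python. The functions below borrow the names batch and rotor, but they are hypothetical stand-ins written for this illustration, not the Raku routines themselves.

```python
def batch(items, size):
    # Groups of `size`; a short final group is kept.
    return [items[i:i + size] for i in range(0, len(items), size)]


def rotor(items, size):
    # Groups of `size`; a short final group is dropped.
    return [group for group in batch(items, size) if len(group) == size]


words = "a aa aaa b bb".split()
print(batch(words, 2))  # [['a', 'aa'], ['aaa', 'b'], ['bb']]
print(rotor(words, 2))  # [['a', 'aa'], ['aaa', 'b']]
```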

Sample Input:

a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj

Sample Output (2-column joined on \t):

~$ raku -e '.join("\t").put for words.batch(2);'  file
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Sample Output (3-column joined on \t):

~$ raku -e '.join("\t").put for words.batch(3);'  file
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

Finally, if your initial elements are not whitespace-separated, you can slurp the whole file in at once and split it yourself, for example with split(/ \, | \n /) to split on commas and newlines. See the first link below for an example.
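
The same idea of splitting on commas and newlines, shown as a small Python sketch for comparison. The name split_fields and the choice to discard empty pieces are assumptions made for this example.

```python
import re


def split_fields(text, pattern=r"[,\n]"):
    # Split on commas and newlines, trimming each piece and
    # discarding pieces that end up empty.
    pieces = (piece.strip() for piece in re.split(pattern, text))
    return [piece for piece in pieces if piece]


print(split_fields("a,aa,aaa\nb,bb,bbb\n"))
```

The resulting flat list can then be regrouped n items at a time, just as with whitespace-separated input.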

https://unix.stackexchange.com/a/686651/227738
https://raku.org