17

I have a text file:

a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj

How can I process it and get a 2 column file like this:

a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Or a three column file like this:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

I would prefer an awk solution, but other solutions are welcome too.

αғsнιη

9 Answers

20

Put each field on a line and post-columnate.

Each field on one line

tr

tr -s ' ' '\n' < infile

grep

grep -o '[[:alnum:]]*' infile

sed

sed 's/\s\+/\n/g' infile

or, more portably (POSIX character class and interval, literal newline in the replacement):

sed 's/[[:space:]]\{1,\}/\
/g' infile

awk

awk '$1=$1' OFS='\n' infile

or

awk -v OFS='\n' '$1=$1' infile

Columnate

paste

For 2 columns:

... | paste - -

For 3 columns:

... | paste - - -

etc.
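
For readers who think more easily in Python, here is a rough analogue of the paste idiom above. It is only a sketch: the function name paste_columns is made up for illustration. Each "-" in paste reads the next line from the same stdin, and zipping one iterator with itself n times does the same thing. A short last row is padded with empty fields, which is also what paste does.

```python
from itertools import zip_longest

def paste_columns(fields, n):
    """Group an iterable of fields into tab-joined rows of n (hypothetical helper)."""
    it = iter(fields)
    # The same iterator repeated n times: every row pulls n consecutive items,
    # much as each "-" argument of paste pulls the next line from stdin.
    rows = zip_longest(*[it] * n, fillvalue="")
    return ["\t".join(row) for row in rows]

fields = "a aa aaa b bb bbb c".split()
for row in paste_columns(fields, 3):
    print(row)
```

The last row here contains "c" followed by two empty fields, so it ends in two tabs.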

sed

For 2 columns:

... | sed 'N; s/\n/\t/g'

For 3 columns:

... | sed 'N; N; s/\n/\t/g'

etc.

xargs

... | xargs -n number-of-desired-columns

As xargs uses /bin/echo to print, beware that data that looks like options to echo will be interpreted as such.

awk

... | awk '{ printf "%s", $0 (NR%n==0?ORS:OFS) }' n=number-of-desired-columns OFS='\t'

pr

... | pr -at -number-of-desired-columns

or

... | pr -at -s$'\t' -number-of-desired-columns

columns (from the autogen package)

... | columns -c number-of-desired-columns

Typical output:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
Thor
9

As Wildcard pointed out, this will only work if your file is nicely formatted, in that there aren't any special characters that the shell will interpret as globs and you are happy with the default word splitting rules. If there's any question about whether your files will "pass" that test, do not use this approach.

One possibility would be to use printf to do it like

printf '%s\t%s\n' $(cat your_file)

That will do word splitting on the contents of your_file and will pair them and print them with tabs in between. You could use more %s format strings in the printf to have extra columns.
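
If the globbing risk is a concern, the same split-then-pair idea can be sketched in Python, where no filename expansion takes place. This is an illustration only, and recolumnate is a made-up name. One difference from printf: a short final row is printed as-is instead of being padded with empty arguments.

```python
def recolumnate(text, n):
    """Split text on runs of whitespace and regroup it into rows of n words."""
    # str.split() with no arguments splits on any whitespace run;
    # characters such as * are kept literally, nothing is expanded.
    words = text.split()
    return ["\t".join(words[i:i + n]) for i in range(0, len(words), n)]

sample = "a * b\nc"
for row in recolumnate(sample, 2):
    print(row)
```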

Eric Renouf
  • 1
    This depends on the file containing no special characters. If it has, for instance, any asterisks (*), you will get very unexpected results. – Wildcard Sep 09 '16 at 06:34
9
$ sed -E 's/\s+/\n/g' ip.txt | paste - -
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

$ sed -E 's/\s+/\n/g' ip.txt | paste - - -
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
Sundeep
4
perl -n0E 'say s/\s+/ ++$n % 4 ?"\t":"\n"/gre' file

(replace 4 by the number of columns)
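
The trick in the one-liner is that every run of whitespace is replaced by a tab, except every n-th run, which becomes a newline. A minimal Python sketch of the same counter-based substitution follows; the function name reshape is hypothetical, and unlike say it does not append an extra newline.

```python
import re

def reshape(text, n):
    """Replace each whitespace run with a tab, or a newline on every n-th run."""
    count = 0

    def separator(_match):
        # Called once per whitespace run, in order of appearance.
        nonlocal count
        count += 1
        return "\n" if count % n == 0 else "\t"

    return re.sub(r"\s+", separator, text)

print(reshape("a aa aaa b\n", 2), end="")
```

As with the Perl version, leading whitespace in the input would count as a separator and shift the columns.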

JJoao
4

BSD rs (reshape) utility:

$ rs 0 2
a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj
[Ctrl-D][Enter]
a    aa
aaa  b
bb   bbb
c    cc
ccc  d
dd   ddd
e    ee
eee  f
ff   fff
g    gg
ggg  h
hh   hhh
i    ii
iii  j
jj   jjj

0 2 is rows and columns. Specifying 0 means "calculate rows automatically from columns".
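
To make the "0 means calculate" rule concrete, here is a small Python sketch of it. It models only that one rule and none of the other things rs does (alignment, transposition and so on); reshape_rows and its handling of a non-zero row count are assumptions made for illustration.

```python
import math

def reshape_rows(words, cols, rows=0):
    """Lay words out row by row; rows=0 means derive the row count from cols."""
    if rows == 0:
        # Enough rows to hold every word, the last one possibly short.
        rows = math.ceil(len(words) / cols)
    return [words[r * cols:(r + 1) * cols] for r in range(rows)]

words = "a aa aaa b bb".split()
for row in reshape_rows(words, 2):
    print("  ".join(row))
```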

Kaz
3

Python script approach.

The basic idea is to flatten all the words in the text into one list, then print a newline after every second item (which gives two columns). For three columns, change index%2 to index%3.

#!/usr/bin/env python3
import sys

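# Flatten every whitespace-separated word from stdin into one list
items = [i for l in sys.stdin
           for i in l.strip().split()]
line = []
for index, item in enumerate(items, 1):
    line.append(item)
    # Emit a row after every second item
    if index % 2 == 0:
        print("\t".join(line))
        line = []
# Print any leftover items when the count is not a multiple of 2
if line:
    print("\t".join(line))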

Sample output:

$ python recolumnate.py < input.txt                                            
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Three-column version (as noted above, only the test changes, to index%3 == 0):

$ cat recolumnate.py                                                           
#!/usr/bin/env python3
import sys

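# Flatten every whitespace-separated word from stdin into one list
items = [i for l in sys.stdin
           for i in l.strip().split()]
line = []
for index, item in enumerate(items, 1):
    line.append(item)
    # Emit a row after every third item
    if index % 3 == 0:
        print("\t".join(line))
        line = []
# Print any leftover items when the count is not a multiple of 3
if line:
    print("\t".join(line))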

$ python recolumnate.py < input.txt                                            
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
1

You can also do it with a single invocation of GNU awk:

reshape.awk

# Set awk to split input at whitespace characters and
# use tab as the output field separator 
BEGIN {
  RS="[ \t\n]+"
  OFS="\t"
}

# Print using OFS or ORS based on the element index
{
  printf "%s", $1 (NR%n == 0 ? ORS : OFS)
}

# Append a missing new-line when last row is not full
END { 
  if( NR%n != 0) 
    printf "\n"
}

Run it like this:

awk -f reshape.awk n=2 infile

Or as a one-liner:

awk -v n=2 'BEGIN { RS="[ \t\n]+"; OFS="\t" } { printf "%s", $1 (NR%n == 0 ? ORS : OFS) } END { if( NR%n != 0) printf "\n" }' infile

Output:

a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Or with n=3:

a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj
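
For comparison, the same streaming logic can be sketched in Python: write each word followed by either the field or the record separator, and close a short last row at the end. The names reshape_stream, ofs and ors are made up here to mirror the awk variables. Like the awk script, a short last row keeps its trailing separator before the final newline.

```python
import io

def reshape_stream(lines, n, out, ofs="\t", ors="\n"):
    """Write words from lines to out, n per row, without holding them all in memory."""
    count = 0
    for line in lines:
        for word in line.split():
            count += 1
            # Same choice as NR%n == 0 ? ORS : OFS in the awk script.
            out.write(word + (ors if count % n == 0 else ofs))
    # Same job as the END block: terminate a row that was not filled.
    if count % n != 0:
        out.write(ors)

buf = io.StringIO()
reshape_stream(["a aa aaa\n", "b bb\n"], 2, buf)
print(buf.getvalue(), end="")
```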
Thor
1

Two columns

perl -pne "s/ /\n/g" filename| sed '/^$/d'| sed "N;s/\n/ /g"

output

a aa
aaa b
bb bbb
c cc
ccc d
dd ddd
e ee
eee f
ff fff
g gg
ggg h
hh hhh
i ii
iii j
jj jjj

Three columns

perl -pne "s/ /\n/g" filename| sed '/^$/d'| sed "1~3N;s/\n/ /g"| sed "N;s/\n/ /g"

output

a aa aaa
b bb bbb
c cc ccc
d dd ddd
e ee eee
f ff fff
g gg ggg
h hh hhh
i ii iii
j jj jjj

0

Using Raku (formerly known as Perl_6)

Converting to 2-column output:

~$ raku -e '.put for words.batch(2);'   file

Converting to 3-column output:

~$ raku -e '.put for words.batch(3);'   file

Raku has a words function that splits on whitespace. Once split, the elements can be batched back together. Use batch if you expect an incomplete (partial) set of elements at the end and want to keep it; it is equivalent to rotor(partial => True). If you need to drop the final incomplete set instead, use rotor() with its defaults.
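
The difference between keeping and dropping the final partial group can be illustrated outside Raku too. The Python sketch below is only an analogy; its batch and rotor functions are hypothetical stand-ins named after the Raku methods, not ports of them.

```python
def batch(items, n):
    """Chunk items into lists of n, keeping a short final chunk."""
    return [items[i:i + n] for i in range(0, len(items), n)]

def rotor(items, n):
    """Chunk items into lists of n, dropping a short final chunk."""
    return [chunk for chunk in batch(items, n) if len(chunk) == n]

words = "a aa aaa b bb".split()
print(batch(words, 2))  # the last chunk holds the single leftover word
print(rotor(words, 2))  # the leftover word is discarded
```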

Sample Input:

a   aa  aaa     b   bb  bbb     c   cc  ccc
d   dd  ddd     e   ee  eee     f   ff  fff
g   gg  ggg     h   hh  hhh     i   ii  iii
j   jj  jjj

Sample Output (2-column joined on \t):

~$ raku -e '.join("\t").put for words.batch(2);'  file
a   aa
aaa b
bb  bbb
c   cc
ccc d
dd  ddd
e   ee
eee f
ff  fff
g   gg
ggg h
hh  hhh
i   ii
iii j
jj  jjj

Sample Output (3-column joined on \t):

~$ raku -e '.join("\t").put for words.batch(3);'  file
a   aa  aaa
b   bb  bbb
c   cc  ccc
d   dd  ddd
e   ee  eee
f   ff  fff
g   gg  ggg
h   hh  hhh
i   ii  iii
j   jj  jjj

Finally, if your initial elements are not whitespace-separated, you can slurp the file in all at once and, for example, use split(/ \, | \n /) to split on commas and newlines. See the first link below for an example.

https://unix.stackexchange.com/a/686651/227738
https://raku.org
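
As a language-neutral illustration of splitting on commas and newlines rather than whitespace, here is a minimal Python sketch. The name split_fields is made up, and dropping empty fields (such as the one after a trailing newline) is a choice made for this example, not something the Raku code does.

```python
import re

def split_fields(text):
    """Split text on commas and newlines, discarding empty fields."""
    return [field for field in re.split(r"[,\n]", text) if field]

print(split_fields("a,aa,aaa\nb,bb\n"))
```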

jubilatious1