19

I'm editing a simple table and would like it nicely formatted. I could use tbl, LaTeX, or similar, but that seems like overkill; plain text really is sufficient. Since the table is simple, I might as well have the source be the output, so the source should look good too. This seems like a perfect job for column -s '|' -t: it finds the separators and automatically inserts spaces to align each column to its maximum width. Unfortunately, it deletes the separators, so I can't rerun it after further editing. Is there a good text-processing tool that can do this idempotently, so that its output serves as its input? Or do I need to write my own?

EDIT: here's an example of what I want:

foo |   bar | baz
abc def | 12 | 23456

should become

foo     | bar | baz
abc def | 12  | 23456

When ' ' is both the separator and the spacer, column -t works nicely. But my items have spaces in them, so I can't use that. Having spacers that are distinct from the separators complicates things. I think it would be useful for spacers next to a separator to be treated as part of the separator, but that's not what column -s '|' -t does (though obviously the current behavior is also useful).

wnoise
  • You could use emacs org-mode. The table support is actually quite amazing, providing spreadsheet like functionality. – vschum Aug 14 '11 at 09:41
  • Not as general as what I thought would be reasonable, but there's a python program specifically for markdown tables at http://www.leancrew.com/all-this/2008/08/tables-for-markdown-and-textmate/ . – wnoise Aug 14 '11 at 11:11
  • This is a problem I run into like at least every two weeks. The only viable solution to bypass printf holocaust each time, that I have found so far, is adding a unique char (like @) into the data, and use ... | column -s@ -t afterwards. – sjas Nov 03 '16 at 10:11

6 Answers

20

I'm not sure I understand your problem correctly, but could it be solved by adding a temporary extra separator? You can use the second separator to mark the column boundaries while leaving the original separator untouched.

In the example below I prepend an "@" to each "|", so that the input to the column command looks like "xxx @| yyyy". column then splits on the "@" and leaves the "|" untouched:

~$ echo "foo | this is some text | bar" | sed 's/|/@|/g'  | column -s '@' -t
foo   | this is some text   | bar
Rui F Ribeiro
hmontoliu
  • Clever. Nearly does what I want, and does in fact do what I asked -- leaves the separators in. I also want the spaces next to the true separators to be able to be adjusted down, rather than just up, as here. – wnoise Aug 14 '11 at 09:32
  • @wnoise: use sed 's/ *| */@| /g' instead – Stéphane Gimenez Aug 14 '11 at 10:45
  • @Stéphane Gimenez: And adding sed 's/ |/|/g' after the column fixes the extra spaces added. We now have a solution that works well enough for me. (Though it'd be nice if it didn't depend on an extra character like this. What if one isn't available?) – wnoise Aug 14 '11 at 11:02
  • @wnoise: Instead of @, you can use something that typically doesn't appear in text, like a low ASCII value, e.g. $'\x01' (but not $'\x00'). – Peter.O Aug 14 '11 at 14:13
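Putting the comment thread together, one possible pipeline looks like the sketch below. It assumes a column implementation that supports -s and -t, and that the byte \001 never appears in the data. The function name realign is made up for illustration.

```shell
# realign: hypothetical helper combining the suggestions from the comments above.
# Assumes `column` is installed and the data never contains the byte \001.
realign() (
    marker=$(printf '\001')            # temporary marker, used instead of "@"
    sed "s/ *| */${marker}| /g" |      # drop spaces around "|", tag each "|" with the marker
        column -s "$marker" -t |       # align on the marker; the "|" itself survives
        sed 's/ |/|/g'                 # remove one of the two spaces column inserts
)
```

Used as a filter (realign < table.txt), its output can be fed back through realign after further editing.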
20

This wasn't available when you asked the question but as of v. 2.23 column from util-linux allows you to select the output separator via

   -o, --output-separator string
          Specify the columns delimiter for table output (default is two spaces).

So simply run:

 column -s '|' -o '|' -t infile
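As written, this appears to keep whatever spaces already surround each '|', so columns can grow but not shrink. A minimal sketch that trims those spaces first, assuming util-linux column 2.23 or later (for -o). The function name realign_o is hypothetical.

```shell
# realign_o [FILE...]: hypothetical wrapper; requires util-linux column >= 2.23 for -o.
realign_o() {
    sed 's/ *| */|/g' "$@" |             # strip existing spaces around each "|"
        column -s '|' -o ' | ' -t        # re-pad, writing " | " between columns
}
```

Because the spaces are removed before column re-adds them, running the output through realign_o again should give the same text.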
don_crissti
  • Note that the util-linux version is not available on Ubuntu 18.04 (and probably other Debian-derived distros) at the time of writing. Only the bsdmainutils version is available. The bsdmainutils version does not support output formatting. – htaccess Dec 03 '18 at 23:56
  • (and in more layman English - the Mac OS version of column doesn't have the -o option :( ) – Sridhar Sarnobat Jan 03 '24 at 03:35
6

Here is a bash script. It does not use 'column -t', and the separator is handled exactly as the IFS is, because it is the IFS (or at least, awk's internal version of the IFS). The default delimiter is $' \t'

This script fully pads out the rightmost field.
'column' does not do this.
By padding out all the columns, this script can be
easily modified to create a table frame as well.

Note. The input file needs to be processed twice
('column' would also need to do this)
The first pass is to get column max widths.
The second pass is to expand fields (per column)

Update: added some options and fixed a glaring bug (caused by renaming variables).

  • -l Left trim whitespace of any indented fields
  • -r Right trim whitespace wider than widest text (for the column)
  • -b Both -l and -r
  • -L Left output delimiter is added
  • -R Right output delimiter is added
  • -B Both -L and -R
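  • -F Choose input separator
  • -O Choose output separator (only its first character is used; defaults to the first character of -F)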

#!/bin/bash
#
#   script [-l|-r|-b] [-L|-R|-B] [-F sep] [-O sep] [file]
#
#   If file is not specified, stdin is read 
#    
# ARGS ######################################################################
l=;r=;L=;R=;O=;F=' ' # defaults
for ((i=1;i<=${#@};i++)) ;do
  case "$1" in
    -- ) shift 1;((i--));break ;;
    -l ) l="-l";shift 1;((i-=1)) ;;        #  left strip whitespace
    -r ) r="-r";shift 1;((i-=1)) ;;        # right strip whitespace
    -b ) l="-l";r="-r";shift 1;((i-=1)) ;; # strip  both -l and -r whitespace
    -L ) L="-L";shift 1;((i-=1)) ;;        #  Left output delimiter is added
    -R ) R="-R";shift 1;((i-=1)) ;;        # Right output delimiter is added
    -B ) L="-L";R="-R";shift 1;((i-=1)) ;; # output Both -L and -R delimiters
    -F ) F="$2";shift 2;((i-=2)) ;; # source separator
    -O ) O="$2";shift 2;((i-=2)) ;; # output  separator. Default = 1st char of -F 
    -* ) echo "ERROR: invalid option: $1" 1>&2; exit 1 ;;
     * ) break ;;
  esac
done
#
if  [[ -z "$1" ]] ;then # no filename, so read stdin
  f="$(mktemp)"
  ifs="$IFS"; IFS=$'\n'; set -f # Disable pathname expansion (globbing)
  while read -r line; do
    printf "%s\n" "$line" >>"$f"
  done
  IFS="$ifs"; set +f # re-enable pathname expansion (globbing)
else
  f="$1"
fi
[[ -f "$f" ]] || { echo "ERROR: Input file NOT found: $f" 1>&2; exit 2; }
[[ -z "$F" ]] && F=' '        # input Field Separator string
[[ -z "$O" ]] && O="$F"       # output Field Separator
                 O="${O:0:1}" #   use  single char only

# MAIN ######################################################################
max="$( # get max length of each field/column, and output them
  awk -vl="$l" -vr="$r" -vL="$L" -vR="$R" -vF="$F" -vO="$O" '
    BEGIN { if (F!="") FS=F }
    { for (i=1;i<=NF;i++) { 
        if (l=="-l") { sub("^[ \t]*","",$i) }
        if (r=="-r") { sub("[ \t]*$","",$i) }
        len=length($i); if (len>max[i]) { max[i]=len } 
        if (i>imax) { imax=i } 
      } 
    }
    END { for(i=1;i<=imax;i++) { printf("%s ",max[i]) } }
  ' "$f" 
)"

awk -vl="$l" -vr="$r" -vL="$L" -vR="$R" -vF="$F" -vO="$O" -v_max="$max" '
  BEGIN { if (F!="") FS=F; cols=split(_max,max," ") }
  { # Bring each field up to max len and output with delimiter
    printf("%s",L=="-L"?O:"")
    for(i=1;i<=cols;i++) { if (l=="-l") { sub("^[ \t]*","",$i) } 
                           if (r=="-r") { sub("[ \t]*$","",$i) }
      printf("%s%"(max[i]-length($i))"s%s",$i,"",i==cols?"":O) 
    } 
    printf("%s\n",R=="-R"?O:"")
  }
' "$f"

# END #######################################################################    
if  [[ -z "$1" ]] ;then # no filename, so stdin was used
  rm "$f"   # delete temp file
fi
exit
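For comparison, the two-pass idea described above can be sketched far more compactly when none of the options are needed. This is only an illustration, not a replacement for the script: it assumes a single-character delimiter, always trims cells, and leaves the last field unpadded. The name align_table is hypothetical.

```shell
# align_table FILE [DELIM]: hypothetical, stripped-down version of the two-pass idea.
# The file is named twice so that awk reads it once per pass.
align_table() {
    awk -v d="${2:-|}" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        BEGIN { FS = d }
        NR == FNR {                      # pass 1: widest trimmed cell per column
            for (i = 1; i <= NF; i++) {
                w = length(trim($i))
                if (w > max[i]) max[i] = w
            }
            next
        }
        {                                # pass 2: pad every cell except the last
            line = ""
            for (i = 1; i < NF; i++)
                line = line sprintf("%-" max[i] "s", trim($i)) " " d " "
            print line trim($NF)
        }
    ' "$1" "$1"
}
```

Since it needs to read the input twice, it takes a file name rather than standard input, which is the same reason the script above copies stdin to a temporary file.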
manatwork
Peter.O
  • Nicely done. Of course, I was hoping for something that wouldn't actually require writing a new program. – wnoise Aug 22 '11 at 09:58
3

Take a look at the Vim plugin Tabular, which provides the :Tabularize command:

:Tabularize /<delim>
1

This is a two-pass tweak on hmontoliu's answer, which avoids needing to hard code the delimiter, by guessing it from the input data.

  1. Parse the input for single non-alphanumeric characters surrounded by spaces, sort them by frequency, and assume the most common one is the delimiter, which is assigned to $d.
  2. Proceed more or less as in hmontoliu's answer, but use the ASCII control character \x01 as the temporary marker instead of an @, as per Peter.O's comment.

The code is a function which accepts a filename, or else input from STDIN:

algn() { 
    d="$(grep -ow '[^[:alnum:]]' "${1:-/dev/stdin}"  | \
         sort | uniq -c | sort -rn | sed -n '1s/.*\(.$\)/\1/p')" ;
    sed "s/ *$d */\x01$d /g" "${1:-/dev/stdin}"  | column -s $'\001' -t ;
}

Output of algn foo (or also algn < foo):

foo      | bar  | baz
abc def  | 12   | 23456
agc
  • Looking at this a year later, it seems like the STDIN invocation can't and shouldn't work because it uses up STDIN twice. Testing with large files (about 80 million lines) indicates it apparently works correctly. Hmm... – agc Feb 23 '18 at 18:33
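On the stdin question, a likely explanation is that on Linux, opening /dev/stdin while standard input is a regular file reopens that file from the start, so grep and sed each see the whole input. With a pipe, grep would consume everything and sed would get nothing. A minimal sketch that sidesteps the issue by buffering the input in a temporary file first. The name algn_buffered is hypothetical, and excluding whitespace from the delimiter candidates is a small change from the original.

```shell
# algn_buffered [FILE]: hypothetical variant of algn that reads its input only once.
# Assumes mktemp, column, and a grep that supports -o and -w.
algn_buffered() (
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT             # clean up when the subshell exits
    cat -- "${1:--}" > "$tmp"            # "-" makes cat read standard input
    marker=$(printf '\001')
    # Most common single non-alphanumeric, non-space character standing alone.
    d=$(grep -ow '[^[:alnum:][:space:]]' "$tmp" |
        sort | uniq -c | sort -rn | sed -n '1s/.*\(.\)$/\1/p')
    [ -n "$d" ] || { cat "$tmp"; exit 0; }   # no delimiter found: pass input through
    # Caveat: $d is pasted into a sed pattern, so "/", "*" or "." would need escaping.
    sed "s/ *$d */$marker$d /g" "$tmp" | column -s "$marker" -t
)
```

With this variant, both algn_buffered foo and some_command | algn_buffered read the data exactly once.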
0

I used hmontoliu's idea to implement a simple command:

#! /bin/bash
delim="${1:-,}"
interm="${2:-~}"
sed "s/$delim/$interm$delim/g" | column -t -s "$interm" | sed "s/  $delim/$delim/g"

Comment:

  • ${1:-,} is the first argument, with , as the default
  • the first sed inserts an intermediate symbol ($interm, the 2nd argument, or ~ by default) before each delimiter
  • column then replaces the intermediate symbol with the spaces that do the alignment
  • the second sed cleans up the redundant spaces left by the column command

Usage example:

$ echo "
a: bb: cccc
aaaa: b : cc
" | align :

a   : bb: cccc
aaaa: b : cc

It's also good in that it's idempotent: you can apply it several times and get the same result (for example when you edit in vim and realign).

Alexey