Is there a way to do column-like filtering on data coming through a pipe?

I am looking for a way to do something similar to what column -t does, but without waiting for the input to end, so that it works for real-time data coming through a pipe. I know I could force fixed-width columns with awk, but that needs too much setup every time the format changes.

PS: I do not think mimicking column -x on incomplete data is possible. I also think column -t is impossible to replicate perfectly on incomplete data. It is OK if the solution outputs narrower columns at first and then expands them as more lines arrive.

EDIT: Example to illustrate this is NOT a buffering problem:

yes something | cat -n | tr -s '\t' ' ' | column -t
  • Which mode of column do you intend to mimic: the default, -x, or -t? – Marcus Müller Jun 28 '22 at 16:49
  • Please [edit] your question and add a few lines of example input, the expected output and the column command line options you would use. – Bodo Jun 28 '22 at 16:51
  • Please [edit] your question and show the complete pipe you use. The issue might be related to buffering. When the program that prints the data pipes its output into column (or any other program) instead of sending it to a terminal, the output will be fully buffered rather than line-buffered. This means column (or any replacement) will get the data in bigger chunks, not line by line. If the output of the original program is slow (e.g. one line per second), this will result in delayed output on the terminal. – Bodo Jun 28 '22 at 17:45
  • @Bodo: Edited. I definitely need -t, because -x makes no sense for real-time data. I am not interested in -c. Options -s and -o would be nice, but let's keep it simple. – user185953 Jun 30 '22 at 09:35
  • Does this answer your question? Turn off buffering in pipe - particularly the stdbuf solution (currently 550 votes) – Chris Davies Jun 30 '22 at 09:46
  • column -t needs to read all the lines to know the maximum width of the columns before it can start outputting anything. If you know the number and widths of columns in advance, you can use things like expand or awk's printf(). – Stéphane Chazelas Jun 30 '22 at 10:30
  • @StéphaneChazelas Agreed, I cannot expect a perfect output. I clarified the question. – user185953 Jun 30 '22 at 12:46

1 Answer

The whole point of column -t is that it aligns fields in columns automatically, based on the maximum width of the fields in each column.

If your input contains

a b
a  b
a bc

It will output:

a  b
a  b
a  bc

If you add a:

xxxxx b

line to the input, the output becomes:

a      b
a      b
a      bc
xxxxx  b

column needs to read all the lines of input to determine the width of each column and can't start outputting anything before then.

The only way to work around that and still get every line aligned the same is to know, or be able to guess, the maximum width of each column.

For instance, if you know fields are never wider than 10 cells, you can do:

<input tr -s '[:blank:]' '[\t*]' | expand -t 12

to format the output in columns 12 cells wide.

(Beware that some tr implementations, including GNU tr, don't support multi-byte characters, and that some expand implementations, including GNU expand, support neither multi-byte characters nor zero-width or double-width ones.)
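
As a small worked example of the fixed-width approach, here is a sketch that wraps the pipeline above in a shell function. The name fixed_cols and the 8-cell tab stop are only illustrative choices, not part of any tool. It also shows the limitation of guessing: a field wider than the tab stop pushes the next field out to the following stop.

```shell
# Hypothetical helper: squeeze runs of blanks into one tab, then let
# expand pad each field out to the next tab stop (default: 12 cells).
fixed_cols() {
  tr -s '[:blank:]' '[\t*]' | expand -t "${1:-12}"
}

# Each line is formatted on its own, so nothing has to wait for end of input.
printf 'a b\nxxxxx bc\nabcdefghi b\n' | fixed_cols 8
# a       b
# xxxxx   bc
# abcdefghi       b     <- 9-cell field overflows the 8-cell column
```

The same multi-byte caveats apply to this sketch as to the pipeline it wraps.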

For a solution with columns whose widths adapt dynamically to the width of new input, you could do something like:

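perl -Mopen=locale -MText::CharWidth=mbswidth -lae '
  # track the widest field seen so far in each column
  for (0..$#F) {$l = mbswidth $F[$_]; $l[$_] = $l if $l > $l[$_]}
  # pad every field but the last to its column width, plus two spaces
  print((map {sprintf "%-$l[$_]s  ", $F[$_]} (0..$#F-1)), $F[$#F])'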

For instance, on the output of lorem -p 2 | fmt -w 40, that gives:

Rerum  aut  pariatur  nihil
modi.  Exercitationem  ut
animi.  Quibusdam       dolores
voluptates  pariatur        vel
tempora.    Adipisci        expedita  voluptate
dolores     qui             consequatur.  Laboriosam
eum         ea.             Quasi         ab            qui  harum
repudiandae  consequatur     quasi

Nobis quia nesciunt laudantium. enim exercitationem earum. Pariatur nesciunt maiores natus nemo delectus. Ut ad voluptatem. Consequatur sint enim sequi aut est nihil. Et at
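
If Perl or Text::CharWidth is not available, the same grow-as-you-go idea can be sketched in awk. This is only an approximation under stated assumptions: length() counts characters (or bytes, depending on the awk and locale) rather than display cells, so unlike mbswidth it does not account for zero-width or double-width characters. The function name adaptive_cols is made up for the example.

```shell
# Hypothetical helper: widen each column as longer fields arrive.
adaptive_cols() {
  awk '{
    # Remember the widest field seen so far in each column.
    for (i = 1; i <= NF; i++)
      if (length($i) > w[i]) w[i] = length($i)
    # Pad every field but the last to its column width plus two spaces.
    line = ""
    for (i = 1; i < NF; i++)
      line = line sprintf("%-" w[i] "s  ", $i)
    print line $NF
    fflush()   # flush after each line so output is not held back in a pipe
  }'
}

printf 'a b\na  b\na bc\nxxxxx b\n' | adaptive_cols
# a  b
# a  b
# a  bc
# xxxxx  b
```

Lines that were already printed are not re-aligned, so earlier rows stay narrow and only later rows pick up the wider columns, which is the trade-off the question accepts.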

Or to reformat only the first 3 columns:

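perl -Mopen=locale -MText::CharWidth=mbswidth -lne '
  # split off the first 3 fields; the rest of the line stays as one last field
  @F = /(\S+)\s+(\S+)\s+(\S+)\s*(.*)/;
  # track the widest field seen so far in each column
  for (0..$#F) {$l = mbswidth $F[$_]; $l[$_] = $l if $l > $l[$_]}
  # pad every field but the last to its column width, plus two spaces
  print((map {sprintf "%-$l[$_]s  ", $F[$_]} (0..$#F-1)), $F[$#F])'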

Giving:

Sit  earum  voluptatem  cum adipisci aut
commodi.  Quia   aut         eaque rerum nihil
aperiam.  Dolor  quia        illo et. Quasi
illum     est    aliquam     consequatur maiores
voluptatibus.  Optio  consectetur  aliquid

Aspernatur omnis ex dolor nemo delectus sit quia ut. Voluptatum voluptatibus suscipit vel quos. Quo a at et non cumque voluptate dolorum nostrum. Eos ex est deleniti necessitatibus assumenda provident culpa. Ut sed et labore ullam voluptatum impedit. Tempora delectus et rem dicta debitis odit dignissimos.
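
To make the number of aligned columns a parameter rather than hard-coding 3, something along these lines might work in awk. It is a sketch, not a drop-in replacement for the Perl above: first_n_cols is a hypothetical name, it assumes fields are separated by spaces or tabs, and it measures width with length(), so it shares the single-width limitation mentioned earlier.

```shell
# Hypothetical helper: align the first N fields (default 3) and pass the
# remainder of each line through unchanged.
first_n_cols() {
  awk -v n="${1:-3}" '{
    rest = $0
    line = ""
    for (i = 1; i <= n && i <= NF; i++) {
      if (length($i) > w[i]) w[i] = length($i)
      # Remove this field and the blanks around it from the remainder.
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest)
      line = line sprintf("%-" w[i] "s  ", $i)
    }
    # Avoid trailing padding when nothing is left after the aligned fields.
    if (rest == "") sub(/ +$/, "", line)
    print line rest
    fflush()
  }'
}

printf 'a b c d  e\nxxxxx b c rest   here\n' | first_n_cols 2
# a  b  c d  e
# xxxxx  b  c rest   here
```

Spacing inside the untouched remainder is preserved exactly as it arrived.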

  • I clarified the question - please take a look. But: Thank you very much. I did not know about the expand command and its ability to make fixed-width columns for ASCII input. – user185953 Jun 30 '22 at 12:51
  • @user185953, note that the ASCII limitation (well, single-byte / single-width; there's nothing ASCII-specific there) does not affect all implementations. On BSDs (and your column is likely the one from BSD), expand works fine with non-ASCII, multi-byte, and zero/double-width characters. – Stéphane Chazelas Jun 30 '22 at 13:04
  • @user185953, see edit for a solution with adapting column width. – Stéphane Chazelas Jun 30 '22 at 13:40
  • That looks like the desired output. I don't speak perl very well - did you write that whole algorithm yourself? If so, is it easy to change it so that it formats only the first N columns and passes the rest of the line through as-is? But anyway: Thank you a lot, the accepted-answer mark is yours. – user185953 Jun 30 '22 at 14:09
  • @user185953, see edit. – Stéphane Chazelas Jun 30 '22 at 14:50