
I basically want to do this:

tail -f trades.csv | csvtool readable -

I want to read a CSV file in a readable format using csvtool and I want to keep watching it.

I think that command doesn't work because tail -f never signals the end of the stream so csvtool is waiting indefinitely. Surely, there is a workaround for this common issue?

Thank you

3 Answers


There is no such thing as “emitting EOF”. EOF is not an out-of-band signal. EOF is when an attempt to read reports that there is no data left to read.
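To make that concrete, here is a minimal Python sketch (not part of the original answer) using an anonymous pipe. The names `first` and `second` are just illustrative. The reader only observes end-of-file once every write end of the pipe has been closed, which is exactly what never happens with tail -f.

```python
import os

# Create a pipe: bytes written to write_fd can be read back from read_fd.
read_fd, write_fd = os.pipe()

os.write(write_fd, b"a,b,c\n")
first = os.read(read_fd, 1024)   # returns the bytes written so far

# Nothing special is "sent" here. The writer simply closes its end.
os.close(write_fd)

# With no data left and no writer remaining, read() returns zero bytes.
# That empty result is what programs interpret as EOF.
second = os.read(read_fd, 1024)
os.close(read_fd)

print(repr(first))
print(repr(second))
```

If the write end were left open, the second read would block instead of returning, which is the situation csvtool is in when fed by tail -f.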

If you pipe the output of tail -f into a program that reads the whole input before it starts emitting output, the program won't emit any output until it has read the complete input. And since tail -f never closes its output (since it never stops emitting output), that only happens once you kill the tail process.

csvtool readable reads all the input rows, then determines the width of each cell, calculates the maximum width of cells in each column, and finally emits all the rows with columns in a consistent width. It is impossible to perform this calculation until all the input is available, since the last row might be the one that has the widest cells. So it is logically impossible to design csvtool readable in a way that starts emitting output¹ before it has read all the input.
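As a rough illustration of why this needs the whole input, here is a minimal Python sketch of the two-pass approach described above. It is not csvtool's actual implementation, and the function name `format_readable` is made up for this example.

```python
import csv
import io

def format_readable(text):
    """Align CSV columns; needs the complete input before emitting anything."""
    # Pass 1: read every row and record the widest cell in each column.
    rows = list(csv.reader(io.StringIO(text)))
    ncols = max((len(row) for row in rows), default=0)
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Pass 2: only now can each row be padded to the final widths.
    return [
        " ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    ]

for line in format_readable("a,b\nlonger,c\n"):
    print(line)
```

The first row cannot be printed correctly until the second row, which contains the widest cell of column one, has been seen.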

Maybe you don't care that all the rows have the same column widths. Maybe you just want mostly consistent widths that get enlarged if a wider row appears. This would be reasonable, but it isn't a feature that csvtool offers.
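A formatter with growing widths is easy to sketch in Python. The following is only an illustration under that assumption, not something csvtool provides. The name `format_growing` is hypothetical. It yields each row as soon as it is read, padded to the widest cells seen so far.

```python
import csv
import sys

def format_growing(lines):
    """Yield aligned rows one at a time, widening columns as needed."""
    widths = []  # widest cell seen so far, per column
    for row in csv.reader(lines):
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(0)  # first time this column appears
            widths[i] = max(widths[i], len(cell))
        # Earlier rows are not reprinted, so alignment only shifts
        # from the point where a wider cell shows up.
        yield "  ".join(
            cell.ljust(widths[i]) for i, cell in enumerate(row)
        ).rstrip()

# To use it as a filter on tailed input, end the script with:
#     for line in format_growing(sys.stdin):
#         print(line, flush=True)
```

Because the generator is lazy, nothing waits for the end of input. The explicit flush matters when the output itself goes into another pipe.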

In many cases, “foo | bar doesn't emit output immediately when foo emits output gradually” is due to output buffering in foo. See “Turn off buffering in pipe”. This isn't what's happening here, though. It could be the problem in different circumstances: for csvtool subcommands that don't require the whole input, with input coming from a program that does buffer its output.

If all you want is to convert commas in CSV to some column alignment, and you're willing to specify the column widths manually, here's a two-liner:

tail -f … | python3 -u -c 'import csv, sys 
for row in csv.reader(sys.stdin): print("\t".join(row))' | expand -t 11,13,17

You don't need the expand step if you're happy with the default tab stops every 8 columns that most terminals and editors use.

¹ For the nitpickers: beyond the first cell of the first row, which wouldn't help.

  • Ok so there is no way to get csvtool to keep reading the file. Is there another csv formatter that would let me do this? I just want to tail a CSV file basically – SpaceMonkey Oct 25 '22 at 21:43
  • @SpaceMonkey convert from comma to tab-separated, maybe? Not with csvtool though: csvtool cat reads the whole input, and I don't understand why. I've added a python two-liner to my answer. – Gilles 'SO- stop being evil' Oct 25 '22 at 21:52
  • this is a great comment, it inspired me to write a simple Python script to format it how I want.. – SpaceMonkey Oct 26 '22 at 18:42

Using Raku (formerly known as Perl_6)

Raku implements "CAP" programming architecture: Concurrency, Asynchrony, and Parallelism. Many aspects of "CAP" programming are advantageous for streaming data. Raku's JSON::Stream package can process streaming JSON data. However, it is unclear whether a true CSV parser written in Raku can take advantage of this architecture (yet).

If all you want to do is split lines (rows) on commas, the following code works. It implements a react/whenever block in Raku ("CAP" architecture). The code below won't handle embedded newlines or commas embedded within double quotes, but it's a start (also tested on /var/log/system.log):

~$ tail -n2 -f MS.csv | raku -e 'react {  \
                                 whenever Supply( $*IN.lines ) -> $ln {  \
                                 .split(",").raku.match(/^^ <-["]>+  <( \" .+ \" )>  <-["]>+ $$/).put for $ln } };'

Sample Input (from https://www.microsoft.com/en-us/download/details.aspx?id=45485):

User Name,First Name,Last Name,Display Name,Job Title,Department,Office Number,Office Phone,Mobile Phone,Fax,Address,City,State or Province,ZIP or Postal Code,Country or Region
chris@contoso.com,Chris,Green,Chris Green,IT Manager,Information Technology,123451,123-555-1211,123-555-6641,123-555-9821,1 Microsoft way,Redmond,Wa,98052,United States
ben@contoso.com,Ben,Andrews,Ben Andrews,IT Manager,Information Technology,123452,123-555-1212,123-555-6642,123-555-9822,1 Microsoft way,Redmond,Wa,98052,United States
david@contoso.com,David,Longmuir,David Longmuir,IT Manager,Information Technology,123453,123-555-1213,123-555-6643,123-555-9823,1 Microsoft way,Redmond,Wa,98052,United States
cynthia@contoso.com,Cynthia,Carey,Cynthia Carey,IT Manager,Information Technology,123454,123-555-1214,123-555-6644,123-555-9824,1 Microsoft way,Redmond,Wa,98052,United States
melissa@contoso.com,Melissa,MacBeth,Melissa MacBeth,IT Manager,Information Technology,123455,123-555-1215,123-555-6645,123-555-9825,1 Microsoft way,Redmond,Wa,98052,United States

Sample Output (only last 2 lines processed, via tail -n2 -f):

"cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"
"melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"

Above, to receive unquoted output simply use .put, removing the intervening call to:

.raku.match(/^^ <-["]>+  <( \" .+ \" )>  <-["]>+ $$/)

Nota bene: I've tried Raku's Text::CSV module to see if it will work with Raku's react/whenever block, but so far no luck. The best I can do is implement a while block, which is an okay solution if you're just feeding it tailed input. Code as follows:

~$ tail -n2 -f MS.csv | raku -MText::CSV -e 'my @rows;  \
                                             my $csv = Text::CSV.new;  \
                                             while ($csv.getline($*IN)) -> $row {  \
                                             @rows.push: $row; say @rows[*-1].raku; };'

Sample Output:

$["cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]

[Above, remove the call to .raku to get unquoted output, or append a call to .match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/) to retain double quotes, dropping extraneous characters at the start/end of the line].

The code above pushes incoming data onto the @rows array, in case you want to do something with it. Most importantly, because Text::CSV is a true CSV parser, you can validate CSV input. And since the input is validated CSV, you can directly output columns, the number of elements per row, etc. For example, replace the last statement say @rows[*-1].raku with say @rows[*-1][2] to receive a continuous output of the third column.

See the URLs below for sep-char, escape-char, formula-handling, binary, strict settings, etc.

https://raku.land/github:Tux/Text::CSV
https://github.com/Tux/CSV
https://raku.org

jubilatious1

If csvtool requires EOF, you are out of luck.

But if the issue is with pipe buffering, then one of these two could help:

$ unbuffer tail -f trades.csv | csvtool readable -

$ stdbuf -i0 -o0 -e0 tail -f trades.csv | csvtool readable -

White Owl
  • This is irrelevant. tail doesn't buffer its output (at least not the GNU implementation). And even if it did, it wouldn't help with csvtool readable. – Gilles 'SO- stop being evil' Oct 25 '22 at 20:47
  • @Gilles'SO-stopbeingevil' I find that tail -f file | grep something and stdbuf -oL tail -f file | grep something work very differently. My conclusion is that tail -f does buffer its non-terminal output (ie. usual stdio effect) – Chris Davies Oct 25 '22 at 21:17
  • @Gilles'SO-stopbeingevil' The buffering is not in tail but in the pipe. And as I said from the very beginning, csvtool could require the end of input to determine the column widths based on the last row. But that is not necessarily true: csvtool could grow the column widths gradually, with each new row. It is possible, and I have seen many tools that work like that. – White Owl Oct 25 '22 at 21:28