Using Raku (formerly known as Perl_6)
Raku supports the "CAP" programming model: Concurrency, Asynchrony, and Parallelism. Many aspects of "CAP" programming are advantageous for streaming data. Raku's JSON::Stream module can process streaming JSON data. However, it is unclear whether a true CSV parser written in Raku can take advantage of this architecture (yet).
If all you want to do is split lines (rows) on commas, the following code works. It uses a react/whenever block in Raku ("CAP" architecture). The code below won't handle embedded newlines or commas embedded within double quotes, but it's a start (also tested on /var/log/system.log):
~$ tail -n2 -f MS.csv | raku -e 'react { \
whenever Supply( $*IN.lines ) -> $ln { \
.split(",").raku.match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/).put for $ln } };'
Sample Input (from https://www.microsoft.com/en-us/download/details.aspx?id=45485):
User Name,First Name,Last Name,Display Name,Job Title,Department,Office Number,Office Phone,Mobile Phone,Fax,Address,City,State or Province,ZIP or Postal Code,Country or Region
chris@contoso.com,Chris,Green,Chris Green,IT Manager,Information Technology,123451,123-555-1211,123-555-6641,123-555-9821,1 Microsoft way,Redmond,Wa,98052,United States
ben@contoso.com,Ben,Andrews,Ben Andrews,IT Manager,Information Technology,123452,123-555-1212,123-555-6642,123-555-9822,1 Microsoft way,Redmond,Wa,98052,United States
david@contoso.com,David,Longmuir,David Longmuir,IT Manager,Information Technology,123453,123-555-1213,123-555-6643,123-555-9823,1 Microsoft way,Redmond,Wa,98052,United States
cynthia@contoso.com,Cynthia,Carey,Cynthia Carey,IT Manager,Information Technology,123454,123-555-1214,123-555-6644,123-555-9824,1 Microsoft way,Redmond,Wa,98052,United States
melissa@contoso.com,Melissa,MacBeth,Melissa MacBeth,IT Manager,Information Technology,123455,123-555-1215,123-555-6645,123-555-9825,1 Microsoft way,Redmond,Wa,98052,United States
Sample Output (only last 2 lines processed, via tail -n2 -f):
"cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"
"melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"
Above, to receive unquoted output simply use .put, removing the intervening call to:
.raku.match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/)
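To make the limitation of plain comma-splitting concrete, here is a minimal sketch in Raku. The helper name naive-fields and the second input line are made up for illustration; they are not part of the sample file. It only shows that .split(",") has no notion of quoting, so a comma inside double quotes is treated as a field separator:

```raku
# Hypothetical helper: split a line on every comma, with no CSV awareness.
sub naive-fields(Str $line) {
    $line.split(",").List
}

# Single-quoted strings, so @ and " are taken literally.
my $plain  = 'ben@contoso.com,Ben,Andrews';
my $quoted = 'ben@contoso.com,"Andrews, Ben",IT Manager';   # invented row

say naive-fields($plain).elems;    # 3 fields, as intended
say naive-fields($quoted).elems;   # 4 fields: the quoted comma was split too
.say for naive-fields($quoted);    # shows the broken-up "Andrews, Ben" field
```

For input that may contain quoted commas or embedded newlines, a real CSV parser is the safer choice.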
Nota bene: I've tried Raku's Text::CSV module to see if it will work with Raku's react/whenever block, but so far no luck. The best I can do is implement a while block, which is an okay solution if you're just feeding it tailed input. Code as follows:
~$ tail -n2 -f MS.csv | raku -MText::CSV -e 'my @rows; \
my $csv = Text::CSV.new; \
while ($csv.getline($*IN)) -> $row { \
@rows.push: $row; say @rows[*-1].raku; };'
Sample Output:
$["cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
[Above, remove the call to .raku to get unquoted output, or append a call to .match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/) to retain double quotes, dropping extraneous characters at the start/end of the line].
The code above pushes incoming data onto the @rows array, in case you want to do something with it. Most importantly, because Text::CSV is a true CSV parser, you can validate CSV input. And since the input is validated CSV, you can directly output columns, the number of elements per row, etc. For example, replace the last statement say @rows[*-1].raku with say @rows[*-1][2] to receive a continuous output of the third column.
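As a rough sketch of what "output a column" or "check elements per row" could look like once rows are already parsed, the snippet below works on plain Raku arrays rather than on Text::CSV itself. The helper same-width and the deliberately short second row are assumptions for illustration only:

```raku
# Hypothetical helper: True when a parsed row has as many fields as the header.
sub same-width(@header, @row --> Bool) {
    @row.elems == @header.elems
}

my @header = 'User Name', 'First Name', 'Last Name';
my @rows   = ['chris@contoso.com', 'Chris', 'Green'],
             ['ben@contoso.com', 'Ben'];            # deliberately short row

for @rows -> @row {
    # Print the third column for well-formed rows, a warning otherwise.
    say same-width(@header, @row)
        ?? @row[2]
        !! "bad row: {@row.elems} fields";
}
```

In the streaming loop, the same check could be applied to each $row before it is pushed onto @rows.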
See the URLs below for sep-char, escape-char, formula-handling, binary, strict settings, etc.
https://raku.land/github:Tux/Text::CSV
https://github.com/Tux/CSV
https://raku.org
csvtool cat reads the whole input, and I don't understand why. I've added a python two-liner to my answer. – Gilles 'SO- stop being evil' Oct 25 '22 at 21:52