Using Raku (formerly known as Perl_6)
Raku supports the "CAP" programming model: Concurrency, Asynchrony, and Parallelism. Many aspects of "CAP" programming are advantageous for streaming data. Raku's JSON::Stream package can process streaming JSON data. However, it is unclear whether a true CSV parser written in Raku can take advantage of this architecture (yet).
If all you want to do is split lines (rows) on commas, the following code works. It uses a react/whenever block in Raku ("CAP" architecture). The code below won't handle embedded newlines or commas embedded within double quotes, but it's a start (also tested on /var/log/system.log):
~$ tail -n2 -f MS.csv | raku -e 'react { \
whenever Supply( $*IN.lines ) -> $ln { \
.split(",").raku.match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/).put for $ln } };'
Sample Input (from https://www.microsoft.com/en-us/download/details.aspx?id=45485):
User Name,First Name,Last Name,Display Name,Job Title,Department,Office Number,Office Phone,Mobile Phone,Fax,Address,City,State or Province,ZIP or Postal Code,Country or Region
chris@contoso.com,Chris,Green,Chris Green,IT Manager,Information Technology,123451,123-555-1211,123-555-6641,123-555-9821,1 Microsoft way,Redmond,Wa,98052,United States
ben@contoso.com,Ben,Andrews,Ben Andrews,IT Manager,Information Technology,123452,123-555-1212,123-555-6642,123-555-9822,1 Microsoft way,Redmond,Wa,98052,United States
david@contoso.com,David,Longmuir,David Longmuir,IT Manager,Information Technology,123453,123-555-1213,123-555-6643,123-555-9823,1 Microsoft way,Redmond,Wa,98052,United States
cynthia@contoso.com,Cynthia,Carey,Cynthia Carey,IT Manager,Information Technology,123454,123-555-1214,123-555-6644,123-555-9824,1 Microsoft way,Redmond,Wa,98052,United States
melissa@contoso.com,Melissa,MacBeth,Melissa MacBeth,IT Manager,Information Technology,123455,123-555-1215,123-555-6645,123-555-9825,1 Microsoft way,Redmond,Wa,98052,United States
Sample Output (only the last 2 lines are processed, via tail -n2 -f):
"cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"
"melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"
Above, to receive unquoted output simply use .put, removing the intervening call to:
.raku.match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/)
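To make the caveat about commas inside double quotes concrete, here is a minimal sketch of the same react/whenever idea in script form. It is only an illustration: the two sample rows are made up, and Supply.from-list stands in for the piped input, so nothing here reads from tail.

```raku
# Two hypothetical rows standing in for streamed input;
# the second has a comma inside a double-quoted field.
my @sample = 'ben@contoso.com,Ben,Andrews',
             'chris@contoso.com,"Green, Chris",IT Manager';

my @counts;
react {
    # Supply.from-list emits each row in turn, much as a Supply
    # built from $*IN.lines emits each incoming line.
    whenever Supply.from-list(@sample) -> $ln {
        my @fields = $ln.split(",");   # naive split, not a CSV parser
        @counts.push: @fields.elems;
        put @fields.raku;
    }
}

say @counts;   # [3 4]
```

Both rows have three CSV fields, but the naive split reports four for the second row because it also cuts inside the quoted "Green, Chris" field. That is the case a true CSV parser is needed for.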
Nota bene: I've tried Raku's Text::CSV module to see if it will work with Raku's react/whenever block, but so far no luck. The best I can do is implement a while block, which is an okay solution if you're just feeding it tailed input. Code as follows:
~$ tail -n2 -f MS.csv | raku -MText::CSV -e 'my @rows; \
my $csv = Text::CSV.new; \
while ($csv.getline($*IN)) -> $row { \
@rows.push: $row; say @rows[*-1].raku; };'
Sample Output:
$["cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
[Above, remove the call to .raku to get unquoted output, or append a call to .match(/^^ <-["]>+ <( \" .+ \" )> <-["]>+ $$/) to retain double quotes, dropping extraneous characters at the start/end of the line].
The code above pushes incoming data onto the @rows array, in case you want to do something with it. Most importantly, because Text::CSV is a true CSV parser, you can validate CSV input. And since the input is validated CSV, you can directly output columns, the number of elements per row, etc. For example, replace the last statement say @rows[*-1].raku with say @rows[*-1][2] to receive a continuous output of the third column.
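For readers who prefer a script to a one-liner, the third-column variant might look like the sketch below. It uses only the Text::CSV calls already shown above (Text::CSV.new and getline on $*IN). The helper name put-column and the script name in the usage line are my own, hypothetical choices, and the sketch assumes each row comes back as an indexable list of strings, as in the sample output above.

```raku
use Text::CSV;

# Hypothetical helper: print one column (0-based index) of every
# row that Text::CSV can read from the given handle.
sub put-column($in, $col) {
    my $csv = Text::CSV.new;
    while $csv.getline($in) -> $row {
        put $row[$col];
    }
}

# Third column of whatever arrives on standard input.
put-column($*IN, 2);
```

Saved as, say, third-column.raku, it would be fed the same way as the one-liner: tail -n2 -f MS.csv | raku third-column.raku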
See the URLs below for sep-char, escape-char, formula-handling, binary, strict settings, etc.
https://raku.land/github:Tux/Text::CSV
https://github.com/Tux/CSV
https://raku.org
csvtool cat reads the whole input, and I don't understand why. I've added a python two-liner to my answer. – Gilles 'SO- stop being evil' Oct 25 '22 at 21:52