Using Raku (formerly known as Perl 6)
The following two code examples work: the first uses Raku's subst() substitution routine, while the second uses split()/join(). The "NULL" string in the code can be changed to Nil (how Raku internally represents missing values), <NA>, or ␀, as the user sees fit:
raku -e '.put for lines.map: *.subst(:global, / <?after ^ || \t > \t /, "NULL\t").subst(:global, / \t $ /, "\tNULL");'
OR
raku -e '.put for lines.map: *.split(/ <?after ^ || \t > \t /).join("NULL\t").split(/ \t $ /).join("\tNULL");'
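To make the placeholder easy to swap, the subst() approach can be wrapped in a small helper. This is a minimal sketch, not the original one-liner: fill-blanks is a hypothetical name, the sample lines are hard-coded for illustration, and the start-of-line check is written outside the lookbehind (it targets the same tabs).

```raku
# A sketch of the subst() approach above, wrapped in a helper so the
# placeholder ("NULL", "Nil", "<NA>", "␀", ...) is a parameter.
# fill-blanks is a hypothetical name, not a Raku built-in.
sub fill-blanks(Str $line, Str $placeholder = "NULL" --> Str) {
    # A tab at the start of the line, or a tab right after another tab,
    # follows a blank field: put the placeholder in front of that tab.
    my $out = $line.subst(:global, / [ ^ || <?after \t> ] \t /, "$placeholder\t");
    # A tab at the very end of the line precedes a blank last field.
    $out.subst(/ \t $ /, "\t$placeholder")
}

# Sample lines from the input above; swap @sample for `lines` to read a file or STDIN.
my @sample = "string1\tdecs1\t1234", "\tdesc1\t1255", "string3\t\t443", "string6\t436\t";
.raku.put for @sample.map({ fill-blanks($_, "<NA>") });
```

With lines in place of @sample and the default placeholder, the last statement behaves like the one-liners above.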
[Note below how each line has two \t characters separating the three columns, and how, when a single field is blank, only two non-empty strings remain.]
Sample Input (whitespace visualized with raku -ne '.raku.put;'):
"Column1\tColumn2\tColumn3"
"string1\tdecs1\t1234"
"\tdesc1\t1255"
"string3\t\t443"
"string4\tdesc1\t1"
"string5\t\t435"
"string6\t436\t"
Sample Output (whitespace visualized with raku -ne '.raku.put;'):
"Column1\tColumn2\tColumn3"
"string1\tdecs1\t1234"
"NULL\tdesc1\t1255"
"string3\tNULL\t443"
"string4\tdesc1\t1"
"string5\tNULL\t435"
"string6\t436\tNULL"
Alternatively, a quick-and-dirty way to get a similar result is to split on tabs, then check each field to see whether .chars is 0 (i.e. False), or check for the empty string directly with when "" {…}. These solutions would be fine if you wanted to concatenate all lines into a single line; however, they are not quite right for the TSV problem, as each leaves a trailing \t at the end of every line. A second pass with raku -pe 's/ \t $//' easily fixes that:
raku -ne 'for .split("\t") { if .chars {"$_\t".print} else {"NULL\t".print}; }; "\n".print;'
#OR
raku -ne 'for .split("\t") { .chars ?? "$_\t".print !! "NULL\t".print }; "\n".print;'
#OR
raku -ne 'for .split("\t") {when "" { "NULL\t".print }; default {"$_\t".print};}; "\n".print;'
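Another option is to avoid the trailing tab in the first place by rejoining the fields with join("\t") instead of printing a tab after each one. A minimal sketch, assuming the same placeholder convention; blanks-to is a hypothetical helper name, and the sample lines are hard-coded for illustration.

```raku
# A sketch of the split-and-check approach that leaves no trailing tab:
# the fields are rejoined with join("\t") rather than each printed with "\t".
# blanks-to is a hypothetical helper name, not a Raku built-in.
sub blanks-to(Str $line, Str $placeholder = "NULL" --> Str) {
    # split("\t") keeps empty fields, including leading and trailing ones
    my @fields = $line.split("\t");
    # .chars is 0 (False) for a blank field, so it gets the placeholder
    @fields.map({ .chars ?? $_ !! $placeholder }).join("\t")
}

# Sample lines from the input above; swap @sample for `lines` to read a file or STDIN.
my @sample = "string1\tdecs1\t1234", "\tdesc1\t1255", "string3\t\t443", "string6\t436\t";
.raku.put for @sample.map({ blanks-to($_) });
```

As a one-liner, the same idea is roughly raku -ne '.split("\t").map({ .chars ?? $_ !! "NULL" }).join("\t").put' file.txt, which needs no second pass.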
Really, it's no secret that the most robust way to do this is with a CSV parser. Raku's Text::CSV module is written by the maintainer of Perl 5's Text::CSV_XS module, so it should reliably handle empty fields, embedded newlines, etc. One caveat is that outputting a user-defined string for blank values is not yet implemented; however, adding the parameter quote-empty => True will output "" wherever a blank field is found. Below, remove sep => "\t" from the final (output) csv() call to get default comma-separated CSV:
~$ raku -MText::CSV -e 'csv(in => $*IN, sep => "\t") andthen csv(in => $_, out => $*OUT, quote-empty => True, sep => "\t");' < file.txt
Column1 Column2 Column3
string1 decs1 1234
"" desc1 1255
string3 "" 443
string4 desc1 1
string5 "" 435
string6 436 ""
Finally, if you're okay with blank fields being represented by "" but using an external module is problematic, you might be able to adapt Raku's internal representation (via a call to .raku) for your own use. The code below ranks among the simplest solutions posted (with the caveats mentioned):
~$ raku -ne '.split("\t").list.raku.put;' file.txt
("Column1", "Column2", "Column3")
("string1", "decs1", "1234")
("", "desc1", "1255")
("string3", "", "443")
("string4", "desc1", "1")
("string5", "", "435")
("string6", "436", "")
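If the blank fields need to be processed rather than just displayed, the list that .split("\t") produces (the structure .raku prints above) can be queried directly. A minimal sketch, assuming the goal is to report the 1-based column numbers of blank fields; blank-columns is a hypothetical helper name and the sample lines are hard-coded for illustration.

```raku
# A sketch: keep each line as a list of fields (the same structure that
# .raku prints above) and query that data instead of printing it.
# blank-columns is a hypothetical helper name, not a Raku built-in.
sub blank-columns(Str $line --> List) {
    my @fields = $line.split("\t");
    # grep with :k returns the 0-based indices of matching fields;
    # add 1 to turn them into column numbers
    @fields.grep(* eq "", :k).map(* + 1).List
}

# Sample lines from the input above; swap @sample for `lines` to read a file or STDIN.
my @sample = "Column1\tColumn2\tColumn3", "\tdesc1\t1255", "string3\t\t443", "string6\t436\t";
for @sample.kv -> $i, $line {
    my @blank = blank-columns($line);
    put "line {$i + 1}: blank column(s) @blank[]" if @blank;
}
```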
See the U&L URL below for more Raku solutions.
https://unix.stackexchange.com/a/654184/227738
https://raku.org