
I am trying to insert text into any column that is blank. The file is tab-separated. e.g.

Column1  Column2  Column3
string1  decs1    1234
         desc1    1255
string3           3443
string4  desc1    1
string5           435

I am trying to insert the text NULL only into column 1 or 2 where it is blank, so it would look like this. (If column 2 is empty, there is a double tab between columns 1 and 3.)

column1  column2  column3
string1  decs1    1234
null     desc1    1255
string3  null     3443
string4  desc1    1
string5  null     435

I've tried using awk, but my test appends the NULL text to every column 2 value, and if column 2 is blank, it pulls the third column's value into column 2.

awk '{sub("$", "NULL", $2)}; 1' file.txt > file2.txt

result is something like this

column1  column2 column3
string1  desc1NULL 1234
string2  desc1NULL 1255
string3  3443NULL
string4  descNULL  1
string4  435NULL

I am obviously using the incorrect syntax, but I'm not sure how to correct it.

thanks

Philippos
DAP P

5 Answers


I suggest doing it with sed:

sed -E -e :1 -e 's/(^|\t)(\t|$)/\1null\2/;t1' yourfile

(For portability, use an actual TAB character instead of \t, but this would probably not survive browser copy/paste.)

How do you detect an empty field? It's either

  • a tab at the beginning of a line (^\t) or
  • two tabs with nothing in between (\t\t) or
  • a tab at the end of the line (\t$)

In each of these cases, substitute your replacement text between the two matches.

In case there are two empty fields next to each other, we need to loop: t1 jumps back to the :1 mark if something was substituted.
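As a quick sanity check (this assumes GNU sed, which understands \t in patterns; BSD sed would need literal TAB characters), a made-up row with two adjacent empty fields shows why the loop is needed:

```shell
# Row 1 has columns 2 and 3 both empty between "a" and "b";
# the t1 loop fills both, where a single substitution would fill only one.
# Row 2 has an empty column 1.
printf 'a\t\t\tb\n\tdesc1\t1255\n' |
sed -E -e :1 -e 's/(^|\t)(\t|$)/\1null\2/;t1'
```

The first row comes out as a, null, null, b; the second gets null in column 1.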

Philippos
  • 13,453
awk -F'\t' -v OFS='\t' '$1==""{ $1="NULL" }  $2==""{ $2="NULL" }1' infile

With sub("$", "NULL", $2), you are calling the sub() function to replace the end-of-string anchor $ of column #2 ($2) with the string "NULL" — in other words, to append "NULL" to whatever is already there. You want to replace those columns with "NULL" only when they are empty; to do that with sub(), you can write something like:

awk -F'\t' -v OFS='\t' '
  $1==""{ sub(/.*/, "NULL", $1) }
  $2==""{ sub(/.*/, "NULL", $2) }
  $3=="" { "......" }
  # etc ...
1' infile

Or maybe:

awk -F'\t' -v OFS='\t' '
  { sub(/.*/, ($1==""?"NULL":$1), $1) }
  { sub(/.*/, ($2==""?"NULL":$2), $2) }
  # continue ...
1' infile

... but that is somewhat clumsy compared to the first command above. Or you might use the variant below in case the replacement job is not limited to two columns but needs to cover the first N columns:

awk -F'\t' -v OFS='\t' -v N=2 '{
  while(colNr++<N){
      $colNr=($colNr==""?"NULL":$colNr)
  }
  colNr=0
}1' infile
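For example (a hypothetical two-row input where column 3 is also empty), N=2 limits the replacement to the first two columns and leaves column 3 untouched:

```shell
printf 'string3\t\t\n\t\t3443\n' |
awk -F'\t' -v OFS='\t' -v N=2 '{
  while (colNr++ < N) {
    # fill only columns 1..N with NULL when they are empty
    $colNr = ($colNr == "" ? "NULL" : $colNr)
  }
  colNr = 0
}1'
```

Column 3 of the first row stays empty, while columns 1 and 2 are filled wherever they were blank.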
αғsнιη

Using any awk in any shell on every Unix box:

$ awk 'BEGIN{FS=OFS="\t"} {for (i=1; i<=NF; i++) if ($i=="") $i="null"} 1' file
Column1 Column2 Column3
string1 decs1   1234
null    desc1   1255
string3 null    3443
string4 desc1   1
string5 null    435

To do it with sub()s instead:

$ awk '{ while(sub(/\t\t/,"\tnull\t")); sub(/^\t/,"null\t"); sub(/\t$/,"\tnull")}1' file
Column1 Column2 Column3
string1 decs1   1234
null    desc1   1255
string3 null    3443
string4 desc1   1
string5 null    435
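The while() around the first sub() is what handles adjacent empty fields: each substitution consumes two tabs, so a single sub() on a row with two neighbouring blanks would fill only one of them. A made-up row demonstrates this:

```shell
# Columns 2 and 3 are both empty; the repeated sub() fills each one in turn.
printf 'a\t\t\tb\n' |
awk '{ while (sub(/\t\t/,"\tnull\t")); sub(/^\t/,"null\t"); sub(/\t$/,"\tnull") } 1'
```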
Ed Morton

Using Raku (formerly known as Perl_6)

The following two code examples work: the first uses Raku's subst (substitute) method, while the second uses split/join. Below, the replacement text can be changed to Nil (how Raku internally represents missing values), to "NA", or to another placeholder, as the user sees fit:

perl6 -e '.put for lines.map: *.subst(:global, / <?after ^ || \t > \t /, "NULL\t").subst(:global, / \t $ /, "\tNULL");'  

OR

perl6 -e '.put for lines.map: *.split(/ <?after ^ || \t > \t /).join("NULL\t").split(/ \t $ /).join("\tNULL");' 

[Note below how each line contains two \t characters representing the separators between the three columns, and how, when a single field is blank, only two non-empty strings remain.]

Sample Input (whitespace visualized with raku -ne '.raku.put;'):

"Column1\tColumn2\tColumn3"
"string1\tdecs1\t1234"
"\tdesc1\t1255"
"string3\t\t443"
"string4\tdesc1\t1"
"string5\t\t435"
"string6\t436\t"

Sample Output (whitespace visualized with raku -ne '.raku.put;'):

"Column1\tColumn2\tColumn3"
"string1\tdecs1\t1234"
"NULL\tdesc1\t1255"
"string3\tNULL\t443"
"string4\tdesc1\t1"
"string5\tNULL\t435"
"string6\t436\tNULL"

Alternatively, a quick-and-dirty way to get a cognate result is to split on tabs, then check each field to see if .chars is 0 (i.e. False), or check for the empty string directly with when "" {…}. These solutions are fine if you want to concatenate all lines into a single line; however, they are all not-quite-right for the TSV problem, as they leave a trailing \t tab at the end of each line. A second pass with raku -pe 's/ \t $//' easily fixes that:

raku -ne 'for .split("\t") { if .chars {"$_\t".print} else {"NULL\t".print}; }; "\n".print;'

#OR

raku -ne 'for .split("\t") { .chars ?? "$_\t".print !! "NULL\t".print }; "\n".print;'

#OR

raku -ne 'for .split("\t") {when "" { "NULL\t".print }; default {"$_\t".print};}; "\n".print;'

Really, it's no secret that the most robust way to do this is with a CSV parser. Raku's Text::CSV module is written by the same developer who wrote Perl5's Text::CSV module, and thus should reliably handle empty fields, embedded newlines, etc. One caveat: the ability to output user-defined strings for blank values is not yet implemented; however, adding the parameter quote-empty => True will return "" where blank fields are found. Below, remove the sep => "\t" argument from the final output call to get default CSV:

~$ raku -MText::CSV -e 'csv(in => $*IN, sep => "\t") andthen csv(in => $_, out => $*OUT, quote-empty => True, sep => "\t");'  < file.txt
Column1 Column2 Column3
string1 decs1   1234
""  desc1   1255
string3 ""  443
string4 desc1   1
string5 ""  435
string6 436 ""

Finally, if you're okay with blank fields being represented by "" but using an external module is problematic, you might be able to adapt Raku's internal representation for your own use (via a call to .raku). The code below is among the simplest solutions posted (with the caveats mentioned):

~$ raku -ne '.split("\t").list.raku.put;'  file.txt
("Column1", "Column2", "Column3")
("string1", "decs1", "1234")
("", "desc1", "1255")
("string3", "", "443")
("string4", "desc1", "1")
("string5", "", "435")
("string6", "436", "")

See the U&L URL below for more Raku solutions.

https://unix.stackexchange.com/a/654184/227738
https://raku.org

jubilatious1

Using the csvkit tools to create an intermediate JSON document that we modify by replacing all non-existing values with the string NULL, and then transforming that modified JSON document back to tab-delimited CSV:

csvjson -t file |
jq '.[] |= map_values(. // "NULL")' |
in2csv --blanks -f json | csvformat -T

Here, we first use csvjson -t to parse the tab-delimited input from the file called file, outputting JSON. We then invoke jq to modify the generated document, replacing all empty values with the string NULL. The in2csv call reads the JSON document and generates CSV, using --blanks to avoid turning the NULL strings back into empty values. Finally, csvformat -T reformats the CSV data to be tab-delimited.


The same thing, but allowing jq to work on a stream of individual rows instead of reading the whole data set as one array, then streaming the data in the same manner to in2csv:

csvjson --stream -t file |
jq -c 'map_values(. // "NULL")' |
in2csv -f ndjson --blanks | csvformat -T
Kusalananda