awk doesn't add columns to tab-delimited file

Question

I'm using the following code to add two new columns (15 and 16) to a tab-delimited text file, based on calculations from existing columns.

Problem: the new column data is shown in the terminal, but the file is not updated with the columns. When the output is sent to another file (code ... > Sample.....2.txt), the columns are present but the delimiter changes from tab to space.

Need: to add columns 15 and 16, based on calculations on existing columns, to a tab-delimited file in one line of code.

file : Sample1_RVDB_sort_unique.txt

code:

awk '{$15 = ($4/$13)*100; $16 = ($4/$14)*100; print}' Sample1_RVDB_sort_unique.txt

Data

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609

(1) But see also How to safely use gawk's -i option or @include directive? (2) Your file is what it is. But for the purpose of the question, you could have showed us a four-column file where you want to add $5 and $6. That way you could have avoided having data that’s wider than the screen. (3) I don’t know what code ... > Sample.....2.txt means. Better to show the actual command. — G-Man Says 'Reinstate Monica', Aug 23 '23 at 02:29
Hi G-man, your suggested thread was most useful. Thanks a bunch! I'll also consider making smaller example files for posts instead of pasting my full files. Thanks to you and cas for your help. — stephen ramnarine, Aug 23 '23 at 13:08
this question shouldn't have been closed, it's not about in-place editing (it's about input & output field separators), so it's not a dupe of a question about in-place editing. — cas, Aug 23 '23 at 13:34
@cas: A big part of the question is «My command gives the right output, but the file isn’t updated.», and the OP acknowledged that the Q&A about in-place editing was useful to them. The input & output field separators part is undoubtedly a duplicate of some other question. — G-Man Says 'Reinstate Monica', Aug 31 '23 at 05:24
Does this answer your question? How to change a file in-place using awk? (as with "sed -i") — AdminBee, Feb 02 '24 at 15:11

score 1 · Accepted Answer · answered Aug 23 '23 at 02:33

If your input file is tab-separated, you should set the input field separator (FS, or use awk's -F option) to a tab (\t), otherwise awk will use the default FS (one or more of any whitespace - see Default Field Splitting in the GNU awk documentation - but this is the behaviour of all awks, not just gawk).

If you also want the output to be tab-separated, then you need to set the output field separator (OFS) to a tab too, otherwise awk will use the default OFS (a space).

e.g.

awk -F'\t' -v OFS='\t' '{ $15 = ($4/$13)*100;
                          $16 = ($4/$14)*100;
                          print
                        }' Sample1_RVDB_sort_unique.txt
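The command above only writes to standard output, which is why the original file stays unchanged. Below is a minimal sketch of saving the result back to the file, assuming a temporary file is acceptable. The four-column sample, the variable names `file` and `tmp`, and the `mktemp`/`mv` pattern are illustrative assumptions, not part of the answer above.

```shell
# Hypothetical 4-column sample standing in for the real 14-column file.
file=$(mktemp)
printf 'a\t10\t20\t40\n' > "$file"

# awk prints to stdout, so write to a temporary file first,
# then replace the original only if awk succeeded.
tmp=$(mktemp)
awk -F'\t' -v OFS='\t' '{ $5 = ($2/$3)*100; $6 = ($2/$4)*100; print }' "$file" > "$tmp" &&
    mv "$tmp" "$file"

# The file now holds six tab-separated columns.
cat "$file"
```

With GNU awk specifically, `gawk -i inplace` is an alternative; the thread linked in the comments discusses how to use it safely.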

Hi cas, this works for the output delimiter. Thanks a bunch! — stephen ramnarine, Aug 23 '23 at 13:07

score 1 · Answer 2 · answered Aug 23 '23 at 16:35

You need to tell awk what your field separator is, e.g.:

BEGIN { FS=OFS="\t" }

otherwise it assumes runs of whitespace as the input separator and a single space as the output separator.
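A small sketch of the difference, using a hypothetical two-field line whose first field contains a space (the sample data and variable names are assumptions for illustration):

```shell
# Hypothetical tab-separated line: field 1 is "a b", field 2 is "10".
line=$(printf 'a b\t10')

# Default separators: "a b" is split into two fields, so $3 is the "10",
# and modifying a field rebuilds the record joined with single spaces.
default_out=$(printf '%s\n' "$line" | awk '{ $3 = "x"; print }')

# Tab separators: "a b" stays one field, and the new third field
# is appended with a tab.
tab_out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "\t" } { $3 = "x"; print }')

printf '%s\n' "$default_out" "$tab_out"
```

In the first case the original value `10` is overwritten and the tab is lost; in the second the record keeps its tab-separated layout.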

If your input only has 14 fields, then printing additional output fields would be more efficient than creating new $15 and $16 fields in the record (which would cause awk to rebuild the record):

awk '
    BEGIN { FS=OFS="\t" }
    { print $0, ($4/$13)*100, ($4/$14)*100 }
' Sample1_RVDB_sort_unique.txt

You should also guard against $13 or $14 being zero, though, e.g.:

awk '
    BEGIN { FS=OFS="\t" }
    { print $0, ($13 ? ($4/$13)*100 : "Inf"), ($14 ? ($4/$14)*100 : "Inf") }
' Sample1_RVDB_sort_unique.txt

or similar.
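To see the guard in action, here is a sketch on hypothetical three-column input (value, first divisor, second divisor) in which the second row has a zero divisor. The column positions and the variable name `guard_out` are assumptions chosen to keep the example short.

```shell
# Hypothetical rows: value, divisor 1, divisor 2.
# The second row has a zero in the first divisor column.
guard_out=$(printf '10\t20\t40\n10\t0\t40\n' | awk '
    BEGIN { FS = OFS = "\t" }
    { print $0, ($2 ? ($1/$2)*100 : "Inf"), ($3 ? ($1/$3)*100 : "Inf") }
')

printf '%s\n' "$guard_out"
```

Without the ternary test, most awk implementations abort with a division-by-zero error on the second row instead of printing a placeholder.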

score 0 · Answer 3 · answered Aug 23 '23 at 03:52

Using Raku (formerly known as Perl_6)

~$ raku -ne 'my @a = .words; put join "\t", @a, (@a[3]/@a[12])*100, (@a[3]/@a[13])*100;'   file

Sample Input:

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609

Sample Output:

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609  30.069238   0.2246784

Above is an answer coded in Raku, a member of the Perl-family of programming languages. The -ne commandline flags tell Raku to run code linewise over the input in a non-autoprinting manner ("n" for "non").

Input is broken on whitespace using the words routine. The code .words is short for $_.words, where $_ is Raku's (and Perl's) "topic variable", which has been set in this case to the input line.

Input is assigned to the @a array, then the input and the additional computed columns are output (joined on \t tabs), remembering that Perl/Raku arrays are zero-indexed.

https://raku.org
