I'm using the following code to add two new columns (15 and 16) to a tab-delimited text file, based on calculations from existing columns.

Problem: the new column data is shown in the terminal, but the file is not updated with the new columns. When the output is redirected to another file (code ... > Sample.....2.txt), the columns are present but the delimiter is changed from tab to space.

Need: to add columns 15 and 16, based on calculations on existing columns, to a tab-delimited file in one line of code.

File: Sample1_RVDB_sort_unique.txt

Code:

awk '{$15 = ($4/$13)*100; $16 = ($4/$14)*100; print}' Sample1_RVDB_sort_unique.txt

Data

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609
jubilatious1

3 Answers

If your input file is tab-separated, set the input field separator (FS, or awk's -F option) to a tab (\t). Otherwise awk uses the default FS, which splits on runs of one or more whitespace characters (see Default Field Splitting in the GNU awk documentation; this is the behaviour of all awks, not just gawk).

If you also want the output to be tab-separated, set the output field separator (OFS) to a tab as well; otherwise awk uses the default OFS (a single space). Assigning to a field such as $15 makes awk rebuild the record by joining all fields with OFS, which is why your tabs turned into spaces.

e.g.

awk -F'\t' -v OFS='\t' '{ $15 = ($4/$13)*100;
                          $16 = ($4/$14)*100;
                          print
                        }' Sample1_RVDB_sort_unique.txt
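
To see the effect in isolation, here is a minimal sketch on a made-up three-field line (not the real data). The variable names line, default_out and tab_out are only for illustration; od -c is used so the separators are visible.

```shell
# Toy tab-separated record (made-up data for illustration).
line=$(printf 'a\tb\tc')

# Default FS/OFS: assigning to a field rebuilds $0 joined by single spaces.
default_out=$(printf '%s\n' "$line" | awk '{ $4 = "x"; print }')

# FS and OFS both set to tab: the rebuilt record keeps its tabs.
tab_out=$(printf '%s\n' "$line" | awk -F'\t' -v OFS='\t' '{ $4 = "x"; print }')

# Dump both results character by character to make the separators visible.
printf '%s\n' "$default_out" | od -c
printf '%s\n' "$tab_out" | od -c
```

The first dump should show spaces between the fields and the second should show \t, which mirrors what happened to the redirected output in the question.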
cas

You need to tell awk what your field separator is, e.g.:

BEGIN { FS=OFS="\t" }

Otherwise it splits the input on runs of white space and joins output fields with a single blank.

If your input only has 14 fields, then printing the extra values as additional output fields is more efficient than creating new $15 and $16 fields in the record (which causes the record to be rebuilt):

awk '
    BEGIN { FS=OFS="\t" }
    { print $0, ($4/$13)*100, ($4/$14)*100 }
' Sample1_RVDB_sort_unique.txt
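
As a quick check, this sketch runs both styles on the sample row from the question (rebuilt here with real tabs, which is an assumption about the original file) and compares the results. The variable names row, assigned and appended are made up for the example.

```shell
# The sample row from the question, rebuilt with real tab separators.
row=$(printf 'utg000001l\tacc|GENBANK|MH883318.1|White\t80.263\t608\t99\t16\t282\t877\t184245\t184843\t4.44e-120\t438\t2022\t270609')

# Style 1: assign to $15 and $16, which rebuilds the record using OFS.
assigned=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS=OFS="\t" } { $15 = ($4/$13)*100; $16 = ($4/$14)*100; print }')

# Style 2: leave $0 untouched and print the two values as extra fields.
appended=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS=OFS="\t" } { print $0, ($4/$13)*100, ($4/$14)*100 }')

# Report whether the two lines are identical, then show the new columns.
[ "$assigned" = "$appended" ] && echo "same output"
printf '%s\n' "$appended" | awk -F'\t' '{ print NF, $15, $16 }'
```

With awk's default number formatting (%.6g) the new columns come out as 30.0692 and 0.224678; use printf in the awk script if you need a different number of decimals.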

You should also make sure neither $13 nor $14 is zero, to avoid dividing by zero, e.g.:

awk '
    BEGIN { FS=OFS="\t" }
    { print $0, ($13 ? ($4/$13)*100 : "Inf"), ($14 ? ($4/$14)*100 : "Inf") }
' Sample1_RVDB_sort_unique.txt

or similar.
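
None of the commands above change the file itself, because awk writes to standard output. A common way to update the file is to write to a temporary file and rename it over the original once awk succeeds. Below is a rough sketch using a scratch directory and a hypothetical demo.txt with two made-up 14-field rows (the second has 0 in field 13), so no real data is touched.

```shell
# Work in a scratch directory; demo.txt is a hypothetical stand-in file.
workdir=$(mktemp -d)
file="$workdir/demo.txt"

# Two made-up tab-separated rows with 14 fields each.
printf 'r1\tx\t1\t50\t1\t1\t1\t1\t1\t1\t1\t1\t200\t400\n' >  "$file"
printf 'r2\tx\t1\t50\t1\t1\t1\t1\t1\t1\t1\t1\t0\t400\n'   >> "$file"

# Write the result to a temp file, then replace the original only on success.
tmp=$(mktemp "$workdir/out.XXXXXX")
awk '
    BEGIN { FS=OFS="\t" }
    { print $0, ($13 ? ($4/$13)*100 : "Inf"), ($14 ? ($4/$14)*100 : "Inf") }
' "$file" > "$tmp" && mv "$tmp" "$file"

# Show the field count and the two new columns of the updated file.
awk -F'\t' '{ print NF, $15, $16 }' "$file"
```

Redirecting straight back to the input file (awk ... file > file) would truncate it before awk reads it, hence the temporary file. GNU awk 4.1 and later also offer -i inplace if you prefer not to manage the temporary file yourself.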

Ed Morton

Using Raku (formerly known as Perl_6)

~$ raku -ne 'my @a = .words; put join "\t", @a, (@a[3]/@a[12])*100, (@a[3]/@a[13])*100;'   file

Sample Input:

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609

Sample Output:

utg000001l  acc|GENBANK|MH883318.1|White    80.263  608 99  16  282 877 184245  184843  4.44e-120   438 2022    270609  30.069238   0.2246784

Above is an answer coded in Raku, a member of the Perl-family of programming languages. The -ne commandline flags tell Raku to run code linewise over the input in a non-autoprinting manner ("n" for "non").

Input is broken on whitespace using the words routine. The code .words is short for $_.words, where $_ is Raku's (and Perl's) "topic variable", which in this case has been set to the input line.

The words are assigned to the @a array, and the original fields are then output along with the two computed columns, joined on \t tabs. Remember that Perl/Raku arrays are zero-indexed, so @a[3] is column 4.

https://raku.org

jubilatious1