-1

I was initially building on that question and answer (Using awk to sum the values of a column, based on the values of another column) but apparently I have to create a new question.

I have a tsv files with multiple columns. When the second column is empty, I would like to sum the corresponding values of the column 3. And when the column 2 is not empty (it has changing content that vary from a line to another) I would like to sum the corresponding values of the column 3. Example:

ColumnA ColumnB ColumnC
One              10
Little  STH      15
Red     XIV      55
Car              20

Output would be just: 30 and 70 Is it something possible? Can we tell: count when it's empty and count when it's not?

Thanks a lot for your help, Ben

Edit: I made up another table to replace the one I put as an image before: enter image description here

Ben
  • 1
  • 1
    Yes it's possible but please add some properly-formatted text sample - do not use screenshots. – Kate Feb 24 '20 at 08:57
  • When I copy past the table it puts it into picture right away... – Ben Feb 24 '20 at 09:05
  • Idk why that'd be happening but you've got to figure out how to replace that picture with text as we need input we can test a potential solution using, and output we can "diff" our tool output against to see if it worked. I don't see the values in your output image anywhere in your sample input anyway though, so that's also a problem - the expected output you provide has to be the output you expect from the sample input you provide, not from some other input. Or is that image actually some different input and there's no expected output in your question? idk... please clean it up. – Ed Morton Feb 24 '20 at 14:54

2 Answers2

1

You can use awk arrays indexed by the condition to be met, and then print all values in the array, for example:

awk 'BEGIN {FS = "\t"}; NR>1 {sum[$2!=""] += $3}; END{for (i in sum) print i, sum[i]}' test.tsv 

the sum array would contain the two values, sum[false] or sum[0] for the sum of the values that don't meet the condition and sum[true] or sum[1] for the values that meet the condition.

Output, using your example:

0 30
1 70
  • The drawback with your response is that, without actually looking at the first file record, you can never know whether the first of your print-out lines (i=0) is for an empty second column value or a non-empty one. Easy enough to fix of course, but then you get ever so close to Shawn's answer... – Cbhihe Feb 24 '20 at 18:36
1

Something like

awk 'BEGIN { FS=OFS="\t" }
     NR > 1 && $2 == "" { empty += $3 }
     NR > 1 && $2 != "" { full += $3 }
     END { print "Empty", empty; print "Text", full }' input.tsv
Shawn
  • 1,173
  • 6
  • 8