
Hello, I have an awk expression that counts the records that have a length issue. The problem is that I am not getting a count of zero when there is no error.

Code:

err_count=$(
    awk -v m=1 -v p=5 -v count=0 '
        {
            c=substr($0,m,p)
            sub(" +$", "", c)
            if ( (length(c) > p) && (NR > 1) ) {
                printf "%s:%s:%s\n", FILENAME, FNR, $0 > "/dev/stderr"
                count++
            }
        }
        END {
            print count
        }
    ' /test/data/poc/BNC_fixedwidth.txt
)

Input file (fixed width):

header
10027  20033t  abc@gmail.com   19519  11/18/2021 12:06:10.260 PM BNC HardB 5 User Unk 125

The variable error_count is always blank instead of zero. Can anybody shed some light on this?

Ed Morton
  • Are you sure that's actually the code you are using? It appears to have unbalanced parentheses in the if. Aside from that, why would the length of substr($0,m,p) ever be greater than p? – steeldriver May 18 '21 at 23:36
  • It's a bit unclear what you are trying to do. What defines a "length issue"? It would also likely be better to format the code a bit. Besides steeldriver's comment, here is an example with some extra candy: https://termbin.com/z2o8n – ibuprofen May 19 '21 at 00:03
  • If you use vim, you can use this for syntax highlighting of awk embedded in sh: https://termbin.com/rir2 (there are perhaps better alternatives, but it works fine for my usage at least) – ibuprofen May 19 '21 at 00:04
  • You keep posting really messy, hard-to-read code in your questions. Please indent and wrap your code sensibly in your questions to make it as easy as possible for us to read/understand. Run gawk -o- 'script' on your awk scripts before posting if you don't know how to format them reasonably. – Ed Morton May 19 '21 at 11:46
  • @steeldriver, @ibuprofen, @EdMorton: Thank you for the comments; I tried to format it. – daturm girl May 19 '21 at 13:29
  • @steeldriver Thank you for the observation. I just noticed the catch: yes, the column length validation for the fixed-width file doesn't make sense. Thank you, I am removing the functionality from my code. – daturm girl May 19 '21 at 13:30
  • @daturmgirl it was better, but there was still a lot of apparently random indentation that didn't follow/show the flow of your code, as well as a bunch of unnecessary semicolons and escapes at the end of lines, so I formatted it for you to show a sensible layout. Hope you don't mind, and hopefully you can follow that for your future coding. – Ed Morton May 19 '21 at 14:38
  • With respect to "the variable error_count is always giving me blank instead of zero": I can now see that you have no variable named error_count in your code. If you meant the awk variable named count instead, it's impossible for that to print as blank, since you set it to 0 with -v count=0 on the command line and only ever increment it in your code. If you meant the shell variable err_count instead, it's also impossible for that to be null, since it's set to the value printed by the awk command, which will always be numeric unless the awk command fails to open the input or similar. – Ed Morton May 19 '21 at 14:40
  • My best guess is you're doing echo "$error_count" after your code runs but you have no such variable and actually meant to do echo "$err_count" instead. – Ed Morton May 19 '21 at 14:45
  • What your code is trying to detect is an impossible condition, by the way. c=substr($0,m,p) creates a string c of length p (or less, if the record is shorter), then sub(" +$", "", c) removes any trailing spaces from c, and then length(c) > p tests whether the resulting c is longer than p. It simply cannot be. c MUST be length p or less, as it starts out as at most length p, and then you may remove chars from it but you never add chars to it. Start with 10 apples, remove 0 to 10 of them, and then see if you now have 11 or more apples. – Ed Morton May 19 '21 at 14:50
  • Thanks @EdMorton, in my code there was a spelling error as you pointed out, hence the blank. – daturm girl May 19 '21 at 18:17
  • For the logic issue, yes, absolutely, the column length validation is not required. Thanks for the explanation and sample formatting. – daturm girl May 19 '21 at 18:19
  • You're welcome. It took longer than it should have for you to get an answer because you didn't include the code that was failing, the echo "$error_count" or similar. If you had included that I expect you'd have got an answer immediately when you asked the question. It's important to include the line where the failure occurs when you ask a question in future otherwise people are just guessing at what the problem might be. I posted my comment as an answer now. – Ed Morton May 19 '21 at 18:20

2 Answers


As noted by steeldriver, the length of c would never be greater than your limit:

c = substr($0, 1, 5)

Length of c would never be > 5.
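A quick way to convince yourself: the sketch below feeds a few made-up lines (not from the question) through the same substr()/sub() steps with p=5 and records the longest resulting c. substr() never returns more than p characters, and stripping trailing spaces can only shorten it.

```shell
# Hypothetical input lines of different lengths. For each, take the first
# p characters, strip trailing spaces as the question's code does, and
# keep the largest resulting length. It can never exceed p.
max_len=$(printf '%s\n' 'abc' '10027  20033t' '1234567890' |
    awk -v m=1 -v p=5 '
        {
            c = substr($0, m, p)
            sub(" +$", "", c)
            if (length(c) > max) max = length(c)
        }
        END {
            print max
        }
    ')
echo "$max_len"   # prints 5
```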

Beyond that, it is blank/empty because there is a syntax error in the awk script. awk's error message should be printed to the terminal unless you do something like 2>/dev/null.

That no longer applies after the latest update, but from what I can see this was not corrected by you. Just to be clear:

    if( (length(c) >  p  && NR > 1 )
#       ^
#       +--- Never closed.
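To see what an unclosed parenthesis like this does to the result, here is a sketch with a deliberately broken one-line program (made up for illustration, not the question's script). awk refuses to run it, so nothing reaches stdout and the captured variable is empty; the exit status of the assignment is awk's own, so the failure can still be detected even when stderr is discarded.

```shell
# Deliberately broken program: the "(" before "(1 > 0 )" is never closed.
# 2>/dev/null hides awk's error message, which is how a blank result can
# appear with no visible explanation.
if err_count=$(awk 'BEGIN { if ( (1 > 0 ) print "x" }' 2>/dev/null); then
    status="ok"
else
    status="awk failed"
fi
printf 'status=%s err_count=[%s]\n' "$status" "$err_count"
# prints: status=awk failed err_count=[]
```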

Besides that, your edit also reveals more. You do not need \ to continue the script on the next line. That is:

  • Not { \ but {

  • Not

     ... "/dev/stderr"\
        ++count
    
  • But

       ... "/dev/stderr"
    ++count
    
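A minimal sketch of a counting program written with no trailing backslashes at all (the three input lines and p=5 are made up for illustration). Inside the single-quoted awk program a newline simply ends a statement, and a long condition may also be broken right after `&&` (likewise after `{`, `||` or `,`) without any continuation character.

```shell
# Each newline ends a statement; the pattern is split after "&&".
# The shell variable long_count is unrelated to awk's own count.
long_count=$(printf 'a\nbbbbbbb\ncc\n' | awk -v p=5 '
    NR > 1 &&
    length($0) > p {
        printf "%s:%s\n", FNR, $0 > "/dev/stderr"
        ++count
    }
    END {
        print count + 0
    }
')
echo "$long_count"   # prints 1
```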

Using semicolons at the end of statements is OK, but for the code to be more readable, do not mix styles. Either use ; at the end of all statements or none, unless, of course, you for some reason have more than one statement on a line. So:

Not:

    printf "%s: %d", $1, $2;
    ++foo
    ++bar;
    printf "%s: %d", $3, $4

But:

    printf "%s: %d", $1, $2
    ++foo
    ++bar
    printf "%s: %d", $3, $4

Or (not widely used from what I have seen):

    printf "%s: %d", $1, $2;
    ++foo;
    ++bar;
    printf "%s: %d", $3, $4;

Then there is the approach of taking substr() of $0 and trimming it with sub(). awk can split the line into fields for you instead.

The default field separator (FS) of awk is a single space. This is treated differently than other delimiters: runs of blanks (spaces and tabs) are collapsed into one separator, and leading and trailing blanks are ignored. Thus both lines in:

A B C
  A    B     C

Result in:

$1 == A
$2 == B
$3 == C
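The same behaviour can be checked directly. This sketch uses the two lines above and, for contrast, a made-up comma-separated line: with the default FS the leading and repeated blanks disappear, while a comma FS is not collapsed, so two commas in a row delimit an empty field.

```shell
# Default FS: both lines yield the same three fields.
default_out=$(printf 'A B C\n  A    B     C\n' |
    awk '{ printf "%d:%s|%s|%s\n", NF, $1, $2, $3 }')

# FS="," : "A,,B" has three fields, the middle one empty.
comma_out=$(printf 'A,,B\n' | awk -F, '{ printf "%d:[%s]\n", NF, $2 }')

printf '%s\n' "$default_out" "$comma_out"
# prints:
# 3:A|B|C
# 3:A|B|C
# 3:[]
```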

As for the issue at hand, you could do something like this:

awk \
    -v width_max=5 \
    -v field_validate=1 \
'
BEGIN {
    err_count = 0
}
$1 == "header" {
    next
}
NF < field_validate || length($field_validate) > width_max {
    printf "%s:%d:%d:%s\n", FILENAME, NF, FNR, $0 > "/dev/stderr"
    ++err_count
}
END {
    printf "%d", err_count
}

' sample
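As a usage sketch, the script above can be run against the sample input from the question and its output captured in err_count, as the question does. The temporary files, the count_errors function name and the extra over-long record in the "bad" file are all made up for this demo; the awk rules are the ones above.

```shell
# Demo files: the question's header and data line ("good"), and a copy
# with one hypothetical extra record whose first field is 7 chars ("bad").
good=$(mktemp)
bad=$(mktemp)
trap 'rm -f "$good" "$bad"' EXIT

cat > "$good" <<'EOF'
header
10027  20033t  abc@gmail.com   19519  11/18/2021 12:06:10.260 PM BNC HardB 5 User Unk 125
EOF
{
    cat "$good"
    echo '1002799  20033t  abc@gmail.com   19519  11/18/2021 12:06:10.260 PM BNC HardB 5 User Unk 125'
} > "$bad"

# Same rules as the script above, wrapped in a function taking a file name.
count_errors() {
    awk -v width_max=5 -v field_validate=1 '
        BEGIN { err_count = 0 }
        $1 == "header" { next }
        NF < field_validate || length($field_validate) > width_max {
            printf "%s:%d:%d:%s\n", FILENAME, NF, FNR, $0 > "/dev/stderr"
            ++err_count
        }
        END { printf "%d", err_count }
    ' "$1"
}

good_count=$(count_errors "$good")
bad_count=$(count_errors "$bad")
echo "good: $good_count, bad: $bad_count"   # prints good: 0, bad: 1
```

With no errors the count comes back as 0, not blank, as long as the variable that is echoed is the one that was assigned.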

Note that you might want to put the NF check in a separate rule. Something like:

NF != field_count {
    # NF does not match with required fields
}

Where field_count is a variable you define, for example with -v as for width_max above.
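A sketch of that split, assuming made-up limits (exactly 3 fields per line, field 1 at most 5 characters) and made-up input. Each check is its own rule with its own counter, so the END block can report the two kinds of problem separately; nf_errors and width_errors are hypothetical names.

```shell
# Two independent rules: one for the field count, one for field 1's width.
counts=$(printf '%s\n' 'AA BB CC' 'AA BB' 'ABCDEFG BB CC' |
    awk -v field_count=3 -v width_max=5 '
        NF != field_count {
            printf "line %d: NF mismatch %d != %d\n", FNR, NF, field_count > "/dev/stderr"
            ++nf_errors
        }
        length($1) > width_max {
            printf "line %d: field 1 too wide: %s\n", FNR, $1 > "/dev/stderr"
            ++width_errors
        }
        END {
            printf "%d %d", nf_errors, width_errors
        }
    ')
echo "$counts"   # prints 1 1
```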

Here is a simple example script you can look at regarding FS, NF, etc.:

awk -v field_count=3 \
'
NF != field_count {
    printf "NF mismatch %d != %d\n", NF, field_count
}
{
    printf "<%s><%s><%s>\n", $1, $2, $3
}
' <<EOF
AA BB CC
AA      BB    CC
   AA   BB      CC
AA BB
AA BB CC DD
EOF
ibuprofen

You're doing echo "$error_count" or similar after your code runs but you have no such variable and actually meant to do echo "$err_count" instead.
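A minimal reproduction of that typo, using a one-line header-only input made up for the demo: the count lands in err_count, but error_count was never assigned, so it expands to an empty string. The set -u check at the end is an optional safeguard not mentioned in the answer; with it, the shell reports the unset variable instead of silently printing a blank.

```shell
# The awk result is stored in err_count; error_count is never assigned.
err_count=$(printf 'header\n' | awk 'NR > 1 { ++count } END { print count + 0 }')
echo "[$error_count]"   # prints []
echo "[$err_count]"     # prints [0]

# Optional safeguard: with set -u, expanding an unset variable is an
# error. The subshell keeps set -u from affecting the rest of the script.
if ( set -u; : "$error_count" ) 2>/dev/null; then
    echo "error_count is set"
else
    echo "error_count is unset"   # this branch runs
fi
```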

Ed Morton