How to use command-line argument as awk regex matching expression?

Question

I have the following awk script:

#!/bin/awk -f

BEGIN {
    FS  = "";
}

value ~ "MYVALUE" # silly test
{
    print "1 - " substr($0, 235, 12);
}

$235 ~ "M" {
    print "2 - " substr($0, 235, 12);
}

{
    if(value == substr($0, 235, 12))
    {
        print "3 - " substr($0, 235, 12);
    }
    if(match(value,substr($0, 235, 12)))
    {
        print "4 - " substr($0, 235, 12);
    }
}

END {
    print "exit"
}

I run it as: ./script.awk -v value="MYVALUE" my_file

This is my RHEL 5.5's awk:

$ ls -l  $(which awk)
lrwxrwxrwx 1 root root 4 Jul 10  2015 /bin/awk -> gawk
$ gawk --version
GNU Awk 3.1.5

1 and 2 work. As an aside, if I put the { of 2 on a new line, like:

$235 ~ "M" 
{
    print "2 - " substr($0, 235, 12);
}

then the output is the full matched line, not just the print.

What I would like to do is use value as the regex to match against, but it always fails. Something like:

$235...$247 ~ value

I saw examples([1831722][unix/27410]) of matching a single character, but not an expression.

EDIT

For clarity, I want to match lines that have no field separator, taking a parameter passed to awk on the command line and matching it against a multi-character span at a fixed offset in the line. I hacked together some Python:

#!/usr/bin/python

import re

t   = 'ABC'
rg  = '^.{235,235}' + t   # exactly 235 characters of any kind, then t
rgx = re.compile(rg)
tt  = '00000ABC00'        # short sample, so this prints KO
if rgx.match(tt):
    print("OK")
else:
    print("KO")

For this use case, though, awk would probably be faster, since the files in question are quite big.

So you're doing value ~ "MYVALUE", but then on the command line you show -v value="MYVALUE". Please clarify this part. You are effectively checking the same thing against itself, but my understanding is that you want to specify a field number like $1. Is there an example of actual input, output, and actual variable that you can provide? — Sergiy Kolodyazhnyy, Feb 24 '17 at 14:12
@Serg true, value ~ "MYVALUE" is wrong, I was just testing (a stupid test). One field is not enough because this is a text file with long lines where the fields are defined by offsets; there is no separator character. So I want to compare either with substr on $0 or with a range like ($5,$6,...,$15) ~ parameterValue. — vesperto, Feb 24 '17 at 16:23

score 2 · Answer 1 · answered Jan 17 '18 at 17:50

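I know this is a bit old, but I thought I'd add a few comments in case anyone else ends up here. Firstly, separating two patterns with a comma does not give you a range of fields. It builds a range pattern, which selects every record from one that matches the first pattern through the next one that matches the second, so

$235, $247 ~ value { ... action here ... }

runs the action from a record where field 235 is non-empty (and not a numeric zero) through the next record where field 247 matches value, which is not the comparison you are after.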
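
One thing worth checking here: in awk, a comma between two patterns forms a range pattern over records, not a span of fields. The following is a small sketch of that behaviour. The whitespace-separated sample records and the names range_out and value are made up for illustration; they are not taken from the question's 235-column file.

```shell
# Hypothetical sample data: column 1 is a label, column 2 is a marker.
# "start-pattern, end-pattern" selects a run of records, not a span of fields.
range_out=$(printf '%s\n' 'a start' 'b middle' 'c stop' 'd after' |
    awk -v value='stop' '$2 == "start", $2 ~ value { print $1 }')
printf '%s\n' "$range_out"
```

This prints a, b and c, one per line: the range opens on the record whose second field is start and closes on the first record whose second field matches value, so d is skipped.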

The output of the substr function can also be used directly to try and find a match if wanted:

substr($0, 235, 12) ~ value { ... action here ... }
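
As a rough illustration of the substr form, the sketch below runs it against three short made-up records. Offset 6 and length 3 stand in for the question's 235 and 12, and the sample lines and the name match_out are assumptions for the demo only.

```shell
# Hypothetical fixed-width records; characters 6..8 play the role of 235..246.
# value is passed with -v and used as a dynamic regex on the right of ~.
match_out=$(printf '%s\n' '00000ABC00' '00000XYZ00' '00000ABD00' |
    awk -v value='AB[CD]' 'substr($0, 6, 3) ~ value { print NR ": " substr($0, 6, 3) }')
printf '%s\n' "$match_out"
```

Because value is treated as a regex, characters such as . or [ in it are interpreted as regex syntax. If a literal comparison is wanted, index(substr($0, 6, 3), value) or a plain == may be a better fit.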

Also, you seem to have found that the placement of some of the braces is important. In each pattern-action pair, either the pattern or the action can be omitted and takes an implicit default (match every record, or print $0), so the change of

$235 ~ "M" {  print "2 - " substr($0, 235, 12);  }

to

$235 ~ "M" 
{  print "2 - " substr($0, 235, 12);  }

changes the meaning. The first form prints the substring only when field 235 is an M. The second form is two separate rules: whenever field 235 is an M, print the whole record, AND, for every record, print the substring. A pattern-less rule like that can be put to use: if, for example, you need to perform several checks against the substring for each record, your first rule could just be:

BEGIN { FS="" }
# oursubstr will be updated first for each record.
{ oursubstr = substr($0, 235, 12) } 
oursubstr ~ value { ... action ... }
...
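
The points above about brace placement and a shared substring can be tried out with a short sketch. It assumes made-up 10-column records, with offset 6 and length 3 standing in for 235 and 12; the variable names (records, same_line, next_line, shared, other, s) are hypothetical.

```shell
# Hypothetical sample records for the demo.
records=$(printf '%s\n' '00000ABC00' '00000XYZ00')

# Brace on the same line: one rule, the action runs only for matching records.
same_line=$(printf '%s\n' "$records" |
    awk 'substr($0, 6, 1) ~ "A" { print "2 - " substr($0, 6, 3) }')

# Brace on the next line: two rules. The bare pattern prints the whole record,
# and the pattern-less action then runs for every record.
next_line=$(printf '%s\n' "$records" |
    awk 'substr($0, 6, 1) ~ "A"
{ print "2 - " substr($0, 6, 3) }')

# Compute the substring once, then test it in several later rules.
shared=$(printf '%s\n' "$records" |
    awk -v value='^AB' -v other='Z$' '
        { s = substr($0, 6, 3) }
        s ~ value { print "value: " s }
        s ~ other { print "other: " s }')

printf '%s\n---\n%s\n---\n%s\n' "$same_line" "$next_line" "$shared"
```

The first run prints a single line, the second prints the full first record plus one "2 - " line per record, and the third reports which of the two regexes each record's substring matched.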

vesperto · Answer 2 · 2017-02-24T17:07:52.110

1

This seems to work.

{
    if(substr($0, 235, 12) ~ value)
    {
        print "4 - " substr($0, 235, 12)
        next
    }
    else
    {
        print "4 - NOK"
        next
    }
}

edited Feb 24 '17 at 17:07

answered Feb 24 '17 at 13:52

But it's a bit clumsy... – vesperto Mar 01 '17 at 09:05
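
One possibly less clumsy variant, as a sketch only: store the substring once and pick the output with a conditional expression. The sample records, the offsets 6 and 3 (standing in for 235 and 12) and the names tidy_out and s are assumptions for the demo.

```shell
# Hypothetical fixed-width records; characters 6..8 play the role of 235..246.
# The conditional expression picks the substring on a match and "NOK" otherwise.
tidy_out=$(printf '%s\n' '00000ABC00' '00000XYZ00' |
    awk -v value='ABC' '{ s = substr($0, 6, 3); print "4 - " (s ~ value ? s : "NOK") }')
printf '%s\n' "$tidy_out"
```

Unlike the version with next in both branches, this does not skip any later rules in the script, so a next would need to be added back if that behaviour matters.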