
I want to test whether a script argument is composed only of letters. Here is the script:

 BEGIN {
        VALUE=ARGV[1];
        if (VALUE ~ /[A-Za-z]/) {
           print VALUE " : Ok only letters";
        }
        print VALUE;
}

It seems to match every string that contains at least one letter:

tchupy@rasp:~$ awk -f s.awk file 111
value = 111
tchupy@rasp:~$ awk -f s.awk file  @@@
value = @@@
tchupy@rasp:~$ awk -f s.awk file  aaa
aaa : Ok only letters
value = aaa
tchupy@rasp:~$ awk -f s.awk file  1a1
1a1 : Ok only letters
value = 1a1
tchupy@rasp:~$ awk -f s.awk file  a1a
a1a : Ok only letters
value = a1a
tchupy@rasp:~$ awk -f s.awk file  1@1
value = 1@1

I tried to use the match() function, but I get a syntax error at or near [ when I try to use the [A-Za-z] regex.

Thanks

Tchupy
  • If you say you want to test a script argument, are you only using awk to test the content of a shell variable? Can you identify your shell, as there may be other possibilities? Apart from that, please add the failing awk program and the exact command line you used to call it. – AdminBee Mar 12 '21 at 10:26
  • Try replacing the regular expression, which matches a letter anywhere in the variable, with one that matches letters throughout, e.g. /^[A-Za-z]+$/. Or, for greater readability, /^[[:alpha:]]+$/. – steve Mar 12 '21 at 10:41
  • Note that [:alpha:] covers Unicode letters as well, e.g. it matches Ð. – steve Mar 12 '21 at 10:50

1 Answer


Your test is true whenever the variable contains at least one character from your character class. To test whether the variable contains only characters from your character class, you need to anchor the match at the beginning (^) and the end ($) of the string:

BEGIN {
    VALUE=ARGV[1];
    if (VALUE ~ /^[A-Za-z]+$/) {
       print VALUE " : Ok only letters";
    }
    print VALUE;
}

Or more concisely:

BEGIN {
    print ARGV[1] ": " (ARGV[1] ~ /^[A-Za-z]+$/ ? "OK" : "BAD")
}
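
If the result is needed by a calling shell script rather than printed, the same anchored test can set awk's exit status. This is only a minimal sketch: the function name only_letters is made up for illustration, and LC_ALL=C is an assumption added here so that [A-Za-z] means the ASCII letters regardless of the user's locale.

```shell
# Hypothetical helper: succeeds only if $1 consists solely of ASCII letters.
only_letters() {
    # Assumption: LC_ALL=C pins the range [A-Za-z] to the ASCII letters.
    # The pattern matches => expression is 1 => !1 is 0 => exit status 0.
    LC_ALL=C awk 'BEGIN { exit !(ARGV[1] ~ /^[A-Za-z]+$/) }' "$1"
}

for s in aaa 1a1 111 ''; do
    if only_letters "$s"; then
        printf '%s: OK\n' "$s"
    else
        printf '%s: BAD\n' "$s"
    fi
done
```

Only aaa is reported as OK here. The empty string is rejected because + requires at least one letter.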
Stephen Kitt
terdon
  • Or an inverted test with [^A-Za-z] to test for a single non-letter. – Kusalananda Mar 12 '21 at 11:10
  • Perfect. I tried with ^ and $, but I don't know how I missed it. Your second form is nice! Thank you. – Tchupy Mar 12 '21 at 11:38
  • @Kusalananda, with gawk at least, bytes that don't form part of valid characters will match neither [A-Za-z] nor [^A-Za-z]. Compare awk 'BEGIN{if (ARGV[1] ~ /^[A-Za-z]+$/) print "ok"}' $'\200\200' with awk 'BEGIN{if (ARGV[1] !~ /[^A-Za-z]/) print "ok"}' $'\200\200' in a UTF-8 locale, for instance. (There's also the special case of the empty string.) – Stéphane Chazelas Mar 12 '21 at 13:11
  • The list of characters matched by [A-Za-z] will also vary greatly with the locale, OS, awk implementation and version thereof. – Stéphane Chazelas Mar 12 '21 at 13:13
  • @Tchupy: Notwithstanding that you have accepted an answer, using a Bash test will save loading and executing a 600KB process for the sake of checking a few bytes. [[ $1 =~ ^[[:alpha:]]+$ ]] && echo Yes || echo No. – Paul_Pedant Mar 12 '21 at 13:25
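
The difference between the anchored test and the inverted test mentioned in the comments shows up on the empty string. The following is a small sketch for comparison: the function name compare_tests is hypothetical, and LC_ALL=C is again an assumption made to keep [A-Za-z] limited to ASCII letters.

```shell
# Hypothetical helper: prints the verdict of both tests for one argument.
compare_tests() {
    LC_ALL=C awk 'BEGIN {
        s = ARGV[1]
        # Anchored test: one or more letters, and nothing else.
        anchored = (s ~ /^[A-Za-z]+$/) ? "OK" : "BAD"
        # Inverted test: no non-letter anywhere in the string.
        inverted = (s !~ /[^A-Za-z]/) ? "OK" : "BAD"
        printf "[%s] anchored=%s inverted=%s\n", s, anchored, inverted
    }' "$1"
}

compare_tests aaa   # [aaa] anchored=OK inverted=OK
compare_tests a1a   # [a1a] anchored=BAD inverted=BAD
compare_tests ''    # [] anchored=BAD inverted=OK
```

The two tests agree on non-empty input in this locale. The empty string contains no non-letter, so the inverted test accepts it while the anchored test with + does not.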