2

I am using ksh on AIX and I want to check whether a variable is alphanumeric, for example var1=sanySAN and var2=SANYsa%$3.

Here, var1 is alphanumeric and var2 is not. I know I can use [a-z][A-Z][0-9] or [:alnum:] but I am not sure how.

Should I check like this?

if [[ var == [:alnum:]* ]] 
then 
    echo "yes"
else 
    echo "no"
fi 

I've tried many ways but they failed.

terdon
santhosh

4 Answers

2

POSIXly:

is_alnum() {
  case $1 in (*[![:alnum:]]*|"") false;; esac
}

Then:

$ is_alnum 123 && echo yes
yes
$ is_alnum % || echo no
no

mksh was the only shell that failed with the above approach.


Also note that if the variable contains byte sequences that don't form valid characters, this approach is not guaranteed to work.

yash only works with valid Unicode characters, so it is the only shell that reports an error:

$ is_alnum $'A\xe9B'
yash: cannot convert the argument `A�B' into a wide character string
yash: the argument is replaced with an empty string

Updated

mksh added character classes in R56, with a bugfix in R56c.

cuonglm
  • My code has restrictions and I am looking for a simple solution like this: var="SANDY"; if [[ $var == [:alnum:]* ]]; then echo "yes"; else echo "no"; fi. But it is not working. What's wrong? Can you please correct it? – santhosh Nov 16 '16 at 10:58
  • @santhosh: So just use if is_alnum "$var"; then echo yes; else echo no; fi – cuonglm Nov 16 '16 at 12:19
  • Note that it would return true for the empty string. – Stéphane Chazelas Nov 16 '16 at 15:44
  • See also the ksh93 bug mentioned in my answer which would also affect yours. – Stéphane Chazelas Nov 16 '16 at 16:00
  • POSIX doesn't guarantee it to work if the variable may contain byte sequences that don't form valid characters. – Stéphane Chazelas Nov 16 '16 at 16:07
  • @StéphaneChazelas: Thanks, fix for empty string. Good point about ksh93. Also do you have any information about mksh? – cuonglm Nov 16 '16 at 16:07
  • 2
    @cuonglm mksh developer here, just ask me ☺ mksh added character classes in R56, with a bugfix in R56c, so, over three years and two weeks ago now. POSuX character classes are standards-compliant for the C (POSIX) locale. (Adding categorisation for Unicode would blow up the on-disc size of the shell by easily 30% or so, and that’s out of scope.) – mirabilos Feb 01 '21 at 19:16
  • @mirabilos Thanks for the information. – cuonglm Feb 03 '21 at 07:07
1

An idea based on expr:

if expr "x$string" : '.*[^[:alnum:]]' >/dev/null;
then
  printf "%s is NOT alphanumeric\n" "$string"
else
  printf "%s is alphanumeric\n" "$string"
fi

Note that the use of printf over echo is intentional, since "$string" is an arbitrary string. More info here. Also the "x" at the beginning prevents expr from choking if $string expands to something that starts with a -. My thanks to Stéphane and Sato who helped refine this answer with their comments.
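For reuse, the expr test can be wrapped in a function. This is only a sketch: the name is_alnum_expr is hypothetical, and rejecting the empty string is an added assumption (the snippet above reports an empty $string as alphanumeric):

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for a non-empty, all-alphanumeric string.
is_alnum_expr() {
  # Assumption added here: the empty string does not count as alphanumeric.
  [ -n "$1" ] || return 1
  # expr returns 0 when the pattern matches, i.e. when a non-alphanumeric
  # character is present, so the status is inverted with "!".
  # The leading "x" keeps a value such as "-n" from being read as an operator.
  ! expr "x$1" : '.*[^[:alnum:]]' >/dev/null
}

for s in abc123 'a b' -n; do
  if is_alnum_expr "$s"; then
    printf '%s is alphanumeric\n' "$s"
  else
    printf '%s is NOT alphanumeric\n' "$s"
  fi
done
```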

1

You can do:

[[ $var = +([[:alnum:]]) ]]

That works in the AT&T and zsh implementations of ksh, but apparently not in pdksh-based ones. It also works in zsh -o kshglob (as when zsh is invoked as ksh) and in bash -O extglob.

+(...) is the ksh wildcard operator for one or more occurrences. [[:alnum:]] matches any character considered alphanumeric in the current locale (in any alphabet, not necessarily only the Latin alphabet).

If you want to limit to the English letters and digits, assuming the LC_ALL variable is not set, you could do:

LC_COLLATE=C; [[ $var = +([a-zA-Z0-9]) ]]

If LC_ALL may be set (it overrides LC_COLLATE), list the characters explicitly:

[[ $var = +([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]) ]]

This:

LC_ALL=C; [[ $var = +([[:alnum:]]) ]]

Should also work, even though it changes how bytes are interpreted as characters. That is because any multi-byte character containing bytes that correspond to ASCII alnums (for instance £ in GB18030, which is encoded as 0x81 0x30 0x84 0x35, where 0x30 also happens to be ASCII 0) also contains bytes that are not in ASCII (0x81 and 0x84 for £), and all charsets on a given system have to agree on the encoding of the characters in the portable character set, which includes all the English alphanumerics.
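The same ASCII-only restriction can be sketched without ksh extended globs by combining LC_ALL=C with a POSIX case statement. This is a swapped-in technique rather than the [[ ... ]] form above; the name is_ascii_alnum is hypothetical, and rejecting the empty string mirrors what +(...) does:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for a non-empty string made of
# ASCII letters and digits.
is_ascii_alnum() (
  # The function body is a subshell, so the locale change does not leak
  # into the calling shell.
  LC_ALL=C
  case $1 in (*[!a-zA-Z0-9]*|"") exit 1;; esac
)

var1=sanySAN
var2='SANYsa%$3'
is_ascii_alnum "$var1" && echo "var1: yes"
is_ascii_alnum "$var2" || echo "var2: no"
```

Running the body in a subshell avoids having to save and restore LC_ALL by hand, at the cost of one fork per call.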

Also note that in UTF-8 locales, ksh93u+ (at least) currently has a bug in that if $var contains sequences of bytes that don't form valid characters, but those bytes correspond to alnums in the ISO-8859-1 character set, then they would be considered as alnums. For instance $'A\xe9B' would be considered as an alphanumeric because 0xe9 is é in ISO-8859-1. (U+00E9 is é, but the UTF-8 encoding of é is 0xc3 0xa9, not 0xe9).

0

Thanks for all the help. After many attempts, I got this solution working.

var=`echo "some-value" | tr -d "[:alnum:]"`
if [ "$var" = "" ]; then
    echo "string has only alphanumerics"
else
    echo "something other than alphanumerics is there"
fi
santhosh
  • 1
    You can't use echo. That would fail on things like \0141 or -n depending on the echo variant you have. It would also fail to detect trailing newline characters. I don't know about the AIX version, but note that the GNU version works only with single-byte characters. – Stéphane Chazelas Nov 16 '16 at 16:03
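A sketch of the tr approach that addresses the echo and trailing-newline points from the comment might look like the following. The function name only_alnum and the "x" sentinel are illustrative choices, and the multi-byte caveat for GNU tr still applies:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for a non-empty string that tr
# reduces to nothing when all [:alnum:] characters are deleted.
only_alnum() {
  [ -n "$1" ] || return 1
  # printf '%s' writes the value verbatim, unlike echo, which may interpret
  # "-n" or backslash sequences. The trailing "x" stops command substitution
  # from stripping newlines that tr left behind.
  rest=$(printf '%s' "$1" | tr -d '[:alnum:]'; printf x)
  [ "$rest" = x ]
}

if only_alnum "some-value"; then
  echo "string has only alphanumerics"
else
  echo "something other than alphanumerics is there"
fi
```

With the value "some-value", the hyphen survives tr, so the second message is printed.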