Case statement allow only alphabetic characters?

Question

case "$1" in
all)
  echo "$1"
  ;;
[a-z][a-z][a-z][a-z][a-z][a-z])
  echo "$1"
  ;;
*)
  printf 'Invalid: %s\n' "$1"
  exit 1
  ;;
esac

With this, the only inputs accepted are all and strings of exactly 6 lowercase letters. It won't accept 4 characters, or more than 6.

What I want to do here is to only allow characters, not digits or symbols, but of unlimited length.

What is the correct syntax? Thanks

https://unix.stackexchange.com/questions/289026/regex-in-case-statement?rq=1 May be of help — Guy, Jan 19 '18 at 14:46
@Guy that does explain a lot, why bash has to be so weird? But I'm sure there must be some workaround? — Freedo, Jan 19 '18 at 14:48

Stéphane Chazelas · Answer 1 · 2023-01-08T09:55:49.403

8

Digits and symbols are characters too. It looks like you want either:

1. only alphabetical characters ([[:alpha:]])
2. or possibly alphabetical characters, but only in the Latin script (as your a-z suggests)
3. or possibly alphabetical characters in the Latin script and without diacritics.

Unless the locale is C/POSIX what [a-z] matches is more or less random in bash (on GNU systems at least).

For 1, you'd want:

die() {
  printf >&2 '%s\n' "$1"
  exit 1
}
case $string in
  ("") die "Can't be empty";;
  (*[![:alpha:]]*) die "contains non-alphabetical characters";;
  (*) echo OK
esac

That would accept all, Stéphane (Latin script), γράμμα (Greek script), письмо (Cyrillic), but not foo-bar, 123...

2 can be tricky, especially if you want to consider combining diacritics.

For 3, for it to run in any locale, you'd need to specify the characters you want:

ok=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

and in the case statement, use

(*[!$ok]*) die "contains characters not allowed";;
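To show how the two pieces for case 3 fit together, here is a minimal sketch. The function name check and its messages are made up for illustration, and it returns a status instead of calling die so that it can be tried on several strings in one run. It uses only standard sh pattern matching, so it should not depend on bash options or on the locale.

```sh
#!/bin/sh
# Allowed characters, spelled out so the result does not depend on the locale.
ok=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

# Hypothetical helper: prints a verdict, returns non-zero when the input is rejected.
check() {
  case $1 in
    ("")       echo "empty"; return 1;;
    (*[!$ok]*) echo "contains characters not allowed"; return 1;;
    (*)        echo "OK";;
  esac
}

# Try a few sample inputs; the "if" keeps a rejection from aborting the loop.
for s in all foo-bar 123 ""; do
  if check "$s" >/dev/null; then
    printf 'accepted: "%s"\n' "$s"
  else
    printf 'rejected: "%s"\n' "$s"
  fi
done
```

The (*[!$ok]*) branch fires as soon as one character outside the list appears anywhere in the string, which is why no length limit is needed.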

Or you could switch to zsh, where ranges like [a-zA-Z] are based on character code points, so always only include the unaccented letters abcdefghijklmnopqrstuvwxyz (and their uppercase counterparts), or with bash 4.3 or newer use the globasciiranges option to have the same behaviour in bash.

edited Jan 08 '23 at 09:55

answered Jan 19 '18 at 14:48

Stéphane Chazelas


@Freedo, what's a word between A-Z, do you mean a capital letter in the English alphabet between A and Z. Does that include É? – Stéphane Chazelas Jan 19 '18 at 14:52
@roaima anything between a-z, without accents or other weird stuff. Stéphane your last code, it's replacing my * case or the [a-z] stuff? A-Z i want to match too – Freedo Jan 19 '18 at 14:55
@Freedo you keep swapping between a-z and A-Z. So presumably you don't mind about the case of the letters? – Chris Davies Jan 19 '18 at 14:57
Bash also has globasciiranges, which should disable the locale-specific behaviour in [a-z]. bash -c 'shopt -s globasciiranges; [[ ä = [a-z] ]] || echo no match' prints no match on my system (LANG=en_US.UTF-8) – ilkkachu Jan 19 '18 at 14:57

I accepted the other answer because it works and it's "easier", but I'm sure this will help others with more advanced needs. Thanks for the effort and no offense! – Freedo Jan 19 '18 at 16:28

ilkkachu · Accepted Answer · 2018-01-19T15:30:53.800

You can do this with the standard pattern match by looking for any of the non-allowed characters, and rejecting the input if you find any. Or you can use extended globs (extglob) or regexes and explicitly make sure the whole string consists of characters that are allowed.

#!/bin/bash
shopt -s extglob globasciiranges
case "$1" in *([a-zA-Z]))    echo "case ok" ;; esac
[[ "$1" = *([a-zA-Z]) ]]  && echo " [[  ok"
[[ "$1" =~ ^[a-zA-Z]*$ ]] && echo "rege ok"

globasciiranges prevents [a-z] from matching accented letters, but the regex match doesn't obey it. With the regex, you'd need to set LC_COLLATE=C to prevent matching them.

All of those allow the empty string. To prevent that, change the asterisks to plusses (* to +).
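For the regex variant, the last two points can be combined in one sketch: force the C locale for the match, and use + so that the empty string is rejected. The function name is_ascii_alpha is made up for illustration. It sets LC_ALL rather than LC_COLLATE, on the assumption that LC_ALL may already be set in the environment, in which case it would override LC_COLLATE.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if $1 consists of one or more ASCII letters.
# The body is a subshell ( ... ) so the locale change does not leak to the caller.
is_ascii_alpha() (
  LC_ALL=C
  [[ $1 =~ ^[a-zA-Z]+$ ]]
)

# Try a few sample inputs, including the empty string.
for s in all Stephane foo-bar 123 ""; do
  if is_ascii_alpha "$s"; then
    printf 'ok:       "%s"\n' "$s"
  else
    printf 'rejected: "%s"\n' "$s"
  fi
done
```

The subshell costs a fork per call, which is fine for validating a script argument but probably not something to do in a tight loop.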

Thanks, just enabling the extglob thing and replacing my [a-z] stuff to *([a-zA-Z])) solved it. Thanks! — Freedo, Jan 19 '18 at 15:27

score 0 · Answer 3 · answered Jan 19 '18 at 18:48

If you are using bash or equivalent, include the line shopt -s extglob at the top of your script to enable extended globbing (a pattern syntax comparable to regular expressions), and then in your case statement use the pattern +([[:alpha:]]), followed of course by the ) required by the case statement itself.

alpha is one of several character classes defined in the bash man pages. It encompasses all alphabetic characters, upper and lower case, of your locale.
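As a rough sketch of what that looks like in a script (the function name only_letters is made up for illustration, and extended globbing is turned on with shopt -s extglob before the pattern is parsed):

```bash
#!/bin/bash
shopt -s extglob   # has to be enabled before bash parses the pattern below

# Hypothetical helper wrapping the case statement described above.
only_letters() {
  case $1 in
    +([[:alpha:]])) return 0 ;;   # one or more alphabetic characters
    *)              return 1 ;;   # empty, or contains something else
  esac
}

# Try a few sample inputs, including the empty string.
for s in all foo-bar 123 ""; do
  if only_letters "$s"; then
    echo "ok: '$s'"
  else
    echo "rejected: '$s'"
  fi
done
```

Because + means "one or more", the empty string falls through to the * branch. What counts as alphabetic beyond plain a-z and A-Z depends on the locale the script runs in.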
