I created an environment variable:
WD=`pwd`
How can I check if it contains spaces or non-English letters?
I presume that by “non-English letters” you mean letters other than the 26 unadorned letters of the Latin alphabet. Then, strictly speaking, here's a test that meets your requirements:
if tmp=${WD//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/};
   [[ $tmp = *[[:alpha:]\ ]* ]]; then
  # $WD contains letters other than A-Z and a-z or a space
  echo "unexpected character in $WD"
fi
That is, strip the English letters and see if there are any letters or spaces left.
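The same strip-then-test idea can be wrapped in a function so it is reusable. This is only a sketch: the function name has_space_or_foreign_letter is made up for illustration, and what counts as a letter under [[:alpha:]] depends on the current locale (LC_CTYPE).

```bash
# Hypothetical helper: succeeds if "$1" contains a space or a letter
# other than the 26 unadorned Latin letters.
has_space_or_foreign_letter() {
  # Delete every plain English letter, listed explicitly to avoid ranges.
  local rest=${1//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/}
  # Any letter still present is non-English; also look for a literal space.
  [[ $rest = *[[:alpha:]]* || $rest = *' '* ]]
}

WD=$(pwd)
if has_space_or_foreign_letter "$WD"; then
  echo "WD contains a space or a non-English letter: $WD"
fi
```

Listing the letters one by one is verbose, but it avoids any dependence on how the locale orders a range such as A-Z.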
I suspect that you're in fact trying to avoid all whitespace and all non-ASCII characters, including the ones that aren't letters such as ¿ or £ or ٣. You can do that by matching the characters that are not in the range ! through ~ (that range covers the printable ASCII characters other than the space):
if (LC_ALL=C; [[ $WD = *[^!-~]* ]]) then …
Note that ranges like !-~ or A-Z don't always do what you'd expect when you have LC_COLLATE set. Hence we set LC_ALL to a known value (LC_ALL trumps all locale settings).
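As a rough sketch of how that test might be packaged, the function below runs the match in a subshell so that the LC_ALL assignment does not leak into the rest of the script. The name is_plain_ascii is an assumption made for this example.

```bash
# Hypothetical helper: succeeds if "$1" holds only the ASCII characters ! through ~.
is_plain_ascii() {
  (
    LC_ALL=C                # byte-wise ranges; confined to this subshell
    [[ $1 != *[^!-~]* ]]    # true when no character falls outside !-~
  )
}

WD=$(pwd)
if ! is_plain_ascii "$WD"; then
  echo "WD contains whitespace, a control character or a non-ASCII character"
fi
```

Because the parentheses start a subshell, the caller's locale settings are untouched once the function returns.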
If you're checking for “unusual” characters in file names (why else exclude even spaces, which are allowed on most modern platforms?), it might make sense to use a more restrictive list that doesn't allow any nonportable characters. The POSIX portable filename character set consists only of ASCII letters, digits and -._ (the pattern below also allows /, since $WD is a path and always contains it).
if (LC_ALL=C; [[ $WD = *[^-._0-9A-Za-z/]* ]]) then …
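When the check fails it is often useful to see which characters caused it. A minimal sketch, assuming the allowed set is the portable filename characters plus the / separator; the function name offending_chars is hypothetical.

```bash
# Hypothetical helper: print whatever is left of "$1" after deleting the
# portable filename characters and the / path separator.
offending_chars() {
  (
    LC_ALL=C    # make 0-9, A-Z and a-z plain ASCII ranges
    printf '%s\n' "${1//[-._0-9A-Za-z\/]/}"
  )
}

leftover=$(offending_chars "$(pwd)")
if [[ -n $leftover ]]; then
  echo "nonportable characters in the current directory: $leftover"
fi
```

An empty result means every character was in the allowed set.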
[[ -n "${WD//[a-zA-Z ]}" ]] && echo "I have special characters"
– phemmer Nov 23 '11 at 03:13
The bottom line is that a range is not necessarily what you think the range is. The English a and z are just two chars to the computer, and the range between them is not necessarily an immutable contiguous 26 letter alphabet. Yes, they are contiguous as an ASCII or UNICODE range, but in a regex range statement, the range is based on the collating sequence.
– Peter.O Nov 23 '11 at 07:07
… LC_COLLATE settings. You can kill LC_COLLATE while retaining LC_CTYPE, taking care of LC_ALL and LANGUAGE, but it's a lot more complicated than just listing the exact set of characters you want.
– Gilles 'SO- stop being evil' Nov 23 '11 at 08:27
Regular expressions and grep are what you are looking for.
We match anything that is not an English letter, a digit or / (because / is part of every path).
if [[ -n "$( pwd | grep -o -P "([^a-zA-Z0-9\/])*" )" ]]; then
echo "error"
fi
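The -P option is a GNU grep extension, so here is a sketch of the same test with a basic regular expression and grep -q, which only reports through its exit status. The function name has_unexpected_chars is invented for this example, and LC_ALL=C is added on the assumption that a-z, A-Z and 0-9 should mean the ASCII ranges.

```bash
# Hypothetical helper: succeeds if "$1" contains anything other than
# ASCII letters, digits and /.
has_unexpected_chars() {
  printf '%s\n' "$1" | LC_ALL=C grep -q '[^a-zA-Z0-9/]'
}

WD=$(pwd)
if has_unexpected_chars "$WD"; then
  echo "error"
fi
```

Since grep works line by line, a newline embedded in the path would not be flagged by this version.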
sed could be used in that case too.
It can replace all valid characters in ${WD} with '' so that you can check whether anything is left. If the resulting string has non-zero length, ${WD} is not valid.
So, if we are expecting only /, digits and English letters:
if [[ -n "$( pwd | sed -r -e 's/([a-zA-Z0-9\/])*//g' )" ]]; then
echo "error"
fi
… LC_COLLATE settings (see my answer for explanation). Plus all punctuation and symbols were supposed to be allowed.
– Gilles 'SO- stop being evil' Nov 23 '11 at 08:32
tr is slightly simpler than grep or sed in that case:
if [[ -n "$(printf '%s\n' "$WD" | tr -d '[:alnum:]/')" ]]; then
  echo "gotcha"
fi
[:alnum:] contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed.
– Gilles 'SO- stop being evil' Nov 23 '11 at 08:32
It seems the pattern [a-z] is case-insensitive, so it's as simple as:
[ -z "${PWD//[a-z\/]}" ] || echo "Bad chars in path: ${PWD//[a-z\/]}"
… LC_COLLATE settings (see my answer for explanation). Plus all punctuation and symbols were supposed to be allowed.
– Gilles 'SO- stop being evil' Nov 23 '11 at 08:33
Bash can do its own pattern matching.
if [[ ${WD} = *[^[:alnum:]/]* ]]; then
echo 'Baaaad.'
fi
[:alnum:] contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed.
– Gilles 'SO- stop being evil' Nov 23 '11 at 08:34