2

I created an environment variable:

WD=`pwd`

How can I check if it contains spaces or non-English letters?

Mat
myWallJSON
  • Can you clarify how you define English letters (i.e. are digits, punctuation, or any ASCII character acceptable)? Your question has triggered several differing interpretations. – jlliagre Nov 24 '11 at 22:28

5 Answers

3

I presume that by “non-English letters” you mean letters other than the 26 unadorned letters of the Latin alphabet. Then, strictly speaking, here's a test that meets your requirements:

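if tmp=${WD//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/};
   [[ $tmp = *[[:alpha:]\ ]* ]]; then
  # $WD contains a space or letters other than A-Z and a-z
  # (the space is backslash-escaped so it stays inside the bracket expression)
  echo "unusual characters in $WD"
fi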

That is, strip the English letters and see if there are any letters or spaces left.

I suspect that you're in fact trying to avoid all non-ASCII characters and all whitespace, including the ones that aren't letters such as ¿ or £ or ٣. You can do that by matching the characters that are not ! through ~ (i.e. the ASCII characters other than whitespace):

if (LC_ALL=C; [[ $WD = *[^!-~]* ]]) then …

Note that ranges like !-~ or A-Z don't always do what you'd expect when you have LC_COLLATE set. Hence we set LC_ALL to a known value (LC_ALL trumps all locale settings).
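
As a minimal sketch of the same check in reusable form, the function below runs in a subshell so that the LC_ALL change stays local. The name has_unusual_chars is hypothetical and does not come from the answer; the pattern is the one shown above.

```shell
#!/usr/bin/env bash
# has_unusual_chars (hypothetical name): succeeds if the argument contains
# any character outside the ASCII range ! through ~, i.e. whitespace,
# control characters, or any non-ASCII byte.
has_unusual_chars() (
  # The parenthesized body is a subshell, so this assignment
  # does not change the caller's locale.
  LC_ALL=C
  [[ $1 = *[^!-~]* ]]
)

# Example use with the variable from the question.
WD=$(pwd)
if has_unusual_chars "$WD"; then
  echo "unusual characters in: $WD"
fi
```

Passing the value as an argument rather than reading $WD inside the function keeps the check usable for any string.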

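If you're checking for “unusual” characters in file names (why else exclude even spaces, which are allowed on most modern platforms?), it might make sense to use a more restrictive list that doesn't allow any nonportable characters. The POSIX portable filename character set contains only ASCII letters, digits and the three characters -, . and _. Note that / is not in that set, so add it to the bracket expression when testing a whole path rather than a single file name.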

if (LC_ALL=C; [[ $WD = *[^-._0-9A-Za-z]* ]]) then …
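
To see which characters fail such a test, one option is to strip the allowed characters and print whatever remains. This is only a sketch: the name offending_chars is hypothetical, and it assumes / should be allowed because the value is a whole path.

```shell
#!/usr/bin/env bash
# offending_chars (hypothetical name): prints the characters of its argument
# that are outside the portable set -._0-9A-Za-z, with / also allowed.
offending_chars() (
  # Subshell body: LC_ALL=C makes the ranges plain ASCII ranges
  # without changing the caller's locale.
  LC_ALL=C
  # Delete every allowed character; what is left is the offending part.
  printf '%s' "${1//[-._0-9A-Za-z\/]/}"
)

# Example use with the variable from the question.
WD=$(pwd)
bad=$(offending_chars "$WD")
if [[ -n $bad ]]; then
  echo "nonportable characters in path: $bad"
fi
```

An empty result means every character is in the allowed set.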
  • That first test would be a lot simpler as [[ -n "${WD//[a-zA-Z ]}" ]] && echo "I have special characters" – phemmer Nov 23 '11 at 03:13
  • @Patrick. The first test is like that otherwise it would be subject to possible problems. Gilles' link, and further links on that page, explain it.

    The bottom line is that a range is not necessarily what you think the range is.

    The English a and z are just two chars to the computer, and the range between them is not necessarily an immutable contiguous 26 letter alphabet.

    Yes, they are contiguous as an ASCII or UNICODE range, but in a regex range statement, the range is based on the collating sequence

    – Peter.O Nov 23 '11 at 07:07
  • @Patrick That wouldn't work with many LC_COLLATE settings. You can kill LC_COLLATE while retaining LC_CTYPE, taking care of LC_ALL and LANGUAGE, but it's a lot more complicated than just listing the exact set of characters you want. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:27
1

Regular expressions and grep are what you are looking for.

We match any character that is not an English letter, a digit, or / (because / is part of every path).

if [[ -n "$( pwd | grep -o '[^a-zA-Z0-9/]' )" ]]; then
    echo "error"
fi

sed can be used here too.

It can replace every permitted character in ${WD} with the empty string, so we can check whether anything is left. If the resulting string has non-zero length, ${WD} is not valid.

So, if we expect only /, digits and English letters:

if [[ -n "$( pwd | sed -e 's|[a-zA-Z0-9/]||g' )" ]]; then
    echo "error"
fi
0

tr is slightly simpler than grep or sed in that case:

if [[ -n "$(printf '%s' "$WD" | tr -d '[:alnum:]/')" ]]; then
  echo "gotcha"
fi
jlliagre
0

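It seems the pattern [a-z] is case-insensitive (at least in locales whose collation order interleaves upper and lower case; in the C locale it is not), so it can be as simple as: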

[ -z "${PWD//[a-z\/]}" ] || echo "Bad chars in path: ${PWD//[a-z\/]}"
Lenik
-1

Bash can do its own pattern matching.

if [[ ${WD} = *[^[:alnum:]/]* ]]; then
  echo 'Baaaad.'
fi
ephemient