How to find out if PWD contains spaces or non-English letters?

Question

I created an environment variable:

WD=`pwd`

How can I check if it contains spaces or non-English letters?

Can you clarify how you define English letters? (i.e., are digits, punctuation, or any ASCII character acceptable?) Your question has triggered several differing interpretations. — jlliagre, Nov 24 '11 at 22:28

score 3 · Accepted Answer · edited Apr 13 '17 at 12:36

I presume that by “non-English letters” you mean letters other than the 26 unadorned letters of the Latin alphabet. Then, strictly speaking, here's a test that meets your requirements:

if tmp=${WD//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/};
   [[ $tmp = *[[:alpha:]\ ]* ]]; then
  # $WD contains a space or a letter other than A-Z and a-z
  echo "WD contains a space or a non-English letter"
fi

That is, strip the English letters and see if there are any letters or spaces left.
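
As a rough sketch, the strip-and-test idea can be wrapped in a reusable function. The name has_foreign_letter_or_space is made up for illustration, and what [[:alpha:]] treats as a letter still depends on the current LC_CTYPE locale.

```bash
# Hypothetical helper, not part of the original answer.
# Succeeds (exit status 0) if $1 contains a space or a letter
# other than the 26 unadorned Latin letters.
has_foreign_letter_or_space() {
  local rest
  # Remove every explicitly listed English letter.
  rest=${1//[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]/}
  # Any remaining letter (per LC_CTYPE) or space counts as a hit.
  [[ $rest = *[[:alpha:]\ ]* ]]
}
```

It could then be called as: has_foreign_letter_or_space "$WD" && echo "bad path"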

I suspect that you're in fact trying to avoid all whitespace and all non-ASCII characters, including ones that aren't letters, such as ¿ or £ or ٣. You can do that by matching any character outside the range ! through ~ (that is, anything other than the printable, non-whitespace ASCII characters):

if (LC_ALL=C; [[ $WD = *[^!-~]* ]]) then …

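Note that ranges like !-~ or A-Z don't always do what you'd expect when LC_COLLATE is set to something other than C. Hence we set LC_ALL to a known value (LC_ALL overrides all other locale settings). The parentheses run the test in a subshell, so the locale change does not affect the rest of the script.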
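
A minimal sketch of the same test as a function, assuming bash. The function name is hypothetical; the body is written with ( ) instead of { } so that the LC_ALL assignment stays confined to a subshell.

```bash
# Hypothetical helper, not part of the original answer.
# Succeeds if $1 contains any byte outside the range ! through ~,
# i.e. whitespace, a control character, or a non-ASCII byte.
has_non_ascii_or_space() (
  # Subshell body: the locale change ends when the function returns.
  LC_ALL=C
  [[ $1 = *[^!-~]* ]]
)
```

For example, has_non_ascii_or_space "$WD" && echo "WD has a space or a non-ASCII character".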

If you're checking for “unusual” characters in file names (why else exclude even spaces, which are allowed on most modern platforms?), it might make sense to use a more restrictive list that doesn't allow any nonportable characters. The POSIX portable filename character set consists only of ASCII letters, digits, and the three characters - . and _.

if (LC_ALL=C; [[ $WD = *[^-._0-9A-Za-z]* ]]) then …
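
One thing to keep in mind when applying this to a full path such as the output of pwd: the / separator is not in the portable set, so the pattern as written matches any absolute path. The sketch below is an assumption about the intended use, not part of the original answer; it adds / to the allowed characters, and the function name is hypothetical.

```bash
# Hypothetical helper for whole paths.
# Succeeds if $1 contains a character that is neither a POSIX portable
# filename character (A-Z a-z 0-9 . _ -) nor the / separator.
has_nonportable_char() (
  # Subshell body keeps the locale change local to this function.
  LC_ALL=C
  [[ $1 = *[^-._0-9A-Za-z/]* ]]
)
```

For example, has_nonportable_char "$WD" && echo "WD is not portable".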

That first test would be a lot simpler as [[ -n "${WD//[a-zA-Z ]}" ]] && echo "I have special characters" — phemmer, Nov 23 '11 at 03:13
@Patrick. The first test is like that otherwise it would be subject to possible problems. Gilles' link, and further links on that page, explain it.
The bottom line is that a range is not necessarily what you think the range is.

The English a and z are just two chars to the computer, and the range between them is not necessarily an immutable contiguous 26 letter alphabet.

Yes, they are contiguous as an ASCII or UNICODE range, but in a regex range statement, the range is based on the collating sequence — Peter.O, Nov 23 '11 at 07:07
@Patrick That wouldn't work with many LC_COLLATE settings. You can kill LC_COLLATE while retaining LC_CTYPE, taking care of LC_ALL and LANGUAGE, but it's a lot more complicated than just listing the exact set of characters you want. — Gilles 'SO- stop being evil', Nov 23 '11 at 08:27

score 1 · Answer 2 · answered Nov 22 '11 at 19:50

Regular expressions and grep are what you are looking for.

We match any character that is not an English letter, a digit, or / (because it's part of every path).

if [[ -n "$( pwd | grep -o -P "([^a-zA-Z0-9\/])*" )" ]]; then 
    echo "error"
fi

sed can be used here too.

We can replace all valid characters in ${WD} with '' and see whether anything is left. If the resulting string has non-zero length, ${WD} is not valid.

So, if we expect only /, digits, and English letters:

if [[ -n "$( pwd | sed -r -e 's/([a-zA-Z0-9\/])*//g' )" ]]; then 
    echo "error"
fi
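
A possible variant of the grep test, assuming the same allowed set (English letters, digits, and /). Forcing the C locale makes the a-z style ranges follow ASCII byte order, and grep -q avoids capturing any output. The function name is hypothetical.

```bash
# Hypothetical helper, not from the original answer.
# Succeeds if $1 contains any byte other than A-Z, a-z, 0-9 or /.
# A newline inside the path goes undetected, since grep works line by line.
has_unexpected_char() {
  # LC_ALL=C applies to grep only; ranges then follow ASCII byte order.
  printf '%s\n' "$1" | LC_ALL=C grep -q '[^a-zA-Z0-9/]'
}
```

For example, has_unexpected_char "$WD" && echo "error".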

ДМИТРИЙ МАЛИКОВ

You'll probably want to allow . in the path too. – Kevin Nov 23 '11 at 04:49
There is nothing about it in the question. – ДМИТРИЙ МАЛИКОВ Nov 23 '11 at 07:29
This won't work in most LC_COLLATE settings (see my answer for explanation). Plus all punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:32

score 0 · Answer 3 · answered Nov 22 '11 at 21:56

tr is slightly simpler than grep or sed in that case:

if [[ -n "$(echo "$WD" | tr -d '[:alnum:]/')" ]]; then
  echo "gotcha"
fi

jlliagre

This won't work in most character sets, where [:alnum:] contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:32
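
One way to address the locale concern, sketched under the assumption that only ASCII letters, digits, and / are acceptable, is to run tr in the C locale, where [:alnum:] covers ASCII letters and digits only. The function name is hypothetical.

```bash
# Hypothetical helper: print whatever remains of $1 after deleting
# ASCII letters, ASCII digits and /.
strip_allowed() {
  # In the C locale tr works on bytes, so [:alnum:] is ASCII-only.
  printf '%s' "$1" | LC_ALL=C tr -d '[:alnum:]/'
}
```

A non-empty result means the path has other characters, for example: [[ -n "$(strip_allowed "$WD")" ]] && echo "gotcha"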

score 0 · Answer 4 · answered Nov 23 '11 at 03:03

It seems the pattern [a-z] is case-insensitive, so the test is as simple as:

[ -z "${PWD//[a-z\/]}" ] || echo "Bad chars in path: ${PWD//[a-z\/]}"

Lenik

This won't work in most LC_COLLATE settings (see my answer for explanation). Plus all punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:33

score -1 · Answer 5 · answered Nov 22 '11 at 22:34

Bash can do its own pattern matching.

if [[ ${WD} = *[^[:alnum:]/]* ]]; then
  echo 'Baaaad.'
fi

ephemient

This won't work in most character sets, where [:alnum:] contains non-English letters. Plus all digits, punctuation and symbols were supposed to be allowed. – Gilles 'SO- stop being evil' Nov 23 '11 at 08:34
