How can I test if a string has any ASCII white space characters in it?

Question

How can I check if a string has any whitespace characters in it? I do not have to worry about anything outside ASCII, for example Unicode zero-width characters. You can assume that the string is stored in a shell variable, e.g. $string.

Example of behavior:

abc has a space so it would return true

\tabc has a tab so it would return true

abc has no whitespace chars so it should return false

abc
hello

has a line break so it should return true

A solution using any common command line utility (sed, grep, awk, bash) would be sufficient.

Stéphane Chazelas · Accepted Answer · 2023-01-26T16:12:34.103

In POSIX sh syntax:

case $string in
  (*[[:blank:]]*) echo "string contains at least one character classified as blank";;
  (*[[:space:]]*) echo "string contains at least one character classified as whitespace (but not blank)";;
  (*) echo no character classified as whitespace;;
esac

[:blank:] is required to be a subset of [:space:]. [:blank:] is guaranteed to contain at least space and TAB and [:space:] at least space, TAB, NL, CR, FF and VT.

That's according to the encoding being used and character classification in the locale. On most systems all locales use a charset that is ASCII or a superset of ASCII (if we ignore MS-Kanji found on some BSDs in some Japanese locales where 0x5c is ¥ instead of \ (and there's no \ character!) but is otherwise a superset of ASCII for the rest).
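Since the question asks for a true/false result, the case construct above can be wrapped in a function that reports through its exit status. This is only a minimal sketch; the name has_whitespace is a hypothetical helper, not something from the answer, and the classification still depends on the current locale as described above.

```shell
# Hypothetical helper: exit status 0 (true) if $1 contains any character
# the current locale classifies as whitespace, 1 (false) otherwise.
has_whitespace() {
  case $1 in
    (*[[:space:]]*) return 0;;
    (*) return 1;;
  esac
}

string='a bc'
if has_whitespace "$string"; then
  echo true
else
  echo false
fi
```

Because the result is an exit status, the function can be used directly in if, while, && and || constructs.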

If you wanted to check that $string contains at least one ASCII-encoded ASCII-whitespace even on EBCDIC-based systems, you'd need to specify the set by byte values or use iconv to convert them from the current charset to ASCII:

ascii_whitespace=$(printf ' \t\n\r\f\v' | iconv -t ASCII)
# or
ascii_whitespace=$(printf '\40\11\12\13\14\15')
case $string in
  (*["$ascii_whitespace"]*) echo contains at least one ASCII whitespace;;
esac

(hoping that \15 doesn't happen to be a newline on that system).
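As a rough sketch of the byte-value variant, the set can be built once and then tested from a small function. The function name has_ascii_whitespace is an assumption made for illustration. Note that command substitution strips trailing newline characters, which is why the newline (\12) is not placed last in the printf format and why the caveat about \15 above matters.

```shell
# Octal byte values: space, TAB, LF, VT, FF, CR.
# $(...) strips trailing newlines, so LF must not be the last byte.
ascii_whitespace=$(printf '\40\11\12\13\14\15')

# Hypothetical helper: true if $1 contains any byte from the set above.
has_ascii_whitespace() {
  case $1 in
    (*["$ascii_whitespace"]*) return 0;;
    (*) return 1;;
  esac
}

has_ascii_whitespace 'a b' && echo "contains ASCII whitespace"
has_ascii_whitespace 'abc' || echo "no ASCII whitespace"
```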

If I use a script file with the shebang set as #!/bin/sh and the variable set with string="abc\nxyz" it seems to incorrectly respond with no character classified as whitespace? I get the same behavior with AdminBee's answer. My full example script using this answer is here: https://gist.github.com/chrissound/762667fba754dc1a472a6386cca124f0 — Chris Stryczynski, Jan 26 '23 at 16:56
@ChrisStryczynski, that variable doesn't contain a whitespace. It contains a backslash followed by a n. Use string=$(printf 'abc\nxyz') to get a newline character in there. Or string=$'abc\nxyz' with some sh implementations (will be in the next POSIX version). Or a literal newline. — Stéphane Chazelas, Jan 26 '23 at 17:23
@ChrisStryczynski and note that [[ "$string" =~ [[:blank:]] ]] is bash syntax, not sh syntax (also supported by ksh93 ([[...]] comes from ksh) and zsh). — Stéphane Chazelas, Jan 26 '23 at 17:31
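
The point made in the comments above, that "abc\nxyz" in double quotes holds a backslash and an n rather than a newline, can be checked with a short sketch. The variable names literal and newline are chosen here for illustration only.

```shell
# Double quotes do not expand \n: this holds a backslash followed by "n".
literal="abc\nxyz"
# printf interprets \n in its format string, producing a real newline.
newline=$(printf 'abc\nxyz')

for s in "$literal" "$newline"; do
  case $s in
    (*[[:space:]]*) echo "whitespace";;
    (*) echo "no whitespace";;
  esac
done
# prints "no whitespace", then "whitespace"
```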

AdminBee · Answer 2 · 2023-01-27T14:09:31.220

Let's assume your string is stored in a shell variable $string. In that case, since you have indicated bash as the shell, you can use the builtin regular expression matching inside the [[ ... ]] test construct:

if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi

The same can be used inside a shell script.

Some usage examples:

~$ string=" hello "
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Contains whitespace
~$ string=$'\thello'
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Contains whitespace
~$ string="hello"
~$ if [[ "$string" =~ [[:space:]] ]]; then echo "Contains whitespace"; else echo "Doesn't contain whitespace"; fi
Doesn't contain whitespace

Note: This uses the POSIX character class [:space:]. See e.g.

for the subtleties between [:space:] and [:blank:]. If you only want to consider characters that create whitespace within the same line (i.e. <space> and \t), you should switch to [:blank:] instead (but note that in some locales, [:blank:] will also contain vertical space characters).

jubilatious1 · Answer 3 · 2023-01-29T08:35:01.147

Using Raku (formerly known as Perl6)

~$ echo "abc " | raku -ne '.contains(/ \s /).say'
True
~$ echo "abc" | raku -ne '.contains(/ \s /).say'
False

The above Raku code is run linewise over the input with the awk-like -ne command-line flags. Raku's contains method returns a boolean. The leading . on contains calls the method on the topic variable $_, which -n sets to each line of input in turn (read from files named on the command line or, failing that, from stdin).

~$ echo "abc " | raku -ne 'say .contains(/ \s /) ?? True !! False;'
True
~$ echo "abc" | raku -ne 'say .contains(/ \s /) ?? True !! False;'
False

The above is slightly more complicated because it uses Raku's ternary operator: Test ?? True !! False. Raku has logical True and logical False, so there is no need to quote the above return values. The advantage here is that you can simply replace True and False with double-quoted strings of your choosing, e.g. "Yes" and "No".

Presumably the OP's question pertains to horizontal whitespace, and in that regard Raku can distinguish \h horizontal whitespace from \v vertical whitespace:

~$ raku -e 'put "abc\t";' | raku -ne 'say .contains(/ \h /);'
True
~$ raku -e 'put "abc\t";' | raku -ne 'say .contains(/ \v /);'
False

The OP doesn't say whether multiline input strings must be handled. They'll always be "positive" for whitespace, but possibly "negative" for horizontal whitespace. [Think of a column of numbers as input]. Anyway, in Raku you can read input in linewise as above (which autochomps by default), or all-at-once (retaining eol \n newlines) with the oddly-named-but-memorable slurp.

Reading linewise (autochomps):

~$ raku -e 'put "1\n2\n3";' | raku -ne 'say .contains(/ \h /);'
False
False
False
~$ raku -e 'put "1\n2\n3";' | raku -e 'for lines() {say .contains(/ \h /)};'
False
False
False

Reading all-at-once (no autochomping):

~$ raku -e 'put "1\n2\n3";' | raku -e 'say slurp.contains(/ \v /);'
True
~$ raku -e 'put "1\n2\n3";' | raku -e 'put slurp.contains(/ \h /);'
False

Addendum: I'm interpreting the OP's statement, "I do not have to worry about things outside of ASCII..." as 'I don't care if Unicode is handled or not'. If only ASCII whitespace is to be handled (and all others rejected), that's something Raku can manage, but not addressed above. Note that Raku is Unicode-ready, so that \s (which is short for <space>) and \h (which is short for <blank>) as well as \v all accept Unicode by default.

If you want to reject non-ASCII (horizontal) whitespace, you could try something like the following bespoke character class: <:ASCII> & <blank>.

Examples:

~$ raku -e 'put "\xA0";' | raku -ne 'put .contains(/ <blank> / );'
True
~$ raku -e 'put "\xA0";' | raku -ne 'put .contains(/ <:ASCII> & <blank> / );'
False

https://docs.raku.org/language/operators#index-entry-operator_ternary
https://docs.raku.org/language/regexes#\h_and_\H
https://docs.raku.org/routine/contains
https://raku.org
