5

Is there a difference between a space and a blank?

Are tabs, spaces, blanks considered characters?

4 Answers

17

Since your tag indicates "Regular expression", I assume you are referring to the POSIX character classes [:blank:] and [:space:].

This overview table shows that [:blank:] is a subset of [:space:]:

  • [:space:] contains everything usually designated as "whitespace characters", i.e. (in the POSIX locale) "space" (the character \x20, generated when pressing the space bar), horizontal tab, newline, vertical tab, form feed and carriage return.
  • [:blank:] contains only those characters which produce "empty space" within the same line, i.e. "space" and horizontal tab \t.(*)
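As a rough illustration of the difference, here is a sketch that deletes each class from a sample string. It assumes a POSIX-like shell and a tr that supports character classes, and it forces the C locale; the variable names are only for illustration:

```shell
# Sample text: letters separated by a space, a tab, and a newline.
sample=$(printf 'a b\tc\nd')

# Delete only [:blank:] characters (space and tab in the C locale).
no_blank=$(printf '%s' "$sample" | LC_ALL=C tr -d '[:blank:]')

# Delete all [:space:] characters (this also removes the newline).
no_space=$(printf '%s' "$sample" | LC_ALL=C tr -d '[:space:]')

printf 'without [:blank:]: %s\n' "$no_blank"
printf 'without [:space:]: %s\n' "$no_space"
```

With [:blank:] removed the newline survives, so the text still spans two lines; with [:space:] removed everything collapses to "abcd".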

And yes, in the context of computer input, all these are characters and should therefore also be thought of as characters when designing a regular expression.

Update: Here is a similar discussion.

(*) Note: as pointed out by Stéphane Chazelas, there are BSD-based implementations where [:blank:] can also contain vertical tabulation and formfeed, see e.g. here.

AdminBee
  • 22,803
16

There is no such thing as "blank", in this context. All you have are characters, and some characters that don't actually print anything visible to you in normal text. However, everything is expressed in terms of characters, yes. There are quite a few non-printing characters in ASCII, you can find a full list here: https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html. The ones you are likely to encounter in text files are the various whitespace characters which are:

  • Space: " " (\x20)
  • Tab: \t
  • Newline: \n
  • Carriage return: \r

And, less commonly:

  • Bell: \a
  • Backspace: \b
  • Vertical tab: \v
  • Form feed: \f

You also have the NUL character (\0), which is non-printing but doesn't appear in text files, as well as the special escape (\e or ^[) and Control-Z (^Z) characters, which again are not really found in text files.
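To see that each of these is an ordinary byte, one can dump them in hexadecimal. This is only a sketch; it assumes od and tr are available, and the variable name is just for illustration:

```shell
# Emit space, tab, newline, carriage return, bell, backspace,
# vertical tab and form feed, then show each byte as two hex digits.
bytes=$(printf ' \t\n\r\a\b\v\f' | od -An -tx1 | tr -d ' \n')
echo "$bytes"
```

Each pair of hex digits in the result is one character: 20 for the space, 09 for the tab, 0a for the newline, and so on.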

So, a "blank" can be a space or a tab or another whitespace character. Or, if you are working with Unicode and not ASCII, you have various other weird things as well. But no matter what you have, they will be characters. When you see whitespace in text, the computer sees some character. A "blank" is never the absence of a character, it is always the presence of a non-printing character.
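A quick way to convince yourself of this is to count the bytes. The sketch below assumes a POSIX shell; it writes the space as the octal escape \040 because not every printf understands \x20:

```shell
# Three bytes go in: 'a', a space (octal 040), and 'b'.
# The trailing tr strips the padding that some wc implementations add.
count=$(printf 'a\040b' | wc -c | tr -d ' ')
echo "$count"
```

The count is 3, not 2: the gap between the letters is a character like any other.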

terdon
  • 242,166
  • 1
    Of course, this Stack Exchange site being Unix and Linux everyone here should have some descendant of or equivalent for the original man ascii user manual page. (-: – JdeBP Dec 20 '19 at 14:30
  • 1
    @JdeBP Those that have an ascii command installed might need to specify the section to the manpage that contains the ASCII table: man 7 ascii. – JoL Dec 21 '19 at 00:20
1

https://en.wikipedia.org/wiki/Whitespace_character

In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography

I guess you are referring to command line usage:

In commands processed by command processors, e.g., in scripts and typed in, the space character can cause problems as it has two possible functions: as part of a command or parameter, or as a parameter or name separator. Ambiguity can be prevented either by prohibiting embedded spaces, or by enclosing a name with embedded spaces between quote characters.

Space as parameter delimiter:

command arg1 arg2

Space as part of a string (single parameter to command):

command "arg with spaces"
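
A small sketch of the two cases, using a hypothetical helper function count_args (not a standard command) that prints how many arguments it received:

```shell
# Hypothetical helper: reports the number of arguments passed to it.
count_args() {
    echo "$#"
}

# Unquoted: the shell splits on the spaces, giving three arguments.
unquoted=$(count_args arg with spaces)

# Quoted: the spaces are part of a single argument.
quoted=$(count_args "arg with spaces")

echo "unquoted: $unquoted argument(s)"
echo "quoted: $quoted argument(s)"
```

The same space character acts as a separator in the first call and as data in the second; only the quoting differs.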
Christo
  • 109
  • 3
  • 1
    There is only one space though, it's the character printed by printf '\x20' and that's the same thing no matter whether it is part of a string or not. – terdon Dec 20 '19 at 13:23
  • The apparent distinction between blank and space probably goes back 40 years, to paper tape. Space is an explicit ASCII code. Blank (as in 'blank tape') was a synonym for 'RunOut'. It was normal to separate lines of data with an ASCII newline, and then leave some virgin tape too (useful for splicing in to replace damage). Blank tape is also an ASCII character (no holes == no bits == 0x00 == ASCII NUL). However, most papertape readers ran in an ignore-nulls mode, so these were not read into memory. Similarly, NULs can be used in many RS232 and terminal protocols for timing reasons. – Paul_Pedant Dec 20 '19 at 15:12
  • The answer and comments are true, but not particularly relevant to regular expressions. – WGroleau Dec 21 '19 at 17:44
  • True, but the original question was not about regexes but about the difference between blanks and spaces and whether blanks are characters. @Paul_Pedant's comment is actually the best answer to that question. – Christo Jan 14 '20 at 14:20
1

Yes. If you take "blank" literally, it is NUL; otherwise blank and space are the same. Moreover, blank, space and tab are all characters defined by ASCII and Unicode: blank [0x00], space [0x20], tab [0x09].

zen29d
  • 11
  • Confusingly, the NUL ASCII character (\x00, eight zero bits) derives from paper tape, where it meant a row of tape with no holes punched out. Inches of this "runout" were used to allow insertion into a reader mechanism, or between blocks to allow for splicing to repair damage. Most readers had modes that suppressed NUL. Cards, on the other hand, had a fixed format, and unpunched columns were actually SPACE (except in Florida elections). – Paul_Pedant Jan 14 '20 at 17:24