1

When I run cat on a file that isn't just text, it prints a large number of characters (many of which look like this: ���). What is this data?

  • If you want to see the text parts of a binary file, try the strings command. – Janis Mar 10 '15 at 23:54

2 Answers

3

Much of the data in a non-textual file cannot be represented using characters from any of the available character sets. When cat writes this data to the screen, the terminal displays it as ��� or other nonsensical characters because it has no other way to show those bytes.

Dylan
2

Actually, � is not a "nonsensical character". It is the Unicode replacement character (U+FFFD). A terminal using UTF-8 encoding displays it when asked to show a byte that is not part of a legal UTF-8 sequence. It may also be displayed, though this is far less likely, when the fonts available to the terminal do not provide a glyph for a particular legal Unicode value; in that case it is more likely that a blank is shown.
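To illustrate the replacement behaviour, here is a minimal Python sketch. It assumes that decoding with errors="replace" is a fair stand-in for what a UTF-8 terminal does; the byte string is made up for the example.

```python
# 0xFF and 0xFE can never appear in valid UTF-8, so neither byte can be decoded.
raw = b"abc\xff\xfe"

# errors="replace" substitutes U+FFFD (the replacement character) for each
# undecodable byte, roughly what a UTF-8 terminal does when displaying them.
shown = raw.decode("utf-8", errors="replace")

print(shown)
print([hex(ord(ch)) for ch in shown])
```

The three ASCII letters come through unchanged, and each of the two invalid bytes turns into one U+FFFD.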

Normally, codes 32-126 (US-ASCII, the POSIX portable character set) are printable. Codes 160-255 are printable in ISO-8859-1 encoding, but not on their own in UTF-8, where such a byte is only meaningful as one of the two or more bytes that make up a single UTF-8 encoded Unicode value. Likewise, codes 128-159 are non-printing control characters in ISO-8859-1, but in UTF-8 they too are only ever one byte of a multi-byte sequence.
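The difference between the two encodings can be checked in a few lines of Python. This is only a sketch; the bytes 0xE9 and 0x85 are arbitrary picks from the 160-255 and 128-159 ranges.

```python
import unicodedata

byte_e9 = bytes([0xE9])  # arbitrary byte from the 160-255 range
byte_85 = bytes([0x85])  # arbitrary byte from the 128-159 range

# In ISO-8859-1 every byte maps directly to one character.
latin1_char = byte_e9.decode("iso-8859-1")               # "é", printable
control_char = byte_85.decode("iso-8859-1")              # U+0085
control_category = unicodedata.category(control_char)    # "Cc" means control

# In UTF-8 the same byte on its own is not a complete sequence.
try:
    byte_e9.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

# The same character needs two bytes when encoded as UTF-8.
utf8_bytes = latin1_char.encode("utf-8")

print(latin1_char, control_category, lone_byte_is_valid_utf8, utf8_bytes)
```

So "é" is the single byte 0xE9 in ISO-8859-1 but the pair 0xC3 0xA9 in UTF-8, which is why a lone 0xE9 cannot be shown by a UTF-8 terminal.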

If you cat a non-text file, it is likely to contain bytes in the 128-255 range, and those are unlikely to form legal UTF-8 sequences. So you'll see �.
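As a rough way to see this on real data, the sketch below counts how many replacement characters a chunk of bytes would produce. The helper name count_replacements is made up for this example; the binary sample is the standard 8-byte PNG file signature.

```python
def count_replacements(data: bytes) -> int:
    """Count the U+FFFD characters produced when decoding data as UTF-8.

    Hypothetical helper for illustration. Note that it would also count any
    U+FFFD that was legitimately encoded in the input.
    """
    return data.decode("utf-8", errors="replace").count("\ufffd")


# The PNG file signature starts with 0x89, a byte that is not legal
# at the start of a UTF-8 sequence.
png_signature = b"\x89PNG\r\n\x1a\n"

# Ordinary text, including a non-ASCII character, encoded as UTF-8.
text_sample = "plain text, even with \u00e9".encode("utf-8")

print(count_replacements(png_signature))
print(count_replacements(text_sample))
```

Properly encoded text yields no replacement characters, while even the first few bytes of a binary format typically yield at least one.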


Thomas Dickey