When I run cat on a file that isn't plain text, it prints a large number of characters, many of which look like this: ���. What is this data?
2 Answers
Much of the data in a non-text file does not correspond to characters in the character set your terminal uses. When cat writes this data to the screen, it is displayed as ��� or other nonsensical characters, as there is no other way to display it.

Actually, � is not a "nonsensical character". It is the Unicode replacement character (U+FFFD). A terminal using UTF-8 encoding displays it when asked to display a byte that is not part of a legal UTF-8 sequence. It may also be displayed (far less often) when the fonts available to the terminal do not provide a glyph for a particular legal Unicode value, but in that case it is more likely that a blank is shown.
Normally, codes 32-126 (US-ASCII, the POSIX portable character set) are printable. Codes 160-255 are printable in ISO-8859-1 encoding, but not in UTF-8, where each such byte can only appear as one of the two or more bytes that make up a UTF-8 encoded Unicode value. Likewise, codes 128-159 are non-printing control characters in ISO-8859-1, but in UTF-8 they too can only be one of the bytes of a multi-byte sequence.
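To make the ISO-8859-1 versus UTF-8 difference concrete, here is a small Python sketch. It is only an illustration: the byte 0xE9 and the variable names are my own choices, and Python's decoder stands in for whatever your terminal actually does.

```python
# One byte from the 160-255 range. In ISO-8859-1, 0xE9 is the letter e-acute.
latin1_byte = bytes([0xE9])
as_latin1 = latin1_byte.decode("iso-8859-1")

# The same byte on its own is not legal UTF-8: 0xE9 may only start a
# multi-byte sequence, and here nothing follows it.
try:
    latin1_byte.decode("utf-8")
    valid_alone = True
except UnicodeDecodeError:
    valid_alone = False

# In UTF-8 the same letter takes two bytes, both of them >= 128.
utf8_bytes = as_latin1.encode("utf-8")

print("valid as a lone UTF-8 byte:", valid_alone)
print("UTF-8 encoding of U+00E9:", utf8_bytes.hex())
```

So a byte of 128 or above only makes sense in UTF-8 as part of a longer sequence, which is the situation described above.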
If you cat a non-text file, it is likely to contain bytes in the 128-255 range, and those are unlikely to form legal UTF-8 sequences. So you'll see �.
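You can reproduce the effect without a terminal. The sketch below decodes a few binary bytes as UTF-8 and substitutes U+FFFD for anything invalid, which approximates what a UTF-8 terminal does (a real terminal may group invalid bytes differently). The helper name decode_like_terminal and the sample bytes (the 8-byte PNG signature plus 0xFF 0xFE) are assumptions picked for the example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def decode_like_terminal(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid input."""
    return data.decode("utf-8", errors="replace")


# The PNG file signature, followed by two bytes that never occur in UTF-8.
sample = b"\x89PNG\r\n\x1a\n" + b"\xff\xfe"

decoded = decode_like_terminal(sample)
replaced = decoded.count(REPLACEMENT)

print(f"{replaced} replacement characters from {len(sample)} bytes")
```

The ASCII bytes ("PNG" and the control characters) pass through unchanged. Only the bytes at 128 or above that do not form a legal sequence are turned into the replacement character.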

Not to mention what it does to your display even after the cat finishes. – Wildcard Jun 08 '16 at 01:32
OP didn't mention those (otherwise I'd have expanded the answer). – Thomas Dickey Jun 08 '16 at 01:35
strings. – Janis Mar 10 '15 at 23:54