21

I understand that EOT is ASCII code 4, whereas EOF is encoded as -1 (at least in C). Before I found out that EOF is mapped to -1, I thought it was just a synonym for EOT. Why is EOF mapped to -1 rather than EOT? As far as I can tell, they both do the same thing, which is to terminate a file stream. The only difference I can discern is that EOT also terminates a command in the bash shell. I would like a description of the precise technical differences between these two codes.

user628544
  • 1,565

3 Answers

16

Generally, EOF isn't a character; it's the absence of a character.

If a program runs on a terminal in canonical mode with default settings (i.e. a plain C program that just uses stdio), it will never see the ASCII character EOT. The terminal driver recognizes that character and creates an EOF condition (which at the low level is a 0 return value from read()). The stdio library translates that EOF condition into the return value that is appropriate for the function in question (the EOF macro for getchar(), a null pointer for fgets(), etc.)

The numeric value of the EOF macro is of no relevance anywhere but in the C library, and it shouldn't influence your understanding of the meaning of the EOF condition.

  • Intentionally omitted from this answer to keep it simple: the behavior of ^D when not at the beginning of a line, and any reference to the sizeof(int)==1 problem. –  Nov 16 '16 at 17:23
  • The answer needs an explanation for non-C-programmers: read() is a POSIX system call, available to C programs through the C library. When the file descriptor being read is in the default blocking mode, read() waits until data is available through the descriptor. After it reads the data (into a given buffer), it returns the number of bytes read. The special return value 0 indicates the end-of-file condition. (The other special return value, -1, indicates an error or other special condition.) – pabouk - Ukraine stay strong Jun 09 '22 at 06:49
5

EOF in the context of C is just a value that cannot be a byte read from a file. EOT is an ASCII character that historically signalled the end of a message (and UNIX terminals treat it as a special character meaning end of stream, but only when it appears in user input). It CAN appear in files, so using it in C to signal the end of a file would be a terrible idea when reading binary files!

Muzer
  • 2,293
4

EOT is one of a number of control characters used by serial devices. Several other control characters relate to the transmission of data over serial lines or the storage of files on a serial medium such as paper tape. These include SOH, STX, ETX, FS, RS, GS, and US. Additional control characters are used for transmission control and error correction.

On a serial connection an EOT (End Of Transmission) character indicates a desire to end the transmission. Serial connections are usually accessed using a file driver. When the serial transmission ends, the file driver reports this as an EOF (End Of File) condition.

EOF is not a character. getchar() returns an int. A valid character will have a value in the range 0 to 255. The value -1 is often used as a false/invalid/fail indicator on Unix/Linux. (Strictly, any nonzero value can signal failure, as there are any number of reasons not to succeed, but usually only one success case.) When getchar() returns -1 it is clearly not returning a character. However, if you store the result in a single byte, you won't be able to distinguish EOF from the byte 0xFF (255), because -1 truncated to 8 bits is 0xFF. DEL (Delete) is 0x7F and stays distinct.

BillThor
  • 8,965
  • If you use only standard 7-bit ASCII encoding, DEL is encoded as 01111111 while -1 is encoded as 11111111, so it's OK to use it as EOF. – Ron Inbar Feb 22 '23 at 15:43