20

Unix / Linux EOL is LF, linefeed, ASCII 10, escape sequence \n.

Here's a Python snippet to get exactly one keypress:

import sys, tty, termios

def getch():
    fd = sys.stdin.fileno()
    old_settings = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        ch = sys.stdin.read(1)
    finally:
        # Always restore the terminal, even if the read fails
        termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
    return ch

When I press Enter on my keyboard in response to this snippet, it gives \r, carriage return, ASCII 13.

On Windows, Enter sends CR LF == 13 10. *nix is not Windows; why does Enter give 13 rather than 10?

cat

2 Answers

30

Essentially "because it's been done that way since manual typewriters". Really.

A manual typewriter had a carriage on which the paper was fed, and it moved forward as you typed (loading a spring), and had a lever or key which would release the carriage, letting the spring return the carriage to the left-margin.

As electronic data entry devices (teletypes, etc.) were introduced, they carried that forward. So the Enter key on many terminals would be labeled Return.

Line feeds happened (in the manual process) after returning the carriage to the left margin. Again, the electronic devices imitated the manual devices, making a separate line-feed operation.

Both operations are encoded (to allow the teletype to be more than a standalone device creating a paper type), so we have CR (carriage-return) and LF (line-feed). This image from ASR 33 Teletype Information shows the keyboard, with Return on the right side, and Line-Feed just to the left. Being on the right, it was the main key:

[Image: ASR 33 Teletype keyboard]

Unix came along later. Its developers liked to shorten things (look at all of the abbreviations, even creat for "create"). Faced with a possibly two-part process, they decided that line-feeds only made sense if they were preceded by carriage-returns. So they dropped the explicit carriage returns from files, and translated the terminal's Return key to send the corresponding line-feed. Just to avoid confusion, they referred to line-feed as "newline".

When writing text on the terminal, Unix translates in the other direction: a line-feed becomes carriage-return / line-feed.

(That is, "normally": so-called "cooked mode", in contrast to "raw" mode where no translation is done).

Summary:

  • carriage-return / line-feed is the sequence 13 10
  • the device sends 13 (since "forever" in your terms)
  • Unix-like systems change that to 10 on input (and change 10 to 13 10 on output)
  • Other systems do not necessarily store just 10 (Windows largely accepts just 10 or 13 10, depending how important compatibility is).
Thomas Dickey
  • 1
    I looked for a nice picture to show the levers for a manual typewriter, but found only low-resolution images. – Thomas Dickey Feb 22 '16 at 02:07
  • My first reaction to that photo was an audible "Whoa." :D – cat Feb 22 '16 at 02:37
  • 1
    This is so beautiful, it should go on the wiki :) – tink Feb 22 '16 at 02:41
  • So, say I'm developing platform-agnostic libreadline: Enter should give 13 / CR always, yes? – cat Feb 22 '16 at 02:59
  • 3
    If you had to type on one of those, you'd abbreviate everything too! – Michael Hampton Feb 22 '16 at 04:02
  • 1
    For those who're interested: creat for create. – h.j.k. Feb 22 '16 at 05:30
  • 1
    In raw mode, I would expect 13, in cooked mode, 10. The question was about raw mode, of course. – Thomas Dickey Feb 22 '16 at 10:14
  • 3
    Regarding the history part: the manual typewriters I used in my youth, similar to this one, only had one lever. When you pulled it, it first cranked the roller (line feed) and then it would just pull the carriage along. And it was this pull that loaded the spring. Each letter typed, or tab pressed, would release the spring somewhat, moving the carriage back to the "unloaded" position, which was at the end of the line, not its start. – RealSkeptic Feb 22 '16 at 10:47
  • 1
    Cont: electric typewriters that came later, like this one did not have a separate LF key. IIRC pressing "Return" would also induce a linefeed. I think the separation only came on teletypes, because you didn't have access to the actual roller. – RealSkeptic Feb 22 '16 at 10:48
  • 2
    On input, CR is translated (by the tty line discipline) to LF, not CR LF. It's on output (including the echo of the input) that LF is translated to CR LF. When you type foo<Return> in cooked mode, the application reads foo\n and foo\r\n is sent back by the line discipline for echo to the terminal. – Stéphane Chazelas Feb 22 '16 at 16:08
  • 1
    @RealSkeptic: The time required for the teletype to return the carriage to home position would often exceed the time required to send a character, so it was necessary that the character following a CR not be anything that relied upon the character position. Since it was sometimes useful to return the carriage without advancing the paper, and since sending CR+LF would be no worse than sending a CRLF character followed by a NUL, it was simplest to simply separate CR and LF. – supercat Feb 22 '16 at 18:22
  • @cat: To make things more fun, the "SPEED" command on some BASIC interpreters had an option to add nulls after carriage returns. Interestingly, that wasn't just to slow things down for the operator; if one was using a teletype to save a program to paper tape (by using LIST), an attempt to re-import the program could fail if the tape reader started sending a line while the interpreter was busy adding the previous line to the program. Each null byte added after a carriage return would delay the next line by 1/10 second. – supercat Feb 22 '16 at 19:43
  • @supercat: Many POSIXy systems still have that, in the output flags. NL0 and NL1 under NLDLY mask (for no delay or one-character delay), and CR0, CR1, CR2, and CR3 under CRDLY mask (for no delay through three-character delay). If OFILL flag is set, the delays are actual characters (ASCII NUL or DEL, 0 or 127), otherwise they're timed. Some systems even support turning horizontal tabs into correct number of spaces (8-column tabs). Lots of historical baggage and weirdness, but it's worked fine for decades.. :) – Nominal Animal Feb 22 '16 at 21:29
  • @NominalAnimal: Until the last Vintage Computer Fair, I hadn't really appreciated the idea that in the old days one would save a program by sending it to an ASR-33 teletype and could then reload it using that same teletype. For machine-language programs, a dedicated tape reader was much faster than the ASR-33, but BASIC required a significant fraction of a second after each line entry (I think Altair BASIC required five nulls). I would think a command to ask BASIC to send ^S immediately after each CR input and ^Q when ready for more input would be helpful, but... – supercat Feb 22 '16 at 22:36
  • ...one would then want a means of asking a program listing to be followed by a command to turn off that feature, or else the tape reader would keep getting turned on all the time. – supercat Feb 22 '16 at 22:37
  • @supercat: Heh, that's before my time.. Right. For others following along, POSIX termios does support such software handshaking, IXON (enable for output) and IXOFF (enable for input) iflags, with default start character (cc[VSTART]) being ^Q (ASCII 17, DC1), and default stop character (cc[VSTOP]) being ^S (ASCII 19, DC3). That way no delays are needed, as the other end will wait for a stop before it sends start, response, and stop (if enabled both ways; otherwise just response). The terminal layer will hide start/stop chars from the application, too. Nifty. – Nominal Animal Feb 23 '16 at 01:22
  • @NominalAnimal: What a lot of people don't realize is that control-S and control-Q are the characters to turn the tape reader on and off. If an application talking to an ASR-33 is being used interactively with the keyboard, sending a control-Q may have undesired consequences. – supercat Feb 23 '16 at 04:24
  • @NominalAnimal - please use a chat room if you'd like to discuss this further amongst yourselves. Comments should only be used in fleshing out ideas/issues wrt the Q/A they're underneath. – slm Feb 23 '16 at 06:41
  • @supercat - ^^^^ – slm Feb 23 '16 at 06:42
  • 1
    FYI, Unix copied this behavior from Multics. The original Unix designers had previously worked on the Multics project. – Barmar Feb 24 '16 at 18:36
12

While Thomas Dickey's answer is quite correct, Stéphane Chazelas correctly mentioned in a comment to Dickey's answer that the conversion is not set in stone; it is part of the line discipline.

In fact, the translation is completely programmable.

The man 3 termios man page contains basically all the pertinent information. (The link goes to the Linux man-pages project, which mentions which features are Linux-only and which are common to POSIX or other systems; always check the Conforming to section on each page there.)

The iflag terminal attribute (old_settings[0] in the Python code shown in the question) has three relevant flags on all POSIXy systems:

  • INLCR: If set, translate NL to CR on input
  • ICRNL: If set (and IGNCR is not set), translate CR to NL on input
  • IGNCR: Ignore CR on input
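
To make the interaction of these three flags concrete, here is a minimal sketch that imitates the input translation in pure Python. The function name translate_input is hypothetical, and the one-pass-per-byte ordering is an assumption modelled on how Linux behaves; the real work is done by the kernel's line discipline, not by anything in Python.

```python
import termios

CR, NL = 0x0D, 0x0A

def translate_input(data: bytes, iflag: int) -> bytes:
    """Hypothetical helper: imitate INLCR, ICRNL and IGNCR on incoming bytes."""
    out = bytearray()
    for byte in data:
        if byte == CR:
            if iflag & termios.IGNCR:
                continue          # IGNCR: the CR is dropped entirely
            if iflag & termios.ICRNL:
                byte = NL         # ICRNL: CR becomes NL
        elif byte == NL and iflag & termios.INLCR:
            byte = CR             # INLCR: NL becomes CR
        out.append(byte)
    return bytes(out)

# With ICRNL set (the usual default), Enter arrives as NL.
print(translate_input(b"foo\r", termios.ICRNL))
# With all three flags cleared, as in raw mode, the CR passes through.
print(translate_input(b"foo\r", 0))
```

Each byte is translated at most once, so setting both INLCR and ICRNL swaps the two characters rather than collapsing them into one.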

Similarly, there are related output settings (old_settings[1]), too:

  • OPOST: Enable output processing.
  • OCRNL: Map CR to NL on output.
  • ONLCR: Map NL to CR-NL on output. (XSI; not available in all POSIX or Single-Unix-Specification systems.)
  • ONOCR: Skip (do not output) CR in the first column.
  • ONLRET: Skip (do not output) CR.
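
As a rough illustration of the output side, the sketch below imitates only OPOST, ONLCR (NL becomes CR-NL) and OCRNL in pure Python. The function name translate_output is hypothetical, and ONOCR and ONLRET are deliberately left out because they depend on tracking the cursor column; treat it as a simplified model of what the line discipline does, not a faithful reimplementation.

```python
import termios

CR, NL = 0x0D, 0x0A

def translate_output(data: bytes, oflag: int) -> bytes:
    """Hypothetical helper: imitate OPOST, ONLCR and OCRNL on outgoing bytes."""
    if not oflag & termios.OPOST:
        return data               # OPOST clear: no output processing at all
    out = bytearray()
    for byte in data:
        if byte == NL and oflag & termios.ONLCR:
            out += b"\r\n"        # ONLCR: NL is sent as CR followed by NL
        elif byte == CR and oflag & termios.OCRNL:
            out.append(NL)        # OCRNL: CR is sent as NL
        else:
            out.append(byte)
    return bytes(out)

# Typical cooked-mode output: a written "\n" reaches the terminal as "\r\n".
print(translate_output(b"foo\n", termios.OPOST | termios.ONLCR))
```

Note that the other output flags have no effect once OPOST is cleared, which is why the first check returns the data untouched.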

For example, you could avoid relying on the tty module. The "makeraw" operation just clears a set of flags (and sets the CS8 cflag):

import sys
import termios

def getch():
    fd = sys.stdin.fileno()
    old_settings = termios.tcgetattr(fd)
    ch = None

    try:
        new_settings = termios.tcgetattr(fd)
        # Input flags (iflag)
        new_settings[0] = new_settings[0] & ~termios.IGNBRK
        new_settings[0] = new_settings[0] & ~termios.BRKINT
        new_settings[0] = new_settings[0] & ~termios.PARMRK
        new_settings[0] = new_settings[0] & ~termios.ISTRIP
        new_settings[0] = new_settings[0] & ~termios.INLCR
        new_settings[0] = new_settings[0] & ~termios.IGNCR
        new_settings[0] = new_settings[0] & ~termios.ICRNL
        new_settings[0] = new_settings[0] & ~termios.IXON
        # Output flags (oflag)
        new_settings[1] = new_settings[1] & ~termios.OPOST
        # Control flags (cflag)
        new_settings[2] = new_settings[2] & ~termios.CSIZE
        new_settings[2] = new_settings[2] | termios.CS8
        new_settings[2] = new_settings[2] & ~termios.PARENB
        # Local flags (lflag)
        new_settings[3] = new_settings[3] & ~termios.ECHO
        new_settings[3] = new_settings[3] & ~termios.ECHONL
        new_settings[3] = new_settings[3] & ~termios.ICANON
        new_settings[3] = new_settings[3] & ~termios.ISIG
        new_settings[3] = new_settings[3] & ~termios.IEXTEN
        termios.tcsetattr(fd, termios.TCSANOW, new_settings)
        ch = sys.stdin.read(1)  # read exactly one keypress
    finally:
        # Always restore the original settings
        termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)

    return ch

although for compatibility's sake, you might wish to check if all those constants exist in the termios module first (if you run on non-POSIX systems). You can also use new_settings[6][termios.VMIN] and new_settings[6][termios.VTIME] to set whether a read will block if there is no pending data, and how long (in integer number of deciseconds). (Typically VMIN is set to 0, and VTIME to 0 if reads should return immediately, or to a positive number (tenth of seconds) how long the read should wait at most.)
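
One possible way to do both of those things is sketched below. The helper names clear_flags and set_read_timeout are hypothetical, and the approach (looking constants up by name with getattr, and treating a missing constant as 0) is just one assumption about how you might handle platforms that lack a flag.

```python
import termios

def clear_flags(value: int, *names: str) -> int:
    """Hypothetical helper: clear each named flag, skipping names termios lacks."""
    for name in names:
        # A missing constant falls back to 0, and clearing 0 changes nothing.
        value &= ~getattr(termios, name, 0)
    return value

def set_read_timeout(attrs: list, vmin: int, vtime: int) -> list:
    """Hypothetical helper: copy a tcgetattr()-style list with VMIN/VTIME replaced."""
    new = list(attrs)
    new[6] = list(attrs[6])       # copy the cc array so the original stays intact
    new[6][termios.VMIN] = vmin   # minimum number of bytes a read waits for
    new[6][termios.VTIME] = vtime # timeout in deciseconds
    return new
```

For example, new_settings[0] = clear_flags(new_settings[0], "INLCR", "IGNCR", "ICRNL") would replace three of the lines above, and set_read_timeout(new_settings, 0, 5) would ask for reads that wait at most half a second.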

As you can see, the above (and "makeraw" in general) disables all translation on input, which explains the behaviour cat is seeing:

    new_settings[0] = new_settings[0] & ~termios.INLCR
    new_settings[0] = new_settings[0] & ~termios.ICRNL
    new_settings[0] = new_settings[0] & ~termios.IGNCR

To get normal behaviour, just omit the lines clearing those three flags, and the input translation is unchanged even when "raw".

The new_settings[1] = new_settings[1] & ~termios.OPOST line disables all output processing, regardless of what the other output flags say. You can just omit it to keep output processing intact. This keeps output "normal" even in raw mode. (It does not affect whether input is automatically echoed or not; that is controlled by the ECHO lflag in new_settings[3].)

Finally, when new attributes are set, the call will succeed if any of the new settings were set. If the settings are sensitive (for example, if you are asking for a password on the command line), you should read the settings back after setting them and verify that the important flags are correctly set or unset, to be sure.
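
A minimal sketch of that read-back check might look like the following. The names flags_clear and set_and_verify are hypothetical, and the sketch only checks that certain local-mode flags ended up cleared; a real program would pick whichever flags matter to it.

```python
import termios

LFLAG = 3  # index of the local-mode flags in the tcgetattr() list

def flags_clear(attrs: list, index: int, mask: int) -> bool:
    """Hypothetical helper: True if none of the bits in mask are set."""
    return (attrs[index] & mask) == 0

def set_and_verify(fd: int, attrs: list, must_be_clear: int) -> bool:
    """Hypothetical helper: apply attrs, then re-read and confirm the lflags."""
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    actual = termios.tcgetattr(fd)  # what the terminal really accepted
    return flags_clear(actual, LFLAG, must_be_clear)
```

Before prompting for a password, something like set_and_verify(fd, new_settings, termios.ECHO | termios.ECHONL) would let you refuse to continue if echo is still on.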

If you want to see your current terminal settings, run

stty -a

The input flags are usually on the fourth line, and the output flags on the fifth line, with a - preceding the flag name if the flag is unset. For example, the output could be

speed 38400 baud; rows 58; columns 205; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?; swtch = M-^?; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts
-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff -iuclc ixany imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke
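
If you would rather inspect such a flag line from Python than by eye, a small parser is enough. This is only a sketch: parse_stty_flags is a hypothetical name, and it assumes the space-separated layout shown above, where a leading - means the flag is unset. Tokens such as cs8 or nl0 are really values rather than on/off flags, and simply come out as True.

```python
def parse_stty_flags(line: str) -> dict:
    """Hypothetical helper: map each flag name on a `stty -a` flag line to a bool."""
    flags = {}
    for token in line.split():
        if token.startswith("-"):
            flags[token[1:]] = False  # leading "-" means the flag is unset
        else:
            flags[token] = True
    return flags

iflags = parse_stty_flags(
    "-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl "
    "ixon -ixoff -iuclc ixany imaxbel iutf8"
)
print(iflags["icrnl"], iflags["inlcr"], iflags["igncr"])
```

For the sample output above, this reports icrnl as set and inlcr and igncr as unset, which is the usual cooked-mode input translation.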

On pseudoterminals, and USB TTY devices, the baud rate is irrelevant.

If you write Bash scripts that wish to read e.g. passwords, consider the following idiom:

#!/bin/bash
trap 'stty sane ; stty '"$(stty -g)" EXIT
stty -echo -echonl -imaxbel -isig -icanon min 1 time 0

The EXIT trap is executed whenever the shell exits. The stty -g reads the current settings of the terminal at the start of the script, so the current settings are restored when the script exits, automatically. You can even interrupt the script with Ctrl+C, and it'll do the right thing. (In some corner cases with signals, I've found that the terminal sometimes gets stuck with the raw/noncanonical settings (requiring one to type reset + Enter blindly at the terminal), but running stty sane before restoring the actual original settings has cured that every time for me. So that's why it's there; a sort of added safety.)

You can read input lines (unechoed to the terminal) using the read Bash built-in, or even read the input character by character using

IFS=$'\0'
input=""
while read -N 1 c ; do
    [[ "$c" == "" || "$c" == $'\n' || "$c" == $'\r' ]] && break
    input="$input$c"
done

If you don't set IFS to ASCII NUL (in Bash, $'\0' is effectively the empty string, since variables cannot hold a NUL byte), the read built-in will consume the separators, so that c will be empty. Trap for young players.
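
For comparison with the Python code earlier in this answer, the same save-and-restore idea can be sketched in Python as a context manager. The names without_echo and echo_disabled are hypothetical, and the sketch only turns off echo; it assumes you want everything else left as it was. (For plain password prompts, the standard library's getpass module already does this kind of thing for you.)

```python
import contextlib
import termios

def without_echo(attrs: list) -> list:
    """Hypothetical helper: copy a tcgetattr()-style list with echo turned off."""
    new = list(attrs)
    new[3] &= ~(termios.ECHO | termios.ECHONL)  # lflag is at index 3
    return new

@contextlib.contextmanager
def echo_disabled(fd: int):
    """Hypothetical helper: disable echo on fd, restoring the old settings on exit."""
    saved = termios.tcgetattr(fd)
    try:
        termios.tcsetattr(fd, termios.TCSADRAIN, without_echo(saved))
        yield
    finally:
        # Runs on normal exit and on exceptions such as KeyboardInterrupt
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Used as with echo_disabled(sys.stdin.fileno()): secret = sys.stdin.readline(), the finally clause plays the role of the EXIT trap in the shell version.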

  • 1
    Oh, for gods' sake, nothing is ever simple :( – cat Feb 22 '16 at 17:59
  • I'm accepting this answer because it's most helpful to me as a Python dev, even though the other one is great – cat Feb 22 '16 at 18:01
  • 2
    @cat: While this may be most helpful to you, I'd still say Thomas Dickey's answer is more correct. I'd rather you accept that instead. – Nominal Animal Feb 22 '16 at 18:03
  • I'm so amused by the fact we still measure things in baud here in the 21st century – cat Feb 22 '16 at 18:03
  • Paraphrasing the help center, the accepted answer is the one most helpful to me: even though the other one may be more right; this one is not wrong – cat Feb 22 '16 at 18:06
  • @cat: No, actually my answer does not answer the stated question, while Thomas Dickey's does. My answer describes solutions to the underlying problem instead: "how do I fix/change the behaviour?". It is typical for my answers, and I don't mind that other answers that are more to the point get selected as accepted; I'm quite content to just provide insights into the underlying, unstated problems. In fact, I see it quite unfair of you to accept my answer just because it helps you more, whereas Thomas Dickey correctly -- and with really interesting detail -- answers the stated question! – Nominal Animal Feb 22 '16 at 18:11
  • @cat: In other words, you posed your question poorly, and I happened to tell you some background information that helped you solve the actual problem you're having. You should have asked whether the translation is fixable, not why it happens. Thomas Dickey pretty much perfectly answered you why. – Nominal Animal Feb 22 '16 at 18:14
  • 4
    While your willingness to forgo your +15 rep does you credit, @cat is quite right. Whether an answer is accepted or not is no indication that it is the "most correct" of the posted answers. It only means that's the one the OP preferred for whatever personal reasons. The "most correct" is usually the most highly upvoted. Accepting an answer is down to personal preference, if the OP prefers yours, there is no reason not to accept it. – terdon Feb 22 '16 at 18:59
  • 1
    @terdon: Okay, I stand corrected, then. – Nominal Animal Feb 22 '16 at 21:20