93

As far as I know, every operating system has a different way of marking the end of a line (EOL). Commercial operating systems use carriage return for EOL (carriage return and line feed on Windows, carriage return only on Mac). Linux, on the other hand, just uses line feed for EOL.

Why doesn't Linux use carriage return for EOL (and solely line feed instead)?

  • 78
    Macs haven't used CR alone since before OS X... they now use *nix-style LF, I believe. – B Layer Dec 19 '17 at 13:53
  • 33
    I think there are/have been a number of commercial Unixy OSes too. – ilkkachu Dec 19 '17 at 14:00
  • 20
    Explained on Wikipedia. Basically, Multics in the late 60s (which inspired Unix, which inspired Linux) added a level of abstraction so that the text encoding wasn't encumbered by the limitations of teletype devices and didn't have to encode the newline as two characters (which makes even less sense 50 years later, of course). – Stéphane Chazelas Dec 19 '17 at 16:53
  • 74
    The second paragraph is a valid question, but the first paragraph is so full of oversimplifications and outright errors that it is drowning it out, with answerers having to correct a whole bunch of iffy and faulty premises before they even get to the question. – JdeBP Dec 19 '17 at 17:48
  • 1
    Macs still use both, depending on the app. Many apps try to auto-detect and auto-correct, which occasionally causes problems when one or the other shouldn’t be done or is done wrong. But most of the time works well. – WGroleau Dec 19 '17 at 18:43
  • See https://en.wikipedia.org/wiki/Newline#Representations_in_different_character_encoding_specifications for non-Unix commercial systems using LF, commercial Unixes, and non-Apple OSes using LF or other choices. So the second sentence is false. – mmmmmm Dec 19 '17 at 21:57
  • 2
    IIRC the IBM OSes of yore (OS/something, MVS/something, and VM/something (actually its CMS component)), which were in their time the epitome of commercial OSes, didn't even have the concept of end-of-line characters. – xenoid Dec 19 '17 at 22:43
  • 21
    What? Linux is a free approximation of a commercial OS standard called UNIX. UNIX-compliant systems cost a lot of money back then and they still do today. – errantlinguist Dec 20 '17 at 01:46
  • If you think this is a mess, have a look at User Agent Strings! – wizzwizz4 Dec 20 '17 at 18:55
  • 1
    An answer would be: why not? – Andrea Lazzarotto Dec 21 '17 at 00:56
  • 2
    @errantlinguist the BSD variants don't cost a lot of money – mcalex Dec 21 '17 at 06:48
  • @mcalex It seems BSD isn't UNIX (anymore). Berkeley UNIX systems cost a lot of money to license. You can consider the fact that you don't have to pay that yourself for BSD as a generous gift from the Californian education system. – errantlinguist Dec 21 '17 at 11:54
  • 1
    You can see why when you try to connect with an old Windows telnet to a Unix server: you get output of LF without CR, which looks like stairs. Just think about what CR and LF mean. CRLF made sense when they were actual control characters for a line printer. Unix started using LF because it is quite obvious that you do not want a line feed without a carriage return. On the other hand, for ASCII art it would be useful ;-). – allo Dec 22 '17 at 15:05
  • 2
    Most Unixes use LF, and there exist lots of commercial Unixes too, dating back to the 70s, far before Windows even existed. Linux just happens to be the Unix that people who don't know much about computers know of. – mathreadler Dec 23 '17 at 10:38
  • 3
    History. It all comes down to history. Even though digital computers are less than a hundred years old they have millions of years of history. It used to take humans days to get a message from one town to the next. Computers do it in microseconds. Imagine how fast their history develops. In 70 years computers have gone from vacuum tubes and relays to integrated circuits, from a single processor that filled a room to multiple processors that you hold in your hand, from memory cells made of donuts to developers full of donuts. Truly the world is a wonderful place. Mmmmm - donuts... – Bob Jarvis - Слава Україні Dec 23 '17 at 16:07
  • Because it's better. Side note though: the telnet protocol insists on "\r\n" (some servers do it the other way round). I've seen Windows/DOS clients have problems when this wasn't done but Linux handles both fine. I might be simplifying this - it's been a very long time now! – Pryftan Jan 06 '20 at 21:09

4 Answers

344

Windows uses CRLF because it inherited it from MS-DOS.

MS-DOS uses CRLF because it was inspired by CP/M which was already using CRLF.

CP/M and many operating systems from the eighties and earlier used CRLF because that was the way to end a line printed on a teletype (return to the beginning of the line and jump to the next line, just like on regular typewriters). This simplified printing a file because little or no pre-processing was required. There were also mechanical requirements that prevented a single character from being usable: some time might be required to allow the carriage to return and the platen to rotate.

GNU/Linux uses LF because it is a Unix clone.1

Unix used a single character, LF, from the beginning, both to save space and to standardize on a canonical end-of-line; using two characters was inefficient and ambiguous. This choice was inherited from Multics, which used it as early as 1964. Memory, storage, CPU power and bandwidth were very scarce, so saving one byte per line was worth doing. When a file was printed, the driver converted the line feed (newline) into the control characters required by the target device.
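
To make that last sentence concrete: on a POSIX system this conversion is still done by the tty driver's output processing, controlled by the termios ONLCR flag. Below is a minimal sketch (error handling trimmed for brevity) that toggles the flag so a program's single '\n' either is or isn't expanded to CR LF for the device; with the flag cleared, output on a real terminal shows the classic "staircase" effect.

    /* Minimal sketch: the POSIX tty driver's output post-processing is what
     * turns a program's single '\n' into CR LF on the device. The ONLCR bit
     * in c_oflag controls it; error handling is trimmed for brevity. */
    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void)
    {
        struct termios tio;

        if (!isatty(STDOUT_FILENO) || tcgetattr(STDOUT_FILENO, &tio) != 0) {
            fprintf(stderr, "stdout is not a terminal\n");
            return 1;
        }

        /* With OPOST+ONLCR set (the usual default), '\n' goes out as CR LF. */
        tio.c_oflag |= OPOST | ONLCR;
        tcsetattr(STDOUT_FILENO, TCSADRAIN, &tio);
        write(STDOUT_FILENO, "driver adds the CR\n", 19);

        /* With ONLCR cleared, '\n' goes out as a bare LF: on a real terminal
         * the cursor drops a line without returning to column 0. */
        tio.c_oflag &= ~(tcflag_t)ONLCR;
        tcsetattr(STDOUT_FILENO, TCSADRAIN, &tio);
        write(STDOUT_FILENO, "bare LF, staircase\n", 19);

        /* Restore the conventional setting before exiting. */
        tio.c_oflag |= ONLCR;
        tcsetattr(STDOUT_FILENO, TCSADRAIN, &tio);
        return 0;
    }

The same toggling can be observed without any code via stty -onlcr and stty onlcr on a terminal.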

LF was preferred to CR because the latter still had a specific usage: by returning the print position to the beginning of the same line, it made it possible to overstrike already-typed characters.

Apple initially decided to use a single character as well, but for some reason picked the other one: CR. When it switched to a BSD interface, it moved to LF.

These choices have nothing to do with whether an OS is commercial or not.

1 This is the answer to your question.

jlliagre
  • 61,204
  • 22
    Multics used Line Feed in agreement with the contemporary ISO/IEC 646, which prescribed it as the way to represent both carriage return and line feed together, in a single character, if a one-character representation was needed. – JdeBP Dec 19 '17 at 17:39
  • 10
    I doubt the real reason for choosing a single character was to save space. The real reason was to define a single newline character that is independent of the output device (terminal, etc.). The terminal (or similar) driver then takes care of converting the newline to the appropriate control character sequence, typically CR LF. This allows for a nice abstraction when programming with strings: the newline is represented by a single \n, independently of any particular output device. – Johan Myréen Dec 19 '17 at 17:40
  • 3
    @JohanMyréen Saving space might not have been the only motivation and I agree standardization and device independence was one too but saving space was certainly a good reason to use a single character. – jlliagre Dec 19 '17 at 17:57
  • 15
    Nonetheless, the 1970 paper by Saltzer and Ossanna (Remote terminal character stream processing in Multics) is quite clear that device independence was the reason. – JdeBP Dec 19 '17 at 18:10
  • 4
    @JdeBP This paper states *reduction to canonical form of the stream of characters passing to and from remote terminals is the subject of this paper*. Reducing to a canonical form was a way to save space (too). Expressed differently, using two characters was an inefficient and ambiguous waste of space. – jlliagre Dec 19 '17 at 18:31
  • 3
    Note that many network protocols also use CR+LF. Old IBM hardware that used EBCDIC (which doesn't have direct representations for CR or LF) used NL (which doesn't exist in ASCII). Unicode also gives us other alternatives like U+2028 LINE SEPARATOR. – Adrian McCarthy Dec 19 '17 at 19:37
  • 1
    VMS had/has all sorts of newline characters, including none at all (fixed-length records) or implicit newlines where the record length is stored at the beginning of each record. I believe that stream files used just LF as a record delimiter. – doneal24 Dec 19 '17 at 19:43
  • 47
    And teletypes got this from non-electric typewriters. CR-LF describes the mechanical action you take when you push the lever on your left: return the "carriage", which holds the platen (roller), all the way back to the right (which puts the keystrike at the first position on the left) and crank the platen one line-height rotation to move to the next typable line. Yes, I'm admittedly showing my age here. – cdkMoose Dec 19 '17 at 20:02
  • Comments are not for extended discussion; this conversation has been moved to chat. – terdon Dec 22 '17 at 13:29
  • I guess Apple's choice to use CR is to be different. "We are special and have to have our way...". Everybody with his own "standard". Like left/right hand direction in cars, 50/60Hz, 110/220V, etc. – i486 Dec 24 '17 at 23:00
  • @i486 They use LF and have for some time. I'm pretty sure it was even at the time you wrote that comment. That or I'm really off here but I don't think so. – Pryftan Jan 06 '20 at 21:12
17

The Wikipedia article on "Newline" traces the choice of NL as a line terminator (or separator) back to Multics in 1964; unfortunately the article has few citations to sources, but there is no reason to doubt this is correct. There are two obvious benefits to this choice over CR-LF: space saving and device independence.

The main alternative, CR-LF, originates in the control codes used to physically move the paper carriage on a teletype machine, where CR would return the carriage to its home position, and LF would rotate the paper roller to move the print position down one line. The two control characters appear in the ITA2 code which dates back to 1924 and which is apparently still in use (see Wikipedia); apparently ITA2 took them from the Murray variant of Baudot code which dates to 1901.

For younger readers it is worth noting that in the mainframe tradition there was no newline character; rather, a file was a sequence of records which were either fixed length (often 80 characters, based on punched cards) or variable length. Variable-length records were typically stored with a character count at the start of each record. If you have a mainframe file consisting of a sequence of variable-length records, each containing arbitrary binary content, converting it losslessly to a UNIX-style file can be tricky.
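
For illustration only, here is a rough sketch of such a conversion in C. The record layout it assumes (a bare 2-byte big-endian length prefix in front of each record's data) is invented for this example and is not an actual mainframe format; real formats such as IBM's RECFM=V use a different descriptor, and as noted above the conversion cannot be lossless if a record may itself contain a byte equal to LF.

    /* Rough sketch, not a real tool: converts a stream of length-prefixed
     * variable-length records into LF-terminated lines. The assumed layout
     * (2-byte big-endian byte count, then that many bytes of data, no
     * padding) is a simplification for illustration purposes only. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        unsigned char hdr[2];

        while (fread(hdr, 1, 2, stdin) == 2) {
            size_t len = ((size_t)hdr[0] << 8) | hdr[1];  /* big-endian count */
            unsigned char *buf = malloc(len ? len : 1);

            if (buf == NULL || fread(buf, 1, len, stdin) != len) {
                fprintf(stderr, "short or unreadable record\n");
                free(buf);
                return 1;
            }
            fwrite(buf, 1, len, stdout);  /* record body becomes the line */
            putchar('\n');                /* LF marks the record boundary */
            free(buf);
        }
        return 0;
    }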

Linux, of course, was just a re-implementation of Unix, and Unix took many of its design decisions from Multics, so it looks like the key decision was made in 1964.

13

Other answers have traced the inheritance chain back to the 1960s, and teletypes. But here's one aspect they didn't cover.

In the days of teletypes, there were times when it was desirable to do something called overstriking. Overstriking was sometimes used to obscure a password, because erasing the password was just not doable. Other times, overstriking was done to get a symbol that was not in the font. For example, the letter O and a slash produce a new symbol.

Overstriking was achieved by putting in a carriage return with no line feed, although backspace was sometimes used. For this reason, the Unix people decided against carriage return as the line separator, and opted for line feed instead. This also worked out well for reading texts produced using the CRLF convention: the CR gets swallowed, and the LF becomes the separator.
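
A minimal sketch of that "swallow the CR" rule as a standalone filter (roughly what dos2unix does, minus its options): it copies stdin to stdout, dropping any CR that immediately precedes an LF while leaving lone CRs, such as overstrikes, untouched.

    /* Minimal CRLF-to-LF filter: copy stdin to stdout, dropping a CR only
     * when it is immediately followed by an LF; lone CRs (overstrikes) are
     * passed through unchanged. */
    #include <stdio.h>

    int main(void)
    {
        int c, pending_cr = 0;

        while ((c = getchar()) != EOF) {
            if (pending_cr && c != '\n')
                putchar('\r');            /* lone CR: keep it */
            pending_cr = (c == '\r');
            if (!pending_cr)
                putchar(c);
        }
        if (pending_cr)
            putchar('\r');                /* input ended on a bare CR */
        return 0;
    }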

  • 1
    Thank you for this accurate memory. Backspace and Carriage Return (alone) were also used on printers to produce bold or underlined characters. And to go back to the origins, these two commands already existed in the 1930s to make the "carriage" "return" to its leftmost position, either to overstrike or to start a fresh line with the help of the "new line" key, which rotated the roller one step. See: https://en.wikipedia.org/wiki/IBM_Electric_typewriter . So "CR" + "LF" date from before computer history. – dan Dec 26 '17 at 10:01
  • 2
    It may also be worth noting that some teletypes required that a CR be followed by a non-printing character to give the carriage time to fully cycle before the next printing character arrived, and didn't support backspacing at all, so sending an LF after CR didn't cost anything, and the only way to accomplish overprinting was via CR. – supercat Dec 26 '17 at 18:40
  • The "days of teletypes" begins before the computer era. in the 1960s many computers had a console teletype for the operator, and even more used ASCII as their character set. – Walter Mitty Dec 27 '17 at 13:36
7

While you could translate the historical question into a question about the C language, the reason that Linux and all POSIX-conforming or POSIX-ish systems must use LF (or at least whatever the C '\n' character is) as the newline is a consequence of the intersection of the requirements of C and POSIX. C allows "text files" and "binary files" to differ (in fact, text files can be record-based, consisting of a sequence of line records, in addition to less exotic things like having '\n' translated to/from CR/LF as on DOS/Windows), but POSIX mandates that text and binary mode behave the same. This is largely the reason that command-line tools like cat are powerful/useful; they would be much less so if they only worked with binary, or only with text, but not both.
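
A small sketch of what that POSIX guarantee means in practice: the two counts below always match on a POSIX system, because fopen's "r" and "rb" modes are required to behave identically there, whereas on DOS/Windows the text-mode stream would have its CR LF pairs collapsed to '\n'. The file name is just a placeholder.

    /* Sketch of the text-mode/binary-mode point. On POSIX the two counts
     * always match; on DOS/Windows text mode would translate CR LF to '\n'.
     * "sample.txt" is a placeholder file name. */
    #include <stdio.h>

    static long count_bytes(const char *path, const char *mode)
    {
        FILE *fp = fopen(path, mode);
        long n = 0;

        if (fp == NULL)
            return -1;
        while (fgetc(fp) != EOF)
            n++;
        fclose(fp);
        return n;
    }

    int main(void)
    {
        printf("text mode: %ld bytes, binary mode: %ld bytes\n",
               count_bytes("sample.txt", "r"),
               count_bytes("sample.txt", "rb"));
        return 0;
    }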

  • 13
    This choice predates POSIX by many years. As mentioned in jlliagre's answer, it goes back to the beginning of Unix, which copied it from Multics. – Barmar Dec 20 '17 at 17:16
  • 4
    The choice in Linux does not predate POSIX by many years. Of course POSIX codified what was already existing practice, since that was its whole reason to exist. – R.. GitHub STOP HELPING ICE Dec 20 '17 at 18:26
  • As far as Linux is concerned, there was no real choice to make in the first place. The GNU standard library used by Linux is contemporary with POSIX and had used line feed since its inception, for obvious compatibility reasons, because it was developed, tested and used on Unix systems. The Linux kernel was designed to provide Unix-like system calls to a standard C library (GNU or other), and adding the complexity required to handle text files and binary files differently would have been overkill and would have broken compatibility with existing code. That would have been nonsensical from Torvalds. – jlliagre Dec 30 '17 at 14:41
  • @jlliagre: It was still a choice to make something compatible with existing practices rather than random gratuitous incompatibilities. You can only say that wasn't a choice in the context of assuming Linux's success. Plenty of people make toy hobbyist OS's full of gratuitously wacky choices and they never go anywhere. – R.. GitHub STOP HELPING ICE Dec 30 '17 at 16:12
  • @R I mean Linux is only a kernel and it essentially required GNU to work (initially Torvalds' goal was to be compatible with Minix instead of GNU, but that makes no difference here). The newline choice is unrelated to Linux because it was made a long time before Linux was written. There have been a lot of more or less gratuitously wacky choices in the various Linux releases; they didn't prevent Linux from being successful, one of the reasons likely being that many of these choices were revisited later. – jlliagre Dec 31 '17 at 23:52
  • @R..GitHubSTOPHELPINGICE But Linux derives from Unix, which does predate POSIX, so you're still wrong in that regard, as I see someone else has already pointed out. Oh well. Also, IIRC, text files have never been distinguished from binary files in C, at least on Unix (referring to I/O modes etc.). They certainly haven't been for a very long time now. – Pryftan Jan 06 '20 at 21:19