4

In the typescript produced by the script command (i.e. the saved file), each newline is CR+LF (\r\n), although the original output fed to script uses LF. Why? It seems to be a tty issue, which I know nothing about. Can someone explain it without going into too much detail?

I'm not in any trouble; I'm just curious. :) (But I think it should be fixed, or at least documented.)

My script is from util-linux, but probably it does not matter much.

3 Answers

7

The deep reason for the discrepancy between program output and a captured tty stream (e.g. typescript) is that tty's used to be printers.

Before unix, text always had a CRLF at the end of a line, not because it was considered to be the logical representation of a line termination, but because the characters individually had real physical meaning: move the print head all the way to the left, and advance the paper.

Unix took a radical new approach: it treats text files on disk as a useful object in their own right (not just instructions for a printer), and lines as logical entities. The two-character line terminator is unnecessarily complicated in the unix worldview.

But they had to work with existing hardware - printers and CRT dumb terminals that didn't recognize a single "end of line" character, but only a CR to do half the job and an LF to do the other half. So a translation had to be done, and it was done at the closest possible place to that hardware - in the tty driver.

Since then, it's all been backward compatibility. So you have a terminal emulator that insists on CRLF, and a tty driver that supplies it when a program outputs a newline.

  • 1
    TTYs never really stopped being printers. You are using TTY, teletypewriter, to mean terminal. It's a common conflation in Unix Land, but especially in discussions that lean on the history of computing there is a distinction. – JdeBP Feb 09 '17 at 15:18
  • 1
    I don't think it's bad to refer to the devices represented by /dev/tty* as "tty's". It doesn't just stand for "teletype(writer)" any more. It evolved a new meaning. Language does that. –  Feb 09 '17 at 15:29
  • Thanks, but it's a history of newline, not having a direct relation to the question. The last paragraph is meaningless, since it's simply tantamount to saying "if it's the case, it is the case." – teika kazura Feb 12 '17 at 06:37
  • 1
    The history is the reason. script doesn't exist in a vacuum. –  Feb 12 '17 at 06:58
6

The typescript output captures all the characters sent to the pty. If you use for example stty -opost to stop the terminal driver from doing its normal change of the newline character to CR+LF then you will see that you only have LF characters in the output.
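
The effect of opost can also be checked without script. What follows is a minimal sketch, assuming Linux and Python's standard pty and termios modules; the helper name tty_output is made up for illustration. It writes bytes to the slave side of a fresh pseudo-terminal, which is what a program running under script does, and reads back what arrives on the master side, which is the stream script records.

```python
import os
import pty
import select
import termios


def tty_output(data, opost=True):
    """Write data to a pty slave and return the bytes seen on the master side.

    tty_output is a hypothetical helper, not part of script or util-linux.
    """
    master, slave = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        # attrs[1] is the output-flags field (c_oflag) of the termios structure.
        if opost:
            attrs[1] |= termios.OPOST | termios.ONLCR
        else:
            attrs[1] &= ~termios.OPOST
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(slave, data)

        # Drain the master side; stop once nothing arrives for a short while.
        chunks = []
        while select.select([master], [], [], 0.2)[0]:
            chunk = os.read(master, 1024)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(slave)
        os.close(master)


if __name__ == "__main__":
    print(tty_output(b"a\nb\n", opost=True))   # output processing on
    print(tty_output(b"a\nb\n", opost=False))  # same as after stty -opost
```

With output processing on, each LF should come back as CR+LF; with it off, the bytes should come back unchanged.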

Hopefully helpful tip, use

col -b < typescript

to do a first pass at cleaning up the file.
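
If col is not at hand, a rough cleanup in the same spirit can be sketched in a few lines of Python. This is an approximation only, not a reimplementation of col -b: it assumes the file just needs CR+LF turned back into LF and backspaces applied, and the function name clean_typescript is hypothetical.

```python
def clean_typescript(raw):
    """Rough first-pass cleanup of typescript bytes (approximation of col -b)."""
    # Undo the tty driver's LF -> CR+LF translation.
    text = raw.replace(b"\r\n", b"\n")

    out = bytearray()
    for byte in text:
        if byte == 0x08:
            # Backspace: drop the previous character, but never cross a line break.
            if out and out[-1] != 0x0A:
                out.pop()
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    with open("typescript", "rb") as f:  # default file name used by script
        print(clean_typescript(f.read()).decode(errors="replace"), end="")
```

Like col -b, this leaves terminal escape sequences in place.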

icarus
  • Let us be precise. (1) In the original console (I tried linux console and X terms) the newline is LF. (2) In the console that script seems to set up, or more precisely $ script -c <command> where command can be dash, bash, or any program, the newline is CR+LF. So by stty -opost it can be changed to LF. (3) A typescript file is a faithful copy of the console that script makes. – teika kazura Feb 10 '17 at 08:30
  • Am I right? If so why the item (2)? Is it an idiosyncrasy of script, or Linux (perhaps Unix in general) default? The git repo of script.c from util-linux is found at: (raw), (tree) – teika kazura Feb 10 '17 at 08:31
  • 1
    @teikakazura (3) is correct, (1) is wrong, not sure what to say about (2). In a file the newline is LF. The driver for the consoles usually translates this LF to CRLF, but other options are possible. The stty -opost turns off all output translations. Try running stty -opost; head /etc/passwd ; stty opost; head /etc/passwd to see that an LF just moves you down a line and you need the translation. – icarus Feb 10 '17 at 11:29
  • Do you know how the terminal escapes can be filtered? col -b doesn't take care of those. Edit: Just found it; https://unix.stackexchange.com/q/14684/135943 – Wildcard Sep 21 '17 at 00:52
4

My own answer, after learning from the answer and the comment by icarus:
You have to distinguish "a newline in a file" and "a newline in a console". In a console, the true newline is, counter-intuitively, CRLF, as we will see below.

By Unix convention, a newline in a text file is LF, and conversely LF means a newline. (By "means", I mean in the sense of, say, a natural-language text.) In DOS it is CR+LF, and so on. OK, everyone knows that.

(Unix) consoles are more complicated. First, remember that LF and CR are control codes, i.e. they control the console rather than print a glyph, much as escape sequences do for bold, color, etc.

If you feed an LF (\n, linefeed) to a console, you get a newline. There are two catches: (1) Consoles are double-layered, so to speak; they consist of a filter and a rendering part. (Ad hoc nomenclature.) The filter, hidden from ordinary users, translates LF to CRLF. (2) The renderer needs CRLF (\r\n) for a newline in the ordinary sense. See below for more.

The typescript file created by the script(1) command records the characters after the input to the console has been filtered. That's why newlines in the typescript are CRLF.

Details & misc. facts:

  • The console renderer prints LF as "move down cursor one line" and CR as "move the cursor to the beginning of the line."
  • You can turn off the LF->CRLF conversion with $ stty -opost and restore it with $ stty opost. "opost" is an abbreviation of "Output POSTprocessing".
    • More precisely, opost does LF->CRLF when onlcr is set. When onocr is set, a CR is not output when the cursor is already at the beginning of the line, etc. Ref: POSIX chap. 11 "General Terminal Interface".
  • The "Enter" key, called "Return" in keymap terminology, sends CR; with icrnl set (the usual default), the tty driver translates it to LF on input. (See this question for the details.)
  • There's also escape code variants; man 4 console_codes explains that "ESC D" (\eD) is linefeed, and "ESC E" (\eE) is newline. If you print them, "ESC D" is a "move cursor down", and "ESC E" a CR+LF, regardless of ±opost-ness.
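
The filter/renderer split can be mimicked with a toy model. This is only a sketch of the idea described above, not how any real console is implemented; the function names onlcr and render are borrowed or made up for illustration, and the "screen" is just a list of rows.

```python
def onlcr(data):
    """The 'filter' layer: mimic opost + onlcr by turning each LF into CR+LF."""
    return data.replace("\n", "\r\n")


def render(data):
    """The 'renderer' layer: CR returns to column 0, LF moves down one row."""
    rows = [[]]
    row = col = 0
    for ch in data:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row += 1
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            # Pad with spaces if the cursor sits past the end of the row.
            line.extend(" " * (col - len(line)))
            if col < len(line):
                line[col] = ch       # overwrite an existing cell
            else:
                line.append(ch)
            col += 1
    return ["".join(r) for r in rows]


print(render("ab\ncd"))         # ['ab', '  cd']  (bare LF: the staircase effect)
print(render(onlcr("ab\ncd")))  # ['ab', 'cd']    (filtered: an ordinary newline)
```

The first call corresponds to what you see after stty -opost; the second to the normal case, where the filtered stream is also what ends up in the typescript.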

To do some experiments, I recommend writing from a separate console. For example, $ echo -ne '1st\n2nd\r\n3rd\n' > /dev/tty1 writes to the first non-X console, and /dev/pts/0 is typically the first X terminal. This is not the most convenient way, but it is the least ambiguous.