31

I just learned a trick to create a new file with the cat command. In my testing, if the last line is not followed by a newline, I have to type ctrl+d twice to finish the input, as demonstrated below.

[root@192 ~]# cat > test
a
b
ctrl+d[root@192 ~]# cat > test
a
bctrl+dctrl+d[root@192 ~]#

Is this expected? Why does it behave this way?

  • I think the answer lies in the readline manual: end-of-file (usually C-d) The character indicating end-of-file as set, for example, by ``stty''. If this character is read when there are no characters on the line, and point is at the beginning of the line, Readline interprets it as the end of input and returns EOF. – schrodingerscatcuriosity May 03 '22 at 22:10
  • 4
    @schrodigerscatcuriosity, that's the same idea, except that readline does it by itself, while cat probably just relies on the raw tty behaviour. stty -a should show the terminal's idea of the "eof" character, something like eof = ^D. And you could change it with e.g. stty eof ^Q. – ilkkachu May 03 '22 at 22:16
  • Do you only need one when at the beginning of a line? (A Unix text file is supposed to contain only lines that end with a newline, so in your example the "b" is not really counted as a line: wc -l would display 1. Compare printf "a\nb" | wc -l and printf "a\nb\n" | wc -l.) – Olivier Dulac May 04 '22 at 09:41
  • Useless use of cat? Try > test instead. – studog May 04 '22 at 12:59
  • 8
    @studog: no, it isn't a UUOC. > test will just create a new empty test file. cat > test will repeat (cat) what is entered (after readline has interpreted any special chars such as ctrl-d, backspaces, etc.) and send it line by line to the test file. – Olivier Dulac May 04 '22 at 13:24
  • 1
    @studog that would only work in zsh, since it runs cat behind the curtains. Or rather, whatever $NULLCMD contains, which is cat by default. In other shells it'd just create an empty file. – ilkkachu May 04 '22 at 19:20
  • 1
    @OlivierDulac, readline is a userland library for fancy line editing; it's the one used by Bash. cat very likely doesn't use it, or anything like it. Instead you just get the primitive editing the terminal driver provides. That does include backspace and Ctrl-D for EOF, but readline also supports e.g. moving the cursor into the middle of the entered text (and stuff like tab-completion), while the terminal driver probably just shows something like ^[[D for the left arrow key etc. (And anyway, cat is supposed to just copy the bytes verbatim; any fancy editing would break that.) – ilkkachu May 04 '22 at 19:26
  • @ilkkachu: thanks for these corrections. – Olivier Dulac May 05 '22 at 03:34
  • @OlivierDulac You are correct! I misread the question text. – studog May 05 '22 at 14:00
  • Very useful for sending emails, which requires this. The documentation is... somewhere. – Nicholas Saunders Nov 28 '23 at 00:08

2 Answers

47

Yes, it's expected.

We say that Ctrl-D makes cat see "end of file" in the input, and it then stops reading and exits, but that's not really true. Since that's on the terminal, there's no actual "end", and in fact it's not really "end of file" that's ever detected, but any read() of zero bytes.

Usually, the read() system call doesn't return zero bytes except when it's known that no more data is available, as at the end of a file. When reading from a network socket where there's no data available, it's expected that new data will arrive at some point, so instead of that zero-byte read, the system call will either block and wait for some data to arrive, or return an error saying that it would block. If the connection was shut down, then it would return zero bytes, though. Then again, even on a file, reading at (or past) the end doesn't mean the end is final: another process could write something to the file to make it longer, after which a new attempt to read would return more data. (That's what a simple implementation of tail -f would do.)

For a lot of use-cases treating "zero bytes read" as "end of file detected" happens to work well enough that they're considered effectively the same thing in practice.
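
As a rough illustration of the point about regular files, here is a minimal Python sketch (not taken from the answer; the temporary file and the variable names are made up for the example). It reads a file to its end, gets a zero-byte read, appends to the file, and then reads again from the same descriptor:

```python
import os
import tempfile

# Create a scratch file with one line in it (hypothetical example data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\n")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    first = os.read(fd, 1024)    # returns the existing data
    at_end = os.read(fd, 1024)   # zero bytes: what we call "end of file"

    # Another writer makes the file longer after we "hit the end".
    with open(path, "ab") as writer:
        writer.write(b"second\n")

    after_append = os.read(fd, 1024)  # the same descriptor sees new data
finally:
    os.close(fd)
    os.unlink(path)

print(first, at_end, after_append)
```

The zero-byte read in the middle did not make the descriptor unusable; it only reported that nothing more was available at that moment.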


What Ctrl-D does here is tell the terminal driver to pass along everything it has been given so far, even if it's not a full line yet. At the start of a line, that's zero bytes, which is detected as an EOF. But after the letter b, the first Ctrl-D sends the b, and the next one sends the zero bytes entered after the b, and that now gets detected as the EOF.
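
The two cases from the question can be reproduced without a keyboard. The following is only a sketch, assuming a Linux-like system where Python's os.openpty() yields a pseudo-terminal whose slave side can be put in canonical (line-buffered) mode; the helper name simulate_typing is invented for this example. It writes the "typed" bytes to the master side and records what each read() on the slave side returns, stopping at the first zero-byte read:

```python
import os
import termios


def simulate_typing(keys):
    """Feed `keys` to a pseudo-terminal and list what each read() returns.

    `keys` must end with Ctrl-D (b"\\x04") at a point where it produces a
    zero-byte read, otherwise the final read() blocks forever.
    """
    master, slave = os.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        attrs[3] |= termios.ICANON        # canonical mode: driver buffers lines
        attrs[3] &= ~termios.ECHO         # no echo, to keep the demo quiet
        attrs[6][termios.VEOF] = b"\x04"  # Ctrl-D is the "eof" character
        termios.tcsetattr(slave, termios.TCSANOW, attrs)

        os.write(master, keys)            # what the user "types"

        chunks = []
        while True:
            chunk = os.read(slave, 1024)  # what a program like cat would see
            chunks.append(chunk)
            if not chunk:                 # zero bytes: treated as EOF
                break
        return chunks
    finally:
        os.close(master)
        os.close(slave)


# Last line ends with a newline: one Ctrl-D is enough.
print(simulate_typing(b"a\nb\n\x04"))
# Last line has no newline: the first Ctrl-D only flushes "b".
print(simulate_typing(b"a\nb\x04\x04"))
```

In the second call the first Ctrl-D should show up as a read returning just b, and only the second one as the empty read that a reader treats as end of file.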

You can also see what happens if you just run cat without a redirection. It'll look something like this, where the first foo and the Ctrl-D are what I typed:

$ cat
fooCtrl-Dfoo

When Ctrl-D is pressed, cat gets the input foo, prints it back and continues waiting for input. The line will look like foofoo, and there's no newline after that, so the cursor stays there at the end.

ilkkachu
  • 138,973
  • Great explanation! Is there a reason why after pressing Ctrl+d the terminal driver does not pass along everything repeatedly until it passes zero bytes? Is the current behaviour required for something practical? --- I can imagine it can work to flush the terminal's read buffer but you must be very careful to know that the buffer is not empty :D which makes it very impractical. --- The very annoying consequence of the current behaviour is that double pressing Ctrl+d when only one is needed will probably terminate your shell. – pabouk - Ukraine stay strong May 04 '22 at 08:23
  • 1
    @pabouk-Ukrainestaystrong, I'm not sure. Apart from it maybe being simpler that way, and that now it's at least possible to pass a partial line without triggering EOF at the same time. But I don't know much about the tty interface nor about the history there. Anyway, with Bash you can set IGNOREEOF=3 or so to make it ignore the first 3 EOFs/zero reads it gets and to only exit on the fourth one. Then you can double the Ctrl-D safely. (At least until you're on another computer where that isn't set anyway.) – ilkkachu May 04 '22 at 09:10
  • OTOH, creating files with just cat > file is a bit awkward in other ways too. At least I tend to need the ability to go back to edit earlier lines. Also, you could use Enter+Ctrl-D instead, that would make sure the Ctrl-D just sent the zero read. – ilkkachu May 04 '22 at 09:12
  • It has only happened a few times that I terminated my shell accidentally, so it's not worth setting IGNOREEOF=3. I almost always use cat > file when I transfer a file over the clipboard (practical for multiple remote systems, priceless when forced to work through remote desktop). Unfortunately sometimes it is difficult to get the last newline into the clipboard. – pabouk - Ukraine stay strong May 04 '22 at 09:37
  • @pabouk-Ukrainestaystrong, yep. Though IGNOREEOF is just one line in .bashrc. But yeah, I do the same sometimes, but I've really just hit enter once or twice at the end if I see the cursor isn't at the start of the line. – ilkkachu May 04 '22 at 10:27
  • 4
    Another useful tool to gain understanding is strace -p $(pidof cat) (or equivalent on non-Linux) in another terminal. Then you can see the actual read(0, ...) and write(1, ...) system calls it makes, with a read system call returning whenever you hit the EOF or EOL keys to submit the line-editing buffer. (stty -a) – Peter Cordes May 05 '22 at 10:49
1

A few words about.

As far as I can tell from a quick glance at the source code, cat uses a buffer to optimize its workflow. Referring to the behavior of the simple cat invocation, from standard input and without special command line options:

  1. when the buffer is empty, a single Ctrl-D is enough to exit;
  2. when it is not, the first Ctrl-D forces the buffer to be dumped (leaving it empty), and the second is interpreted as the command to quit.

This means that if you run cat > test and simply press Ctrl-D, you will create an empty file (named test) in the current directory, or you will empty the file if it already exists, without the need for a second Ctrl-D.

Beyond the scope of this question, but not unrelated: if you are sending characters to a terminal/shell instance, an unnecessary second Ctrl-D could cause an unwanted exit (mainly in connection with the example above).

Hastur
  • 2,355
  • 6
    You will find that it's not just cat behaving in this way and that all programs that read from standard input will behave in this way, even the shell's built-in read command. This is therefore nothing special about any particular implementation of cat. The reason is already laid out in the accepted answer. – Kusalananda May 04 '22 at 18:25