
These two commands do not behave the same:

$ seq 1000000 | (ssh localhost sleep 1; wc -l)
675173
$ seq 1000000 | (ssh localhost sleep 1 </dev/null; wc -l)
1000000

What is the rationale for ssh reading stdin?

Ole Tange
  • So does cat. I fail to see the issue. There is a specific option provided to prevent it from doing that (-n). – Kusalananda Jan 26 '22 at 12:22

2 Answers


ssh always reads stdin unless you tell it not to with the -n option (or the -f option).

The reason is so that you can do things like

tar cf - somedir | ssh otherhost "tar xf -"

And it always does this because ssh has no way of knowing if your remote command accepts input or not.
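A local sketch of why the remote command needs that input: here `sh -c` is a hypothetical stand-in for the shell that ssh would start on otherhost, and the temporary directory layout is made up for illustration. Like the remote side of ssh, the stand-in simply inherits the pipe as its standard input, and `tar xf -` reads the archive from it.

```shell
# Hypothetical local setup: a source tree and an empty destination.
work=$(mktemp -d)
mkdir -p "$work/src/somedir" "$work/dest"
echo hello > "$work/src/somedir/file.txt"

# Stand-in for: tar cf - somedir | ssh otherhost "tar xf -"
# sh -c plays the remote shell; it inherits the pipe as its stdin.
(cd "$work/src" && tar cf - somedir) | sh -c "cd '$work/dest' && tar xf -"

content=$(cat "$work/dest/somedir/file.txt")
echo "$content"    # prints: hello
rm -rf "$work"
```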

What is likely happening in your first command is that seq fills up the pipe and network buffers along the chain (seq -> ssh -> sleep). Since sleep isn't reading anything, seq blocks once those buffers are full. When sleep exits, the data sitting in those buffers is discarded; seq is then unblocked and feeds the remainder to wc.

Note that you would get similar results with seq 1000000 | ( cat | cat | sleep 1; wc -l)

In your second command, ssh still reads its stdin, but you've redirected that stdin from /dev/null, so ssh sees end-of-file immediately and never touches the output of seq.
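Both commands can be reproduced locally, with cat standing in for the ssh client and sleep for the remote command (an assumption for illustration: no SSH connection is involved). The count in the first case depends on buffer sizes, so only its range is predictable.

```shell
# Analog of the first command: the two cats read data from seq and hold it
# in their buffers and pipes; when sleep exits, that data is lost.
n1=$(seq 1000000 | (cat | cat | sleep 1; wc -l))

# Analog of the second command: the first cat reads /dev/null instead of
# the pipe, so wc receives every line seq produces.
n2=$(seq 1000000 | (cat </dev/null | cat | sleep 1; wc -l))

echo "without /dev/null: $n1"   # somewhat fewer than 1000000 lines
echo "with /dev/null:    $n2"   # 1000000 lines
```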

user10489
  • If I got it right, you're basically saying that the local ssh client can't know if the remote commands (the ssh server and the sleep it runs) need any data, so it has to make the data available in any case? Or in other words, there's no way for the remote to send the local side a request to make data available. I suppose that makes sense, since something like regular pipes also don't exactly work by passing requests for data either. – ilkkachu Jan 26 '22 at 12:38
  • ... and you've actually got two regular pipes involved here. ssh can't know the input won't be taken until it's too late. – user10489 Jan 26 '22 at 12:48

Unix input/output is based on unidirectional communication primitives: pushing data with write¹, pulling data with read¹, and querying the availability of data with select. This is not the model that is common on the web, for example, where the consumer of data sends a request “please give me some data” and the producer replies with the data, or the consumer asks “how much data can you give me?” and the producer replies with a size. A data consumer calls read to retrieve whatever data is available, and this doesn't necessarily involve the producer (e.g. a pipe has a buffer, and reading from the buffer doesn't involve the write end of the pipe). A data consumer can call select to find out whether data is available, and this doesn't involve the producer at all.
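A small sketch of both halves of that model, assuming a POSIX shell such as bash or dash: reading from a pipe doesn't need the producer to still be running, and writing to a pipe doesn't need the consumer to be reading. A shell can't call select directly, but the second half shows the underlying limitation: while the pipe buffer has room, a write succeeds (and select would likewise report the write end as ready) whether or not anyone ever reads.

```shell
# 1. Consumer without producer: printf writes into the pipe buffer and
#    exits; a second later cat reads the data straight from the buffer.
from_buffer=$(printf 'hello from the buffer\n' | { sleep 1; cat; })
echo "$from_buffer"    # prints: hello from the buffer

# 2. Producer without consumer: the write completes because the buffer has
#    room, even though the reader (sleep) never calls read at all.
report=$({ { printf 'data nobody reads\n'; echo 'write completed' >&2; } | sleep 1; } 2>&1)
echo "$report"         # prints: write completed
```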

The SSH server can know whether the application running on the server is actively trying to read from its standard input: the SSH server can call select to find out whether writing data would block. But if the application reads only intermittently, the SSH server might not call select at the right time, so it could miss that the application is trying to read data. And the SSH server has no way to know whether or when the application calls select to ask whether data is available on its standard input. The only way the SSH server can provide data to the application when the application wants it is to provide data to the application whenever it's available.

This requires the client to transmit the data to the server. So the client reads its standard input and forwards data as soon as it's available.

Once the client has read some data from its standard input, it can't un-read it. If the server-side application doesn't end up consuming the data, it's lost.
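The difference between reading carefully and reading in bulk can be seen with two local commands that both want only the first line. The shell's read builtin (in bash and dash, at least) reads a pipe one byte at a time so that it never consumes past the newline; head, like the ssh client, reads a whole chunk at once. This is a sketch assuming GNU or BSD head; other implementations may behave differently.

```shell
# read stops exactly after the first newline, so wc still sees lines 2-5.
n_read=$(seq 5 | { read -r first; wc -l; })

# head -n 1 prints one line but has already read the whole chunk from the
# pipe; it cannot push the rest back, so wc sees nothing.
n_head=$(seq 5 | { head -n 1 >/dev/null; wc -l; })

echo "after read: $n_read, after head: $n_head"
```

Everything head read beyond the first line is simply gone, which is what happens to data that ssh forwarded but the remote command never consumed.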

As a consequence, when you're calling ssh, you need to decide on the client side whether you want standard input to be routed through the SSH connection or not. It's not something the server can tell you.
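A common place where this decision matters is a loop that reads host names from a file and runs ssh for each one. The sketch below uses fake_ssh, a hypothetical shell function standing in for the ssh client (it drains its standard input the way ssh does without -n), so it runs without any SSH setup; the host names are made up. With the real client, the fix would be, for example, `ssh -n "$host" uptime` or `ssh "$host" uptime </dev/null`.

```shell
# fake_ssh: hypothetical stand-in for ssh without -n. It reads all of its
# stdin (as ssh forwards it to the remote side), then "runs the command".
fake_ssh() {
    cat >/dev/null
    echo "ran on $1"
}

hosts=$(mktemp)
printf 'host1\nhost2\nhost3\n' > "$hosts"

# stdin not redirected: the first call swallows the remaining host names.
without=$(while read -r host; do fake_ssh "$host"; done < "$hosts")

# stdin from /dev/null (roughly what ssh -n does): every host gets its turn.
with=$(while read -r host; do fake_ssh "$host" </dev/null; done < "$hosts")

rm -f "$hosts"
printf '%s\n---\n%s\n' "$without" "$with"
```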

See also “SSH connections running in the background don't exit if multiple connections have been started by the same shell”, which explores a similar but more complex scenario involving terminals.

¹ and friends.

  • Re. calling select() to see if a write would block, wouldn't select() be capable of waiting for a change in the "writable" status of the fd, the same way it watches for other fds to become readable? All the SSH server would really need to know here is if the program ever reads. But does select() actually tell if someone is actively calling read() on the fd? Or just that there's an fd open for reading, which might not mean the program is actually calling read() on it? – ilkkachu Jan 26 '22 at 12:49
  • @ilkkachu The job of select is to tell whether read/write would block. Not whether there has been a time in the past when it didn't block. If the application tries reading from stdin for a while, then times out, and then sshd calls select, sshd will not see that it can write without blocking. – Gilles 'SO- stop being evil' Jan 26 '22 at 13:08
  • no, of course not. But often the program (sshd here) using select() would spend most of its time blocking on the select() call, and so would be immediately notified when the fd becomes available. – ilkkachu Jan 26 '22 at 14:07
  • Your second paragraph makes it seem like using select would work when the consumer does blocking reads, but not when it does non-blocking reads. However, even when the consumer only does blocking reads, select wouldn't work at the start, since writing would only block after the pipe buffer has filled. That means that in a typical pipe, before the producer has written anything, select would always return saying that writing wouldn't block, regardless of whether the consumer is trying to slurp, read intermittently, or not read at all. – JoL Jan 26 '22 at 22:17
  • Also, I agree with ilkkachu. If the pipe buffer weren't an issue (e.g. if it were possible to set its size to 0), then intermittent reads shouldn't be a problem, since sshd could via an alternate thread call select and stay blocked, while it starts the client-specified command. select would only return once the command tries to do any sort of read. Again, this is in the hypothetical scenario where the pipe buffer size could be set to 0. Right now, it must be at least equal to the page size. – JoL Jan 26 '22 at 22:53