
I am trying to copy files over SSH, but cannot use scp due to not knowing the exact filename that I need. Although small binary files and text files transfer fine, large binary files get altered. Here is the file on the server:

remote$ ls -la
-rw-rw-r--  1 user user 244970907 Aug 24 11:11 foo.gz
remote$ md5sum foo.gz 
9b5a44dad9d129bab52cbc6d806e7fda foo.gz

Here is the file after I've moved it over:

local$ time ssh me@server.com -t 'cat /path/to/foo.gz' > latest.gz

real    1m52.098s
user    0m2.608s
sys     0m4.370s
local$ md5sum latest.gz
76fae9d6a4711bad1560092b539d034b  latest.gz

local$ ls -la
-rw-rw-r--  1 dotancohen dotancohen 245849912 Aug 24 18:26 latest.gz

Note that the downloaded file is bigger than the one on the server! However, if I do the same with a very small file, then everything works as expected:

remote$ echo "Hello" | gzip -c > hello.txt.gz
remote$ md5sum hello.txt.gz
08bf5080733d46a47d339520176b9211  hello.txt.gz

local$ time ssh me@server.com -t 'cat /path/to/hello.txt.gz' > hi.txt.gz

real    0m3.041s
user    0m0.013s
sys     0m0.005s

local$ md5sum hi.txt.gz
08bf5080733d46a47d339520176b9211  hi.txt.gz

Both file sizes are 26 bytes in this case.

Why might small files transfer fine, but large files get some bytes added to them?

Jeff Schaller
dotancohen
  • It's the -t option, which breaks the transfer. Don't use -t or -T, unless you need them for a very specific reason. The default works in the vast majority of cases, so those options are very rarely needed. – kasperd Aug 24 '14 at 21:15
  • Never thought I'd say this in this century, but you may want to try uuencode and uudecode if ssh -t cat is the only way to transfer files. – Mark Plotnick Aug 25 '14 at 02:49
  • @MarkPlotnick modern version of uuencode/uudecode are now named base64/base64 -d – Archemar Aug 06 '15 at 08:55
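
The base64 suggestion in the comments above can be sketched as follows. This is only an illustration, assuming base64 (with the GNU-style -d option), tr and awk are available; the helper names encode_for_tty and decode_from_tty are hypothetical. Because base64 output is plain ASCII, the CRs added by the remote pseudo-terminal can be stripped before decoding without touching the payload.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the thread.

# Remote side: turn arbitrary bytes into ASCII text lines.
encode_for_tty() {
    base64 "$1"
}

# Local side: drop the CRs inserted by the remote tty, then decode.
decode_from_tty() {
    tr -d '\r' | base64 -d
}

# Intended use (not executed here; needs a reachable host):
#   ssh -t host 'base64 /path/to/foo.gz' | decode_from_tty > latest.gz

# Local demonstration: awk stands in for the tty's LF -> CRLF translation.
src=$(mktemp) && out=$(mktemp)
printf 'line1\nline2\n\000\377binary' > "$src"
encode_for_tty "$src" | awk '{ printf "%s\r\n", $0 }' | decode_from_tty > "$out"
cmp -s "$src" "$out" && echo "round trip OK"
rm -f "$src" "$out"
```

This does not protect against anything else written to the pseudo-terminal, such as messages from remote startup files.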

2 Answers


TL;DR

Don't use -t. -t involves a pseudo-terminal on the remote host and should only be used to run visual applications from a terminal.

Explanation

The linefeed character (also known as newline or \n) is the one that when sent to a terminal tells the terminal to move its cursor down.

Yet, when you run seq 3 in a terminal, that is, when seq writes 1\n2\n3\n to something like /dev/pts/0, you don't see:

1
 2
  3

but

1
2
3

Why is that?

Actually, when seq 3 (or ssh host seq 3 for that matter) writes 1\n2\n3\n, the terminal sees 1\r\n2\r\n3\r\n. That is, the line-feeds have been translated to carriage-return (upon which terminals move their cursor back to the left of the screen) and line-feed.

That is done by the terminal device driver. More exactly, by the line-discipline of the terminal (or pseudo-terminal) device, a software module that resides in the kernel.

You can control the behaviour of that line discipline with the stty command. The translation of LF -> CRLF is turned on with

stty onlcr

(which is generally enabled by default). You can turn it off with:

stty -onlcr

Or you can turn all output processing off with:

stty -opost

If you do that and run seq 3, you'll then see:

$ stty -onlcr; seq 3
1
 2
  3

as expected.

Now, when you do:

seq 3 > some-file

seq is no longer writing to a terminal device; it's writing into a regular file, so no translation is done. some-file therefore does contain 1\n2\n3\n. The translation is only done when writing to a terminal device, and it's only done for display.
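
A quick way to confirm this is to dump the bytes of the file. The sketch below assumes seq, od and mktemp are available; the temporary file stands in for some-file.

```shell
#!/bin/sh
# Write the output of seq to a regular file and dump its bytes in hex.
tmp=$(mktemp)
seq 3 > "$tmp"

# -An suppresses the offset column, -tx1 prints one hex byte at a time.
# Only 0a (LF) separates the digits; no 0d (CR) appears.
hex=$(od -An -tx1 "$tmp" | tr -d ' \n')
echo "$hex"   # 310a320a330a
rm -f "$tmp"
```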

Similarly, when you do:

ssh host seq 3

ssh is writing 1\n2\n3\n regardless of what ssh's output goes to.

What actually happens is that the seq 3 command is run on host with its stdout redirected to a pipe. The ssh server on host reads the other end of the pipe and sends it over the encrypted channel to your ssh client and the ssh client writes it onto its stdout, in your case a pseudo-terminal device, where LFs are translated to CRLF for display.

Many interactive applications behave differently when their stdout is not a terminal. For instance, if you run:

ssh host vi

vi doesn't like it: it doesn't like its output going to a pipe. It thinks it's not talking to a device that is able to understand cursor positioning escape sequences, for instance.

So ssh has the -t option for that. With that option, the ssh server on host creates a pseudo-terminal device and makes that the stdout (and stdin, and stderr) of vi. What vi writes on that terminal device goes through that remote pseudo-terminal line discipline and is read by the ssh server and sent over the encrypted channel to the ssh client. It's the same as before except that instead of using a pipe, the ssh server uses a pseudo-terminal.

The other difference is that on the client side, the ssh client sets the terminal in raw mode (and disables local echo). That means that no translation is done there (opost is disabled, as are the input-side behaviours). For instance, when you type Ctrl-C, instead of interrupting ssh, that ^C character is sent to the remote side, where the line discipline of the remote pseudo-terminal sends the interrupt to the remote command.

When you do:

ssh -t host seq 3

seq 3 writes 1\n2\n3\n to its stdout, which is a pseudo-terminal device. Because of onlcr, that gets translated on host to 1\r\n2\r\n3\r\n and sent to you over the encrypted channel. On your side there is no translation (onlcr disabled), so 1\r\n2\r\n3\r\n is displayed untouched (because of the raw mode) and correctly on the screen of your terminal emulator.

Now, if you do:

ssh -t host seq 3 > some-file

There's no difference from above. ssh will write the same thing: 1\r\n2\r\n3\r\n, but this time into some-file.

So basically all the LF in the output of seq have been translated to CRLF into some-file.

It's the same if you do:

ssh -t host cat remote-file > local-file

All the LF characters (0x0a bytes) are being translated into CRLF (0x0d 0x0a).

That's probably the reason for the corruption in your file. In the case of the second smaller file, it just so happens that the file doesn't contain 0x0a bytes, so there is no corruption.
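
One way to test this explanation is to count the 0x0a bytes in the original file. If only the LF to CRLF translation is at work, the corrupted copy should be larger by exactly that count (for the file in the question the difference is 245849912 - 244970907 = 879005 bytes). The sketch below assumes tr and wc are available where the original file lives; the function names are hypothetical.

```shell
#!/bin/sh
# Count the LF (0x0a) bytes in a file.
count_lf() {
    tr -cd '\n' < "$1" | wc -c | tr -d ' '
}

# Size the file would have if every LF became CRLF.
predicted_size_with_onlcr() {
    size=$(wc -c < "$1" | tr -d ' ')
    echo $(( size + $(count_lf "$1") ))
}

# Local demonstration on a small sample: 7 bytes, 3 of them LF.
sample=$(mktemp)
printf 'ab\ncd\n\n' > "$sample"
count_lf "$sample"                    # 3
predicted_size_with_onlcr "$sample"   # 10
rm -f "$sample"
```

Running count_lf on foo.gz on the server and comparing the result with the size difference would confirm or rule out this cause; a count of 0 for hello.txt.gz would explain why the small file survived.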

Note that you could get different types of corruption with different tty settings. Another potential type of corruption associated with -t is if your startup files on host (~/.bashrc, ~/.ssh/rc...) write things to their stderr, because with -t the stdout and stderr of the remote shell end up being merged into ssh's stdout (they both go to the pseudo-terminal device).

You don't want the remote cat to output to a terminal device there.

You want:

ssh host cat remote-file > local-file
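
After a transfer without -t, it is worth confirming that the copy is intact, as was done in the question with md5sum. A minimal sketch, assuming md5sum and cut exist on both machines; verify_md5 is a hypothetical helper name, and host, remote-file and local-file are the placeholders used above.

```shell
#!/bin/sh
# Succeed if the MD5 of a local file matches an expected hex digest.
verify_md5() {
    expected=$1
    actual=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Intended use (not executed here; needs a reachable host):
#   ssh host cat remote-file > local-file
#   remote_sum=$(ssh host md5sum remote-file | cut -d' ' -f1)
#   verify_md5 "$remote_sum" local-file && echo "transfer OK"

# Local demonstration with a throwaway file.
f=$(mktemp)
printf 'Hello\n' > "$f"
sum=$(md5sum < "$f" | cut -d' ' -f1)
verify_md5 "$sum" "$f" && echo "checksums match"
rm -f "$f"
```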

You could do:

ssh -t host 'stty -opost; cat remote-file' > local-file

That would work (except in the writing to stderr corruption case discussed above), but even that would be sub-optimal as you'd have that unnecessary pseudo-terminal layer running on host.


Some more fun:

$ ssh localhost echo | od -tx1
0000000 0a
0000001

OK.

$ ssh -t localhost echo | od -tx1
0000000 0d 0a
0000002

LF translated to CRLF

$ ssh -t localhost 'stty -opost; echo' | od -tx1
0000000 0a
0000001

OK again.

$ ssh -t localhost 'stty olcuc; echo x'
X

That's another form of output post-processing that can be done by the terminal line discipline.

$ echo x | ssh -t localhost 'stty -opost; echo' | od -tx1
Pseudo-terminal will not be allocated because stdin is not a terminal.
stty: standard input: Inappropriate ioctl for device
0000000 0a
0000001

ssh refuses to tell the server to use a pseudo-terminal when its own input is not a terminal. You can force it with -tt though:

$ echo x | ssh -tt localhost 'stty -opost; echo' | od -c
0000000   x  \r  \n  \n
0000004

The line discipline does a lot more on the input side.

Here, echo neither reads its input nor was asked to output that x\r\n\n, so where does it come from? That's the local echo of the remote pseudo-terminal (stty echo). The ssh server is feeding the x\n it read from the client to the master side of the remote pseudo-terminal, and the line discipline of that pseudo-terminal echoes it back (before stty -opost is run, which is why we see a CRLF and not LF). That's independent of whether the remote application reads anything from stdin or not.

$ (sleep 1; printf '\03') | ssh -tt localhost 'trap "echo ouch" INT; sleep 2'
^Couch

The 0x3 character is echoed back as ^C (^ and C) because of stty echoctl, and the shell and sleep receive a SIGINT because of stty isig.

So while:

ssh -t host cat remote-file > local-file

is bad enough,

ssh -tt host 'cat > remote-file' < local-file

to transfer files the other way across is a lot worse. You'll get some CR -> LF translation, but also problems with all the special characters (^C, ^Z, ^D, ^?, ^S...). Also, the remote cat will not see EOF when the end of local-file is reached, only when ^D is sent after a \r, \n or another ^D, as when doing cat > file in your terminal.
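
To get a feel for how exposed a given file is to that input-side processing, the sketch below counts bytes that a terminal line discipline commonly treats as special. The byte list reflects typical default settings and is an assumption, not an exhaustive list; the actual characters depend on the remote stty settings. The function name is hypothetical.

```shell
#!/bin/sh
# Count bytes that default tty settings usually intercept or translate:
# \003 ^C (intr), \004 ^D (eof), \023 ^S (stop), \032 ^Z (susp),
# \177 ^? (erase), \r CR (mapped to LF by icrnl).
count_tty_special() {
    tr -cd '\003\004\023\032\177\r' < "$1" | wc -c | tr -d ' '
}

# Local demonstration: the sample holds ^C, ^D and CR.
sample=$(mktemp)
printf 'a\003b\004c\r\n' > "$sample"
count_tty_special "$sample"   # 3
rm -f "$sample"
```

Any non-zero count means the file would be at risk when pushed through a pseudo-terminal; without -tt, as in ssh host 'cat > remote-file' < local-file, the data goes through a pipe and these bytes are not interpreted.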

  • When one can not avoid -t (like when needing remote sudo), something like this worked for me (after reading this answer and understanding the problem): stty raw; ssh -t remote.example.com sudo sh -c 'stty raw > /dev/null; cat /path/to/binary/data' > copy_of_binary_data; stty cooked – starfry Jan 07 '20 at 17:00
  • @starfry, ssh -t remote.example.com sudo sh -c 'stty raw > /dev/null; cat /path/to/binary/data' doesn't do what you think it does. ssh concatenates its arguments and passes the result as a command line to be interpreted by the remote shell. So the remote shell will run sudo sh -c stty raw > /dev/null; cat /path/to/binary/data. You also don't need to touch the local terminal settings. cooked does not cancel all of raw, you'd probably want sane instead. That should be more something like ssh -r remote 'stty -opost; sudo cat /path/to/binary/data'. You shouldn't need to have a tty her – Stéphane Chazelas Jan 07 '20 at 17:32
  • Yes that's how I understood ssh command line to work: make the remote raw and then send the data. I had to use -t because of the remote sudo configuration, annoying but something that I am not permitted to change. I found that the local file was consequently corrupted unless I did the local stty. I verified this using a hash MD5. But, yes, stty -opost is enough but I needed it locally also (and undo it with stty opost afterwards). Also -r isn't a valid ssh option... – starfry Jan 08 '20 at 12:06
  • @starfry "I had to use -t because of the remote sudo configuration" -- See this: ssh with separate stdin, stdout, stderr AND tty. – Kamil Maciorowski Feb 04 '22 at 09:17
  • @KamilMaciorowski, you should fix your sudo configuration. See Why do I need a tty to run sudo if I can sudo without a password? – Stéphane Chazelas Feb 04 '22 at 09:49
  • @StéphaneChazelas My sudo configuration is exactly what I want it to be; I want my sudo to ask for password and I want it to do this via tty. The user I was talking to had written about their configuration: "annoying but something that I am not permitted to change". – Kamil Maciorowski Feb 04 '22 at 10:25

When I use that method to copy a file, the remote and local copies turn out to be different.

Remote server

ls -l | grep vim_cfg
-rw-rw-r--.  1 slm slm 9783257 Aug  5 16:51 vim_cfg.tgz

Local server

Running your ssh ... cat command:

$ ssh dufresne -t 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

Results in this file on the local server:

$ ls -l | grep vim_cfg.tgz 
-rw-rw-r--. 1 saml saml 9820481 Aug 24 12:13 vim_cfg.tgz

Investigating why?

Investigating the resulting file on the local side shows that it's been corrupted. If you take the -t switch out of your ssh command then it works as expected.

$ ssh dufresne 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

$ ls -l | grep vim_cfg.tgz
-rw-rw-r--. 1 saml saml 9783257 Aug 24 12:17 vim_cfg.tgz

Checksums now work too:

# remote server
$ ssh dufresne "md5sum ~/vim_cfg.tgz"
9e70b036836dfdf2871e76b3636a72c6  /home/slm/vim_cfg.tgz

# local server
$ md5sum vim_cfg.tgz 
9e70b036836dfdf2871e76b3636a72c6  vim_cfg.tgz
slm
  • Thank you Sim. Though in fact you were the first to post the correct answer, I did select Stéphane for the chosen answer due to the depth of his explanation. Not to worry, you've got a long post history that I am learning from, and of course I upvote those posts that I learn from. Thank you. – dotancohen Aug 25 '14 at 08:20
  • @dotancohen - no worries, you accept whichever A's you feel are the ones that help you as the OP the most 8-). His ability to explain why things happen is unrivaled, except by Gilles. – slm Aug 25 '14 at 12:28