123

I'm trying to copy a batch of files with scp but it is very slow. This is an example with 10 files:

$ time scp cap_* user@host:~/dir
cap_20151023T113018_704979707.png    100%  413KB 413.2KB/s   00:00    
cap_20151023T113019_999990226.png    100%  413KB 412.6KB/s   00:00    
cap_20151023T113020_649251955.png    100%  417KB 416.8KB/s   00:00    
cap_20151023T113021_284028464.png    100%  417KB 416.8KB/s   00:00    
cap_20151023T113021_927950468.png    100%  413KB 413.0KB/s   00:00    
cap_20151023T113022_567641507.png    100%  413KB 413.1KB/s   00:00    
cap_20151023T113023_203534753.png    100%  414KB 413.5KB/s   00:00    
cap_20151023T113023_855350640.png    100%  412KB 411.7KB/s   00:00    
cap_20151023T113024_496387641.png    100%  412KB 412.3KB/s   00:00    
cap_20151023T113025_138012848.png    100%  414KB 413.8KB/s   00:00    
cap_20151023T113025_778042791.png    100%  413KB 413.4KB/s   00:00    

real    0m43.932s
user    0m0.074s
sys 0m0.030s

The strange thing is that the transfer rate is about 413KB/s and the file size is about 413KB, so really it should transfer about one file per second; however, it's taking about 4.3 seconds per file.

Any idea where this overhead comes from, and is there any way to make it faster?

laurent
  • 1,998
  • 4
    What speed do you expect (i.e., is there another protocol that shows higher transfer speeds between the same two machines)? What happens when you scp a much larger file (perhaps the concatenation of all your 413KB files)? – dhag Oct 23 '15 at 13:47
  • 9
    It looks like the remote system may be trying to resolve the client IP address to a name, and you're having to wait for a timeout before the session proceeds. You could investigate fixing that (e.g. add your IP address to the destination's /etc/hosts file). – wurtel Oct 23 '15 at 14:35
  • 9
    It's worth mentioning that the -C flag enables compression during transfer. Although your problem seems to be overhead starting transfers, compression is basically "free" and almost always helps. – Sam Oct 24 '15 at 02:53
  • 1
    @wurtel: I don't see what you're seeing, all I see are times. There should only be a single reverse DNS call needed anyway. – President James K. Polk Oct 24 '15 at 12:36
  • 1
    Are you relying on SCP for security or only for remote copying? – Freiheit Oct 26 '15 at 13:37
  • Compression wouldn't help in this case; the files are already-compressed PNG's. – Dan Pritts Mar 20 '19 at 19:30
  • Upload speed isn't honest speed because of buffering. The file is so small everything is stored on transmit buffers. I think that messes up the transfer speed. Try downloading from the destination instead of uploading from the source and I believe you'll get (very) different results. – Marcelo Pacheco Feb 24 '22 at 22:07

8 Answers

122

You could use rsync (over ssh), which uses a single connection to transfer all the source files.

rsync -avP cap_* user@host:dir

If you don't have rsync (and why not!?) you can use tar with ssh like this, which avoids creating a temporary file (these two alternatives are equivalent):

tar czf - cap_* | ssh user@host tar xvzfC - dir
tar cf - cap_* | gzip | ssh user@host 'cd dir && gzip -d | tar xvf -'

The rsync is to be preferred, all other things being equal, because it's restartable in the event of an interruption.
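
As a concrete sketch of what "restartable" means here, using the same command as above: if the transfer drops partway through, simply re-run it.

# first run gets interrupted partway through (network drop, Ctrl-C, ...)
rsync -avP cap_* user@host:dir

# re-running the identical command resumes: files already on the remote with
# matching size and mtime are skipped, and a partially transferred file kept
# by --partial (implied by -P) is reused rather than started from scratch
rsync -avP cap_* user@host:dir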

Chris Davies
  • 116,213
  • 16
  • 160
  • 287
  • 12
    Are you saying a single scp invocation wouldn't use a single connection to transfer all files? – user Oct 23 '15 at 21:11
  • 2
    In the tarpipe case, there's no need for the f - on each side, since tar outputs to/reads from stdout/stdin by default. So tar cz cap_* | ssh user@host tar xvzC dir would do it. – tremby Oct 24 '15 at 01:41
  • 2
    @tremby not necessarily. tar can be compiled with different default values (see tar --show-defaults if you're using GNU tar, or /etc/default/tar otherwise, and in both cases don't forget the TAPE environment variable) – Chris Davies Oct 24 '15 at 11:16
  • 1
    @MichaelKjörling initially I had assumed that scp would create a new connection for each file, but on recollection - and after double-checking with tshark - I realised that I was incorrect. At this point I'm no longer sure why the OP's scp should be taking such a long time per file. – Chris Davies Oct 24 '15 at 11:19
  • @roaima, interesting, thanks. I've never noticed stdin/stdout not being default so far. BSD tar on my Mac at work doesn't mention a TAPE env var in its man page, though GNU tar on my Linux machine does. – tremby Oct 26 '15 at 17:37
  • 3
    This is so much faster its crazy... thanks so much – codenamejames Sep 03 '17 at 22:35
  • This still seems to be the correct answer – Meir Gabay Sep 24 '20 at 09:43
  • you may want to edit, to include a faster version (no v) and which is more portable (no z, no C, so works also with non-gnu tar) : tar cf - cap_* | gzip -c - | ssh user@host 'cd /dest/dir && { gzip -dc - | tar xf - ; } – Olivier Dulac Feb 23 '21 at 08:20
  • you missed the final singlequote (and maybe no need for the v ? it slows down a lot if there are many files to be displayed, especially if run interactively) – Olivier Dulac Feb 23 '21 at 10:11
  • 1
    Thanks. Keeping the -v to match my preferred rsync-based answer that also uses -v – Chris Davies Feb 23 '21 at 11:27
  • 1
    Just adding my experience here. in my setup scp -r was actually around 10x faster than rsync -avP. Ubuntu20 boxes, iperf3 confirmed 9.8Gbps local connection. – taiyodayo Feb 04 '22 at 04:30
  • 1
    @taiyodayo thankyou for that reference point. I've never had access to systems that are sufficiently performant to benefit from such a network speed. Removing -P (or replacing it with --partial) might possibly give a little improvement. Originally rsync was intended to transfer data across slow links by using processor power on both sides, and you may simply have surpassed its niche! – Chris Davies Apr 03 '22 at 06:56
  • For me, the tar-based approach (without compression) seemed to be much faster than rsync, although I didn't thoroughly test. I am copying over local network, origin device on wifi and destination device on ethernet. – abeboparebop May 11 '22 at 05:29
  • Thanks ... after all this time, I never knew SCP was so slow in comparison. In less than a minute I downloaded 20K files that would have taken SCP 11 hours! – mrSidX Aug 21 '23 at 09:54
31

@wurtel's comment is probably correct: there's a lot of overhead establishing each connection. If you can fix that you'll get faster transfers (and if you can't, just use @roaima's rsync workaround). I did an experiment transferring similar-sized files (head -c 417K /dev/urandom > foo.1 and made some copies of that file) to a host that takes a while to connect (HOST4) and one that responds very quickly (HOST1):

$ time ssh $HOST1 echo

real    0m0.146s
user    0m0.016s
sys     0m0.008s
$ time scp * $HOST1:
foo.1                100%  417KB 417.0KB/s   00:00
foo.2                100%  417KB 417.0KB/s   00:00
foo.3                100%  417KB 417.0KB/s   00:00
foo.4                100%  417KB 417.0KB/s   00:00
foo.5                100%  417KB 417.0KB/s   00:00

real    0m0.337s
user    0m0.032s
sys     0m0.016s
$ time ssh $HOST4 echo

real    0m1.369s
user    0m0.020s
sys     0m0.016s
$ time scp * $HOST4:
foo.1                100%  417KB 417.0KB/s   00:00
foo.2                100%  417KB 417.0KB/s   00:00
foo.3                100%  417KB 417.0KB/s   00:00
foo.4                100%  417KB 417.0KB/s   00:00
foo.5                100%  417KB 417.0KB/s   00:00

real    0m6.489s
user    0m0.052s
sys     0m0.020s
$
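
For reference, a minimal sketch of how the test files above can be recreated (the copy loop is my reconstruction of "made some copies of that file"):

head -c 417K /dev/urandom > foo.1            # one ~417KB file of random data
for i in 2 3 4 5; do cp foo.1 foo.$i; done   # plus a few identical copies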

Cadoiz
  • 276
  • 1
    Thanks, that's very interesting. The scp output is kind of broken if it shows the same time even though it's completely different from one host to another. They should probably include the connection time in the total time. – laurent Feb 04 '16 at 19:39
  • 4
    So your hypothesis is it makes a new connection once for each file? – rogerdpack Sep 27 '18 at 21:59
20

It's the negotiation of the transfer that takes time. In general, operations on n files of b bytes each take much, much longer than a single operation on a single file of n * b bytes. This is also true e.g. for disk I/O.

If you look carefully you'll see that the transfer rate reported in this case is size_of_the_file/secs for the data transfer alone; it doesn't include the per-file negotiation time.

To transfer files more efficiently, bundle them together with tar, then transfer the tarball:

tar cvf myarchive.tar cap_20151023T*.png

or, if you also want to compress the archive,

tar cvzf myarchive.tar.gz myfile*

Whether to compress or not depends on the file contents: e.g. if they're JPEGs or PNGs, which are already compressed, compression won't have any effect.
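
To complete the picture, here is a short sketch of the full round trip, assuming the destination directory ~/dir from the question:

# bundle locally, copy the single archive, then unpack and tidy up remotely
tar cvf myarchive.tar cap_20151023T*.png
scp myarchive.tar user@host:~/dir/
ssh user@host 'cd ~/dir && tar xvf myarchive.tar && rm myarchive.tar'
rm myarchive.tar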

dr_
  • 29,602
  • PNGs use deflate, and gzipping them is pointless too. – Mingye Wang Oct 23 '15 at 15:06
  • I'd say that because compressing the tar does not have negative effects when the files can't be compressed further it's a good practice to just put -z – Centimane Oct 23 '15 at 15:49
  • 1
    @Dave if they can't be compressed, or the network is fast, it will slow things down. – Davidmh Oct 23 '15 at 16:51
  • @Davidmh would this be by a significant amount though? I would think compressing an already compressed file would be fairly quick as it would really just look over what it could compress and find that it is nothing. Depends I guess if tar normally does a second pass for compression or if it would be compressing and archiving at the same time – Centimane Oct 23 '15 at 18:05
  • 5
    @Dave in my case (data on a modern 7000 rpm HD, high end CPU, very fast network, not bragging at all), tar without compression is purely IO bound, but with -z is CPU bound, and much slower. gzip will always try to compress, hence the slowdown; after all, you can't tell if a string of bytes is compressible until you have tried to compress it. In my set up, even when transferring plain text files, rsync without compression is the fastest by a factor of 2-3 compared with the lightest compression. Of course, YMMV. – Davidmh Oct 23 '15 at 18:32
  • @Dave this said, I dream of a transferring system that would decide the compression level based on the current CPU and network capabilities and the compressibility of the data. – Davidmh Oct 23 '15 at 18:35
9

I've used the technique described here (archived) which uses parallel gzip and netcat to quickly compress and copy data.

It boils down to:

# On the SOURCE host:
> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | nc -l 8888

# On the TARGET host:
> nc <source host> 8888 | pigz -d | tar xf - -C /

This uses tar to gather up the file or files, pigz to compress them using multiple CPU threads, and netcat to handle the network transmission. On the receiving side, netcat listens for the connection, pigz decompresses (again in parallel), and tar unpacks the stream.
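
If the data needs to stay encrypted in transit (netcat sends it in the clear), the same idea can be sketched over ssh instead, keeping pigz for the parallel compression; this assumes pigz is installed on both hosts:

> tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | ssh user@host 'pigz -d | tar xf - -C /'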

Freiheit
  • 9,669
8

Another reason that scp is slower than it should be, especially on high bandwidth networks, is that it has statically defined internal flow control buffers which end up becoming network performance bottlenecks.

HPN-SSH is a patched version of OpenSSH which increases the size of these buffers. It makes a massive difference to scp transfer speed (see the charts on the site, but I also speak from personal experience). Of course, to get the benefits you need to install HPN-SSH on all your hosts but it's well worth it if you regularly need to transfer large files around.

7

Just had this issue doing a site-to-site transfer of a large mp4 file via scp. Was getting ~250KB/s. After disabling UDP flood protection (FP) on the destination firewall, the transfer rate increased to 6.5MB/s. When turning FP back on, the rate dropped back to ~250KB/s.

Sender: cygwin, Receiver: Fedora 20, Firewall Sophos UTM.

What does SSH use UDP for? @ superuser.com -- it doesn't use UDP directly, from what I read.

In reviewing the firewall log, flood detection was occurring on both source & dest ports 4500 over the public IP addresses, not the private site-to-site internal VPN addresses. So it seems my issue is likely a NAT Traversal situation where the scp TCP data is ultimately encrypted and encapsulated in ESP & UDP packets, and consequently subject to FP. To remove scp from the equation, I ran a Windows file copy operation across the VPN and noticed similar performance to scp with and without FP enabled. Also ran an iperf test over TCP and noticed 2Mbits/sec with FP, and 55Mbits/sec without.

How Does NAT-T work with IPSec? @ cisco.com
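
For reference, a sketch of the kind of iperf TCP test mentioned above (run the server on the receiving side; the hostname is a placeholder):

iperf -s                    # on the receiver: start an iperf server
iperf -c <receiver-host>    # on the sender: measure TCP throughput to it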

bvj
  • 181
3

Since this question is not that old and no one else has referred to this solution, I think it is appropriate: it pushes the bandwidth to the maximum (10MiB/s in my case), unlike scp, which sat at around 250kb/s - so that answers your question.

Actually I got the same 250kb/s with rsync - at least with the port specifier: rsync -Avvp cap_* -e "ssh -p 1087 -i id_rsa" user@host:~/dir


Quoting a post to the openssh-unix-dev mailing list:

The scp protocol is outdated, inflexible and not readily fixed. Its authors recommend the use of more modern protocols like sftp and rsync for file transfer instead

The same syntax applies to sftp, so instead of scp text.txt user@host it is now sftp text.txt user@host (usage examples; scp interchangeable with sftp)

Also, recent versions of OpenSSH should enable the sftp server by default - at least in my case on an Arch Linux server - but you might have to install an sftp package on other distributions.


One more working example, with an ssh identity file flag (id_rsa) and a non-standard ssh port 1087 instead of 22, to save you time fiddling with the syntax:

sftp -P 1087 -i id_rsa user@server:/home/user/Downloads/Video/*/*.mp4 /home/user/Videos/

Also your sftp might be limited to 800kb/s or ~1 Mbit/s. You can check this with:

# sysctl -a | grep net.*rmem

and you can change the limits e.g. like this if they are too slow:

# sysctl -w net.ipv4.tcp_rmem='40960 873800 62914560'
# sysctl -w net.core.rmem_max=8388608
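
Note that sysctl -w only lasts until the next reboot; one way to persist the values is a drop-in file under /etc/sysctl.d/ (the filename below is just an example):

# printf 'net.ipv4.tcp_rmem = 40960 873800 62914560\nnet.core.rmem_max = 8388608\n' > /etc/sysctl.d/90-tcp-buffers.conf
# sysctl --system    # reload all sysctl configuration files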
Jeff Schaller
  • 67,283
  • 35
  • 116
  • 255
Ingo Mi
  • 252
  • I found another link to this topic that explains the buffer problem and adds a patch solution: https://www.psc.edu/index.php/hpn-ssh or as described here: https://wiki.archlinux.org/index.php/OpenSSH#Speeding_up_SSH – Ingo Mi Dec 28 '19 at 13:01
2

You can turn your server into a web server

$ sudo apt-get install apache2
# open port 80 on your server 
# copy your files to the server at /var/www/html/
# if you can't copy or symlink them, use:
$ sudo chown $USER /var/www/html/

and download your files using wget or curl...

$ wget  http://40.86.167.128/video.mp4
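
The curl equivalent, for completeness (same example URL):

$ curl -O http://40.86.167.128/video.mp4    # -O saves it under the remote file name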

For me, this gave 100% of the link speed on 4mbit/s wifi (i.e. 4mbit/s) and 25% on 100mbit/s wifi (i.e. 25mbit/s, not the full 100mbit/s), whereas with rsync or scp I got 1mbit/s on the 4mbit/s wifi.

nextloop
  • 166
  • I'm un-deleting this as it attempts to answer the question. Whether a second intermediate copy would be worth the HTTP savings isn't clear to me, but it's worth testing. – Jeff Schaller May 21 '22 at 23:56
  • In my case, the file that I wanted to download via scp was already in /var/www/html and being served by a web server (nginx), so I could use curl instead of scp as you suggest and it's much faster. My only concern is that the file (which contains an IP address) should perhaps not be available publicly, so it should be protected somehow, or I should go back to a solution that does not require it to be on a web server... But if the file in question is not a secret, this works very well. – drkvogel Dec 30 '23 at 11:54
  • When we move files to /var/www/html/, they become public by default. You should have experience with nginx to handle private files. I don't know why wget/curl is faster than scp/rsync, but this method could be helpful sometimes. – nextloop Jan 09 '24 at 18:14