The encryption step of an SFTP transfer is maxing out one CPU core, while my I/O bandwidth (disks, buses and network) is nowhere near saturated.

That being said, the system in question has multiple cores: I would like to leverage them in the encryption/decryption process.

Is that possible? If so, how?

NB: if at all possible, I would like to avoid patch sets whose modifications were not deemed good enough for inclusion in upstream OpenSSH.

countermode
JohnW
  • What kind of CPU do you have? Does it have AES-NI extensions? If so, you might be able to use a cipher with hardware acceleration, e.g. AES. – Martin Ueding Oct 01 '16 at 13:50
  • I really have no idea about this case, but as I'm sure you're aware, not everything is subject to improvement by multi-threading, some things have to be done one step at a time. Somewhere here there's a great Q&A about how sftp/scp is just much slower by design than sshfs, and although I could not find that one I did find this one, which may also be worth reading: http://unix.stackexchange.com/q/238152/25985 – goldilocks Oct 01 '16 at 15:10
  • @MartinUeding: nope, it is an ARM board. I've extensively benchmarked every available cipher and just spent an hour repeating all of that with different crypto kernel modules. – JohnW Oct 01 '16 at 15:59
  • As for how I know that I should get more out of multithreading, it is pretty easy: the CPU was at 100% during the whole transfer, and if I transfer another file at the same time as another user, and therefore through another fork of openssh, I get nearly twice the speed. This would have been nice if I had to transfer multiple files at once, but unfortunately this is not the case. – JohnW Oct 01 '16 at 16:02
  • Perhaps it is possible to encrypt a file in chunks such that OpenSSH could speed things up. Another idea: Is it possible that you split your file up, transfer it with multiple processes and then reassemble it on the other side? – Martin Ueding Oct 01 '16 at 16:18
  • Yes, that's what I am doing right now using a quick and dirty bash wrapper. rsync may allow this in one way or another. By the way, openssh seems to fork when I do multiple transfers even without using another account. That will do then. :) – JohnW Oct 01 '16 at 16:25

1 Answer

No. The SFTP protocol doesn't leave many opportunities for parallelization. The original protocol requires cipher and MAC algorithms that can't be parallelized within a packet. OpenSSH supports GCM, which can be parallelized, but OpenSSH doesn't try to parallelize inside a packet. Although the protocol allows parallelizing the processing of successive packets, OpenSSH doesn't do that.
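To make "can be parallelized" concrete, here is a toy sketch in Python of counter-style encryption, the mode GCM uses for its encryption step. It is not AES and not what OpenSSH does: `keystream_block` uses SHA-256 as a hypothetical stand-in for the block cipher, and every function name is made up for illustration. The only point is that each block depends on the key, nonce and counter alone, so blocks can be computed in any order or by separate workers, unlike chained modes such as CBC, where encrypting block n needs the ciphertext of block n-1.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32  # bytes per keystream block (SHA-256 digest size)


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for a block cipher call on (nonce || counter). NOT real AES.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def xor_block(key: bytes, nonce: bytes, counter: int, chunk: bytes) -> bytes:
    # XOR one chunk with its keystream block; zip() trims a short last chunk.
    ks = keystream_block(key, nonce, counter)
    return bytes(a ^ b for a, b in zip(chunk, ks))


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ctr_sequential(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # One block after another, as a single-threaded implementation would do.
    blocks = split_blocks(data)
    return b"".join(xor_block(key, nonce, i, b) for i, b in enumerate(blocks))


def ctr_parallel(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    # Same computation, but blocks are handed to a pool of workers.
    # pool.map() returns results in input order, so reassembly is a plain join.
    blocks = list(enumerate(split_blocks(data)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out = pool.map(lambda ib: xor_block(key, nonce, ib[0], ib[1]), blocks)
        return b"".join(out)
```

Both functions return identical output, which is what makes the mode parallelizable in principle. In CPython the thread pool will not actually run faster for blocks this small; the sketch shows independence of the blocks, not a speedup, and the hand-off to workers is exactly the synchronization cost discussed below.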

Why doesn't OpenSSH parallelize? Because parallelization is complicated to do right, and is only beneficial for performance in specific scenarios:

  • In most scenarios, the network is the bottleneck, so optimizing for CPU time is pointless.
  • If the system is doing other things (including serving multiple SSH connections in parallel), then parallelizing the SSH processing is detrimental to the performance of other processes.
  • Parallelizing has a cost: the workload has to be transmitted to the participating processors, and the data must be assembled when all the processors have finished. Synchronization has a pretty heavy cost, so parallelization is only beneficial if each work item is sufficiently large. For SSH, parallelization within a packet is unlikely to be beneficial.
  • Parallelizing the processing of multiple packets would be possible, but it would have a huge impact on the design of the software: there would have to be a complex interface between the data layer and the cryptography layer, instead of simple data streaming.

OpenSSH is designed with security in mind, and complexity is the enemy of security, so it would be very much out of character to even consider parallelization. Someone else did, though: HPN-SSH is a set of patches for OpenSSH that allow parallel processing. It's still maintained as of today.

ARMv8 introduces hardware acceleration for AES, SHA-1 and SHA-256. If you have an ARMv8 board (whether you're running a 32-bit or 64-bit system), make sure that your crypto library (OpenSSL for OpenSSH) is compiled with ARMv8 acceleration. Some pre-ARMv8 chips have proprietary crypto acceleration which may be exposed by the Linux kernel, but OpenSSL doesn't support this out of the box (there have been kernel and OpenSSL patches, but they have a history of falling out of maintenance).
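A quick way to see whether the kernel reports the ARMv8 crypto extensions is to look at the `Features` line of `/proc/cpuinfo`. The sketch below assumes a Linux system and the flag names `aes`, `pmull`, `sha1` and `sha2` that Linux uses on ARM; the function name `armv8_crypto_flags` is made up for illustration. It only tells you what the CPU offers, not whether your OpenSSL build actually uses it.

```python
import os


def armv8_crypto_flags(cpuinfo_text: str) -> dict:
    """Report which ARMv8 crypto flags appear in /proc/cpuinfo-style text."""
    wanted = ("aes", "pmull", "sha1", "sha2")
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # ARM kernels list CPU capabilities on lines starting with "Features".
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return {flag: flag in features for flag in wanted}


if __name__ == "__main__":
    # Only meaningful on Linux; skipped silently where the file is absent.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as fh:
            print(armv8_crypto_flags(fh.read()))
```

If all four flags are missing, the board has no ARMv8 crypto extensions and no choice of AES cipher will be hardware accelerated through OpenSSL.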

If you don't want to use the HPN patches, then you can parallelize above the SSH layer. If you have many small files to transfer, copy them in batches and parallelize the batches. If you have a large file to transfer, copy it in chunks and parallelize the chunks.
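For the single large file case, the chunking approach might look like the following Python sketch. It assumes `scp` and `ssh` are on the path, that key-based login to the remote host works without prompts, and that there is enough local temporary space for a second copy of the file. All function names and the `.partNNNN` naming scheme are invented for this example. Each `scp` runs as its own process, so each connection does its encryption on whichever core the scheduler gives it.

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(size: int, parts: int) -> list:
    """Split [0, size) into contiguous (offset, length) ranges of near-equal size."""
    parts = max(1, min(parts, size)) if size else 1
    base, extra = divmod(size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def split_file(path: str, parts: int, outdir: str) -> list:
    """Write each range of `path` to its own chunk file; return chunk paths in order."""
    chunk_paths = []
    with open(path, "rb") as src:
        for index, (offset, length) in enumerate(chunk_ranges(os.path.getsize(path), parts)):
            chunk_path = os.path.join(outdir, f"{os.path.basename(path)}.part{index:04d}")
            src.seek(offset)
            remaining = length
            with open(chunk_path, "wb") as dst:
                while remaining > 0:
                    buf = src.read(min(remaining, 1 << 20))  # copy in 1 MiB pieces
                    if not buf:
                        break
                    dst.write(buf)
                    remaining -= len(buf)
            chunk_paths.append(chunk_path)
    return chunk_paths


def send_chunks(chunk_paths: list, remote: str, remote_dir: str, workers: int = 4) -> None:
    """Run one scp process per chunk, at most `workers` at a time."""
    def send(chunk: str) -> None:
        subprocess.run(["scp", "-q", chunk, f"{remote}:{remote_dir}/"], check=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, chunk_paths))  # list() surfaces any scp failure


def reassemble_remote(chunk_paths: list, remote: str, remote_dir: str, final_name: str) -> None:
    """Concatenate the chunks on the remote side, in order, then delete them."""
    names = " ".join(shlex.quote(f"{remote_dir}/{os.path.basename(c)}") for c in chunk_paths)
    target = shlex.quote(f"{remote_dir}/{final_name}")
    subprocess.run(["ssh", remote, f"cat {names} > {target} && rm -- {names}"], check=True)
```

A typical call sequence would be `chunks = split_file("big.img", 4, "/tmp/chunks")`, then `send_chunks(chunks, "user@host", "/data")`, then `reassemble_remote(chunks, "user@host", "/data", "big.img")`, where the host and paths are placeholders. Setting `workers` to the number of cores is a reasonable starting point; comparing a checksum of the source and the reassembled file afterwards is a cheap sanity check.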