I ran into a misleading error while connecting between AWS VPC subnets. The error occurred on B->A connections but not on A->B, so at first I thought it was a library bug.
It turned out to be caused by AWS "double-layer" routing combined with the NAT instance in the subnet, which redirected packets over the wrong network path and caused ssh to drop the connection.
Below is a copy of my post with the 'case study', which was deleted from the original thread:
As far as I can tell this isn't even attempting to answer the question, so I'm deleting it. If you have a separate question feel free to post it as one |@michael-mrozek
In my case:
as @patrick suggested (ssh_exchange_identification: read: Connection reset by peer):
CLIENT (subnetB 172.16.3.76)
ssh 172.16.0.141 -vvv -p23
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 172.16.0.141 [172.16.0.141] port 23.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
ssh_exchange_identification: read: Connection reset by peer
SERVER (SubnetA 172.16.0.141)
$(which sshd) -d -p 23
debug1: sshd version OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type RSA
debug1: private host key: #0 type 1 RSA
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type DSA
debug1: private host key: #1 type 2 DSA
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type ECDSA
debug1: private host key: #2 type 3 ECDSA
debug1: could not open key file '/etc/ssh/ssh_host_ed25519_key': No such file or directory
Could not load host key: /etc/ssh/ssh_host_ed25519_key
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-d'
debug1: rexec_argv[2]='-p'
debug1: rexec_argv[3]='23'
Set /proc/self/oom_score_adj from 0 to -1000
debug1: Bind to port 23 on 0.0.0.0.
Server listening on 0.0.0.0 port 23.
debug1: Bind to port 23 on ::.
Server listening on :: port 23.
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
debug1: getpeername failed: Transport endpoint is not connected
debug1: get_remote_port failed
The VPC setup and case description:
I have AWS EC2 instances running in a VPC (172.16.0.0/16)
- There is a public subnetA (172.16.0.0/24) with NAT-instanceA (172.16.0.200), which has an Elastic IP attached
- The other instances in subnetA communicate with the internet via instanceA (default via 172.16.0.200 dev eth0)
- There are instances in subnetB (172.16.3.0/24)
- The route table is similar to https://stackoverflow.com/questions/10243833/how-to-connect-to-outside-world-from-amazon-vpc
The problem:
- Hosts in subnetA and subnetB can ping/communicate with each other.
- Hosts in subnetA can ssh to hosts in subnetB.
- Hosts in subnetB can ssh to instanceA in subnetA.
- NONE of the hosts in subnetB can ssh to any OTHER instance in subnetA (other than instanceA). The error is: ssh_exchange_identification: read: Connection reset by peer. This happens IF AND ONLY IF the instance in subnetA has its default gateway set to NAT-instanceA (for example 'default via 172.16.0.200 dev eth0'). If an instance in subnetA keeps the unchanged default gateway (for example 'default via 172.16.0.1 dev eth0'), then you can ssh to that instance from subnetB hosts.
- Comment: without a NAT in subnetA, the instances in subnetA would not have an outgoing internet connection.
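To check which host pairs show this symptom without reading full ssh -vvv output each time, a small probe can open the TCP connection and wait for the server's SSH identification string. This is only a minimal sketch: the function name probe_ssh_banner and the status labels are hypothetical, and it assumes the reset arrives while the client is waiting for the banner, as in the log above.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Report how far a plain TCP client gets before the SSH banner arrives.

    Returns a (status, detail) tuple. The status labels are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Nothing answered on that port, or the connect itself timed out.
        return ("connect_failed", str(exc))

    with sock:
        try:
            # An SSH server sends its identification string first.
            data = sock.recv(256)
        except ConnectionResetError as exc:
            # TCP connected, then the peer reset: the symptom from the post.
            return ("reset_after_connect", str(exc))
        except socket.timeout:
            return ("banner_timeout", "")
        except OSError as exc:
            return ("read_failed", str(exc))

    if not data:
        return ("closed_without_banner", "")
    return ("banner", data.decode("ascii", "replace").strip())
```

Run from a subnetB host, for example as probe_ssh_banner("172.16.0.141", 23), a "reset_after_connect" result would correspond to the case where "Connection established" is followed by "Connection reset by peer".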
So...
The problem is probably caused by the AWS router and/or the NAT configuration.
My current guess is that this happens even though the VPC routing table is set to:
Destination Target
172.16.0.0/16 local
0.0.0.0/0 igw-nnnnn
The subnetA instances are in
172.16.0.0/24
(edit: source of the problem: the instance routing table below sends all traffic outside 172.16.0.0/24 via the NAT instance, overriding the AWS-side route for 172.16.0.0/16)
default via 172.16.0.200 dev eth0
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.60
The subnetB instances are in
172.16.3.0/24
When hosts from subnetB connect to instances in subnetA (other than NAT-instanceA), the traffic goes like:
172.16.3.X/24 --> 172.16.3.1 --> 172.16.0.Y
                                     V
                         ??? <-- 172.16.0.200 (NAT)
And that is the problem. I would have to tcpdump that to verify it; it might be fixable via NAT rules, though that is more complex than it should be.
Actually, the rule in the AWS router
Destination Target
172.16.0.0/16 local
should in theory cover the whole VPC /16 range, but the instance's /24 on-link subnet plus the NAT default gateway hide that rule at the operating-system level.
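The route selection described above can be illustrated with a small longest-prefix-match sketch. It models only the instance-level routing table quoted in the post; the names next_hop, INSTANCE_ROUTES, UNCHANGED_ROUTES and WITH_VPC_ROUTE are hypothetical, and the extra 172.16.0.0/16 route in the last table is an untested assumption about one possible fix, not something verified in this case.

```python
import ipaddress

# Routes on a subnetA instance, as in the `ip route` output from the post.
# A gateway of None means the destination is on-link (reached directly on eth0).
INSTANCE_ROUTES = [
    ("0.0.0.0/0", "172.16.0.200"),  # default via the NAT instance
    ("172.16.0.0/24", None),        # the instance's own /24
]

# The same instance with the unchanged default gateway (the VPC router).
UNCHANGED_ROUTES = [
    ("0.0.0.0/0", "172.16.0.1"),
    ("172.16.0.0/24", None),
]

# ASSUMPTION (not from the post, untested): an explicit route for the whole
# VPC range via the VPC router, so only non-VPC traffic uses the NAT instance.
WITH_VPC_ROUTE = INSTANCE_ROUTES + [("172.16.0.0/16", "172.16.0.1")]


def next_hop(routes, destination):
    """Pick the next hop for destination using longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        raise LookupError("no route to " + destination)
    return best[1] if best[1] is not None else "on-link"
```

With INSTANCE_ROUTES, a reply to the subnetB client 172.16.3.76 matches only the default route, so next_hop returns 172.16.0.200 (the NAT instance), while 172.16.0.141 is on-link. With UNCHANGED_ROUTES the same reply goes to 172.16.0.1, which matches the observation that ssh works when the default gateway is left unchanged.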