
I ran into a tough, misleading error while connecting between AWS VPC subnets. The error occurred on B->A connections but not on A->B, so at first I thought it was a library bug.

It turned out to be caused by the AWS "double layer" routing and the NAT instance in the subnet, which redirected packets over the wrong network path, causing ssh to drop the connection.

Below is a copy of my 'case study' post, which was deleted from the original thread with this moderator note:

As far as I can tell this isn't even attempting to answer the question, so I'm deleting it. If you have a separate question feel free to post it as one. – @michael-mrozek

In my case, as @patrick suggested (ssh_exchange_identification: read: Connection reset by peer), here is the debug output from both ends:

CLIENT (subnetB 172.16.3.76)

ssh 172.16.0.141 -vvv -p23
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config 
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 172.16.0.141 [172.16.0.141] port 23.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
ssh_exchange_identification: read: Connection reset by peer

SERVER (subnetA 172.16.0.141)

$(which sshd) -d -p 23
debug1: sshd version OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type RSA
debug1: private host key: #0 type 1 RSA
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type DSA
debug1: private host key: #1 type 2 DSA
debug1: key_parse_private2: missing begin marker
debug1: read PEM private key done: type ECDSA
debug1: private host key: #2 type 3 ECDSA
debug1: could not open key file '/etc/ssh/ssh_host_ed25519_key': No such file or directory
Could not load host key: /etc/ssh/ssh_host_ed25519_key
debug1: rexec_argv[0]='/usr/sbin/sshd' 
debug1: rexec_argv[1]='-d'
debug1: rexec_argv[2]='-p'
debug1: rexec_argv[3]='23'
Set /proc/self/oom_score_adj from 0 to -1000
debug1: Bind to port 23 on 0.0.0.0.
Server listening on 0.0.0.0 port 23.   
debug1: Bind to port 23 on ::.
Server listening on :: port 23.
debug1: Server will not fork when running in debugging mode.
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
debug1: getpeername failed: Transport endpoint is not connected
debug1: get_remote_port failed

https://superuser.com/questions/856989/ssh-error-ssh-exchange-identification-read-connection-reset-by-peer


The VPC setup and case description:

I have Amazon AWS EC2 instances running in a VPC (172.16.0.0/16).

The problem:

  • Hosts in both subnetA and subnetB can ping/communicate with each other.
  • Hosts in subnetA can ssh to hosts in subnetB.
  • Hosts in subnetB can ssh to instanceA in subnetA.
  • NONE of the hosts in subnetB can ssh to any OTHER instance in subnetA (other than instanceA); the connection fails with ssh_exchange_identification: read: Connection reset by peer IF AND ONLY IF the subnetA instance has its default gateway set to NAT-instanceA (for example 'default via 172.16.0.200 dev eth0'). If a subnetA instance has an unchanged default gateway (for example 'default via 172.16.0.1 dev eth0'), then you can ssh to it from subnetB hosts (see the route check below).
  • Comment: without a NAT in subnetA, the instances in subnetA would have no outgoing internet connection.
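A quick way to tell the two cases apart is to check the default route on the subnetA instance in question (a minimal sketch, assuming the iproute2 tools; the two example routes are the ones quoted in the list above):

# affected instance: default route points at the NAT instance
$ ip route show | grep default
default via 172.16.0.200 dev eth0

# unaffected instance: default route still points at the AWS router
$ ip route show | grep default
default via 172.16.0.1 dev eth0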

So...

The problem is probably caused by the Amazon AWS router and/or the NAT configuration.

For the moment, my guess is that, despite the VPC routing table being set to:

Destination Target 
172.16.0.0/16 local
0.0.0.0/0   igw-nnnnn

The subnetA instances are in

172.16.0.0/24

and their system routing table looks like this (edit: this is the source of the problem: the routing table sends all traffic outside 172.16.0.0/24 via the NAT instance, overriding the AWS-side routing for 172.16.0.0/16):

default via 172.16.0.200 dev eth0 
172.16.0.0/24 dev eth0  proto kernel  scope link  src 172.16.0.60

The subnetB instances are in

172.16.3.0/24

When hosts from subnetB connect to instances in subnetA (other than NAT-instanceA), the traffic goes like this:

172.16.3.X/24  --> 172.16.3.1 --> 172.16.0.Y  
                                      V
                        ???   <-- 172.16.0.200 (NAT) 

And that is the problem: the return traffic takes an asymmetric path. I would have to tcpdump it to verify; it might be fixable via NAT rules, though that is more complex than it should be.
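A minimal sketch of how one might verify that with tcpdump (hypothetical commands; the interface name and capture filter are my assumptions, using the non-standard SSH port 23 from this setup):

# on the subnetA target (172.16.0.141): watch the incoming SSH traffic from the subnetB client
$ tcpdump -ni eth0 'tcp port 23 and host 172.16.3.76'

# on the NAT instance (172.16.0.200): check whether the replies come back through it
# instead of going straight to the AWS router
$ tcpdump -ni eth0 'tcp port 23 and host 172.16.3.76'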

Actually, the rule in the AWS router

Destination Target 
172.16.0.0/16 local

should in theory cover the whole VPC /16 range, but the instance's /24 kernel route plus the NAT default gateway override that behaviour at the system level.
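One way to see which next hop the kernel actually picks for a subnetB address is ip route get (a sketch, assuming iproute2; the interpretation in the comment follows from the routing table shown above):

# on an affected subnetA instance, a subnetB destination matches neither the
# 172.16.0.0/24 kernel route nor anything more specific, so it falls through to
# the default route and gets sent to the NAT instance (172.16.0.200)
$ ip route get 172.16.3.76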

sirkubax

  • Is there a question in there somewhere? I'm sorry but I can't see it if there is. The beginning looks like an answer, and then there's some problem statements in the middle - are those what you're trying to get resolved here? – Chris Davies Jul 10 '15 at 13:42

1 Answer


On the instances in subnetA (with NAT-instance 172.16.0.200), the routing table looks like:

default via 172.16.0.200 dev eth0
172.16.0.0/24 dev eth0  proto kernel  scope link  src 172.16.0.141

Actually, one addition:

$ ip r a 172.16.3.0/24 via 172.16.0.1
(or ip r a 172.16.0.0/16 via 172.16.0.1 to cover the whole VPC range)

Fixes the system routing table:

default via 172.16.0.200 dev eth0
172.16.0.0/24 dev eth0  proto kernel  scope link  src 172.16.0.141
172.16.3.0/24 via 172.16.0.1 dev eth0

and shifts the routing between VPC subnets back over to the AWS router:

Destination Target 
172.16.0.0/16 local
0.0.0.0/0   igw-nnnnn
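To confirm the fix, the same ip route get check as above should now resolve a subnetB address via the AWS router (again a sketch, assuming iproute2):

# with the added route, a subnetB destination now matches 172.16.3.0/24
# and is sent via the AWS router (172.16.0.1) instead of the NAT instance
$ ip route get 172.16.3.76

Note that a route added with ip r a (ip route add) does not survive a reboot. One way to make it persistent on these Ubuntu instances (an assumption based on the Ubuntu OpenSSH version string above; other images may use a different mechanism) is a post-up line in /etc/network/interfaces:

# /etc/network/interfaces (hypothetical snippet)
auto eth0
iface eth0 inet dhcp
    # re-add the intra-VPC route once the interface is up
    post-up ip route add 172.16.3.0/24 via 172.16.0.1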
sirkubax