How to make this network work in this special case?

Question

I have a problem. All my machines are behind a router which is connected to an Ethernet port on the modem. That port is too limited in download/upload. So I tried connecting the router to the modem with two cables, on two ports. I've spent a lot of time trying to solve this and I don't know what to try anymore.

My router has 4 interfaces:

enp1s0f0   172.16.0.3
enp4s0f1   10.0.0.6
enp1s0f1   192.168.0.3
enp4s0f0   192.168.0.6

As you can see, enp1s0f1 and enp4s0f0 are on the same network, which is odd. It had to be this way if I wanted to connect to the modem (192.168.0.1) using two Ethernet ports.

So, here is what I have tried:

echo "1   myorg" | sudo tee -a /etc/iproute2/rt_tables #added a custom routing table myorg
sudo ip route add 192.168.0.1 scope link dev enp4s0f0 #don't know if it is really necessary
sudo ip rule add from 192.168.0.6 table myorg
sudo ip route add default via 192.168.0.1 dev enp4s0f0 table myorg #second default gateway through myorg table

I get these routes as a result:

$ ip -4 route show table main
default via 192.168.0.1 dev enp1s0f1 onlink 
10.0.0.0/24 dev enp4s0f1 proto kernel scope link src 10.0.0.6 
172.16.0.0/24 dev enp1s0f0 proto kernel scope link src 172.16.0.3 
192.168.0.0/24 dev enp1s0f1 proto kernel scope link src 192.168.0.3 
192.168.0.0/24 dev enp4s0f0 proto kernel scope link src 192.168.0.6
192.168.0.1 dev enp4s0f0 scope link

$ ip -4 route show table myorg
default via 192.168.0.1 dev enp4s0f0

$ sudo route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.0.1     0.0.0.0         UG    0      0        0 enp1s0f1
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 enp4s0f1
172.16.0.0      0.0.0.0         255.255.255.0   U     0      0        0 enp1s0f0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 enp1s0f1
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 enp4s0f0
192.168.0.1     0.0.0.0         255.255.255.255 UH    0      0        0 enp4s0f0

I use ufw as a NAT firewall. I added this to the *nat section:

:POSTROUTING ACCEPT - [0:0]
-A POSTROUTING -s 172.16.0.0/24 -o enp1s0f1 -j MASQUERADE
-A POSTROUTING -s 10.0.0.0/24 -o enp4s0f0 -j MASQUERADE

The problem is that at any given time only one of the two networks gets ping replies from the modem (gateway 192.168.0.1): either the machines on the 10.0.0.0 network or those on the 172.16.0.0 network. Which one works changes over time, and I don't know why.

My modem sees both clients 192.168.0.3 and 192.168.0.6 on two ETH ports.

So, is it possible to have WAN access from all machines on all networks with this topology (a router with two interfaces on the same network)?

No worries, we've all been there. I should have realized you were still editing and our edits were colliding with each other. Sorry! — terdon, Apr 01 '20 at 15:51

score 1 · Accepted Answer · edited Oct 07 '21 at 07:34

While it's not explicitly written, I guess the goal is to split traffic such that:

172.16.0.0/24 traffic flows through enp1s0f1
10.0.0.0/24 traffic flows through enp4s0f0

As the OP wrote, this needs policy/source-based routing. iptables and netfilter are rarely useful for this (at least on their own):

Generally speaking, iptables and netfilter don't route and don't care about routes: the network routing stack routes. Some iptables actions can still alter routing decisions (as described in this schematic).
Any action done in POSTROUTING, as the name says, happens after routing decisions have been made: it's too late to alter the route. Here, while the nat/POSTROUTING rules are needed, they won't alter the route.

Whenever a routing problem can be solved without iptables, it's better to avoid it. Sometimes it can't be avoided (and then iptables is usually used to add a mark to packets, and this mark is matched by an ip rule entry).
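For illustration only (the rest of this answer doesn't need it), the mark-based variant could be sketched as below. The function name fwmark_rule_cmds and the mark value are hypothetical choices; the function only prints the commands, so they can be reviewed before being run as root (for example piped to sudo sh). With rp_filter enabled, marked traffic may additionally need the src_valid_mark sysctl so that the reverse path lookup takes the mark into account.

```shell
# Hypothetical helper: print the commands for a mark-based policy rule,
# i.e. "packets entering on <lan_if> are routed with table <table>".
fwmark_rule_cmds() {
    lan_if=$1   # LAN-side interface, e.g. enp4s0f1
    mark=$2     # arbitrary firewall mark value
    table=$3    # routing table number
    # mangle/PREROUTING is traversed before the routing decision,
    # so the mark is already set when "ip rule" entries are evaluated.
    echo "iptables -t mangle -A PREROUTING -i $lan_if -j MARK --set-mark $mark"
    echo "ip rule add fwmark $mark lookup $table"
}

# Example: fwmark_rule_cmds enp4s0f1 10 10 | sudo sh
```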

Routes

I will assume that rp_filter=1 (Strict Reverse Path Forwarding) is set on all interfaces, since it's the default on most distributions.
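To check that assumption on a given system, a small sketch like the one below lists the rp_filter value of every interface (0 = off, 1 = strict, 2 = loose). The function name show_rp_filter is hypothetical. Note that the kernel uses the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so setting net.ipv4.conf.all.rp_filter=1 (for example with sysctl -w) is enough to get strict mode everywhere.

```shell
# List "<interface> <rp_filter>" for every directory under confdir.
# confdir defaults to the live kernel tree; another path can be passed
# (handy for testing).
show_rp_filter() {
    confdir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$confdir"/*/rp_filter; do
        [ -r "$f" ] || continue          # glob matched nothing
        iface=${f%/rp_filter}            # strip the file name...
        iface=${iface##*/}               # ...then keep the last path element
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

# Example: show_rp_filter
```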

Rules match on the source address, routing tables on the destination. The additional routing tables should contain enough information to override routes without ambiguity when only one among several should be chosen (then only that one is added to the table). Often, other routes from the main table must also be copied, or bad things can happen.

In my answer I will give no preference to one network over the other: each one gets its own routing table. I'll forget table 1 and use table 10 for LAN 10.0.0.0/24 and table 172 for LAN 172.16.0.0/24. Keep the NAT rules, but remove the previously added rule and routing table, as well as the route 192.168.0.1 dev enp4s0f0 scope link from the main table.
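Assuming the earlier attempt from the question is still in place exactly as shown (the rule from 192.168.0.6 to table myorg, the default route in that table and the extra 192.168.0.1 host route in main), the cleanup could be sketched as below. The function name undo_myorg_cmds is hypothetical, and it only prints the commands; pipe them to sudo sh once reviewed. The "1 myorg" line can stay in /etc/iproute2/rt_tables: it's just a name.

```shell
# Print the commands that undo the question's earlier attempt.
undo_myorg_cmds() {
    echo "ip rule del from 192.168.0.6 table myorg"            # the custom rule
    echo "ip route flush table myorg"                         # its default route
    echo "ip route del 192.168.0.1 dev enp4s0f0 scope link"   # host route in main
}

# Example: undo_myorg_cmds | sudo sh
```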

Routes for 10.0.0.0/24 <--> 10.0.0.6 enp4s0f1 | enp4s0f0 192.168.0.6 <--> 192.168.0.1/default:

 ip rule add from 10.0.0.0/24 lookup 10
 ip route add table 10 10.0.0.0/24 dev enp4s0f1
 ip route add table 10 192.168.0.0/24 dev enp4s0f0 src 192.168.0.6
 ip route add table 10 default via 192.168.0.1

Without the duplicated route entry for 10.0.0.0/24 above, the system itself wouldn't be able to reach this LAN: the route would resolve through the default gateway, and this would happen only during the Strict Reverse Path Forwarding (SRPF) check, which makes it difficult to debug. That's an example of the bad things that happen when such a route is missing. When in doubt, just duplicate routes.

Another equivalent option, instead of adding that route, would have been to change the rule above into:

    ip rule add from 10.0.0.0/24 iif enp4s0f1 lookup 10

so that it wouldn't match local (non-routed) traffic, for which only the main table would then be used.

Routes for 172.16.0.0/24 <--> 172.16.0.3 enp1s0f0 | enp1s0f1 192.168.0.3 <--> 192.168.0.1/default:

 ip rule add from 172.16.0.0/24 lookup 172
 ip route add table 172 172.16.0.0/24 dev enp1s0f0
 ip route add table 172 192.168.0.0/24 dev enp1s0f1 src 192.168.0.3
 ip route add table 172 default via 192.168.0.1
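The two groups of commands above follow the same pattern, so they can be generated by one function. This is only a sketch: lan_table_cmds and its parameter order are hypothetical, and it prints the commands instead of running them, in the same order as above (the LAN route is the duplicated one discussed earlier).

```shell
# Hypothetical generator mirroring the per-LAN commands above:
# one rule selecting the table by source prefix, plus the routes the table needs.
lan_table_cmds() {
    table=$1    # routing table number, e.g. 10
    lan=$2      # LAN prefix, e.g. 10.0.0.0/24
    lan_if=$3   # LAN-side interface, e.g. enp4s0f1
    wan_net=$4  # modem-side network, e.g. 192.168.0.0/24
    wan_if=$5   # modem-side interface, e.g. enp4s0f0
    wan_src=$6  # this router's address on wan_if, e.g. 192.168.0.6
    gw=$7       # modem address, e.g. 192.168.0.1
    echo "ip rule add from $lan lookup $table"
    echo "ip route add table $table $lan dev $lan_if"
    echo "ip route add table $table $wan_net dev $wan_if src $wan_src"
    echo "ip route add table $table default via $gw"
}

# Example (LAN 10.0.0.0/24):
# lan_table_cmds 10 10.0.0.0/24 enp4s0f1 192.168.0.0/24 enp4s0f0 192.168.0.6 192.168.0.1 | sudo sh
```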

The following rules also alter the route (the link) for locally initiated outgoing traffic, when the Linux system itself uses one of these source IP addresses. This would be optional, but the next part about ARP flux makes it mandatory:
```
 ip rule add from 192.168.0.6 lookup 10
 ip rule add from 192.168.0.3 lookup 172
```
Any non-special case involving the routes overridden by these rules must also be duplicated.

Here the only missing routes are between the two special LANs themselves:

in table 10 to reach 172.16.0.0/24
in table 172 to reach 10.0.0.0/24

Because each additional table doesn't yet have a route to the other side, it would use the default route (and would be blocked yet again by SRPF), preventing the two special networks from communicating with each other. So just duplicate the missing route in each table:

    ip route add table 10 172.16.0.0/24 dev enp1s0f0
    ip route add table 172 10.0.0.0/24 dev enp4s0f1

With this model, if for example two other "normal" internal networks were added, they could communicate with each other (and would use the main table's default route to go outside) without extra settings, but their routes would again have to be duplicated in each additional routing table for them to communicate with the two special LANs.
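Duplicating one route into several extra tables is repetitive, so here is a small hypothetical helper (dup_route_cmds) that prints one ip route add per table. With the arguments 172.16.0.0/24 enp1s0f0 10 it reproduces the first of the two cross-LAN commands above; a hypothetical new "normal" LAN such as 192.168.50.0/24 on an interface enp5s0 (neither exists in the question) would need it for both tables 10 and 172.

```shell
# Print "ip route add table T PREFIX dev DEV" for every table T given.
dup_route_cmds() {
    prefix=$1   # destination prefix already routed in the main table
    dev=$2      # interface that prefix lives on
    shift 2
    for table in "$@"; do
        echo "ip route add table $table $prefix dev $dev"
    done
}

# Examples:
# dup_route_cmds 172.16.0.0/24 enp1s0f0 10 | sudo sh
# dup_route_cmds 192.168.50.0/24 enp5s0 10 172 | sudo sh   # hypothetical LAN
```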

Routes are now fine, but there's still...

The ARP flux problem

Linux follows the weak host model. That's the case for IP routing, and likewise for the way Linux answers ARP requests: it answers from any interface for any of its IP addresses, but of course using the replying interface's own MAC address. When multiple interfaces are on the same LAN, this can happen on all of them simultaneously, and usually the fastest reply wins. The ARP information is then cached on the remote system and stays there for some time. Eventually the cache expires and the same race happens again, possibly with a different outcome. So how does this cause a problem? Here's an example:

The router (modem) sends an ARP request for 192.168.0.6, in order to send back the reply to traffic initially sent from 10.0.0.0/24 (routed and NATed by Linux).
Linux replies on enp1s0f1 (enp1s0f1 won the race), giving enp1s0f1's MAC address as the one that has 192.168.0.6.
For a few seconds to a few minutes, future ingress IP packets from the router to 192.168.0.6 arrive on enp1s0f1,
while at the same time egress packets from 192.168.0.6 leave through enp4s0f0.

This asymmetric routing is caught by Strict Reverse Path Forwarding (rp_filter) and the traffic is dropped. It can even appear to work for a few seconds at random, then fail again. Depending on overall traffic, the problem could later switch to the other link (and then the failures move to the other LAN), which matches the symptom described in the question.

Luckily, to prevent this, Linux provides a setting, meant to be used together with policy routing, that makes ARP follow the rules defined by routing: arp_filter.

arp_filter - BOOLEAN

1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP'd IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request.

sysctl -w net.ipv4.conf.enp4s0f0.arp_filter=1
sysctl -w net.ipv4.conf.enp1s0f1.arp_filter=1

Now the ARP behaviour is correct. If the settings have just been put in place, one should force-flush the ARP cache of peers (here: the modem) by doing a duplicate address detection with arping (from iputils / iputils-arping), which broadcasts to peers and has them update their cache:

arping -c 5 -I enp4s0f0 -D -s 192.168.0.6 192.168.0.6 &
arping -c 5 -I enp1s0f1 -D -s 192.168.0.3 192.168.0.3

Note that the two rules added earlier for locally initiated traffic (from 192.168.0.6 lookup 10 and from 192.168.0.3 lookup 172) are now mandatory, because the IP addresses 192.168.0.3 and 192.168.0.6 must match in the policy routing rules for correct ARP resolution with arp_filter=1.
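sysctl -w settings only last until the next reboot. A minimal sketch for making them persistent is a drop-in file under /etc/sysctl.d/ (adding the same lines to /etc/sysctl.conf works too); the file name 90-policy-routing.conf and the function name write_sysctl_dropin are arbitrary choices. The rp_filter lines reflect the strict mode assumed at the start of this answer and can be left out. Run it as root, then load the file with sysctl -p followed by its path (or reboot).

```shell
# Write the ARP/RPF settings to a sysctl drop-in file.
# The target path can be overridden (e.g. for a dry run into /tmp).
write_sysctl_dropin() {
    out=${1:-/etc/sysctl.d/90-policy-routing.conf}
    cat > "$out" <<'EOF'
# Answer ARP only on the interface that policy routing would use
net.ipv4.conf.enp4s0f0.arp_filter = 1
net.ipv4.conf.enp1s0f1.arp_filter = 1
# Strict reverse path filtering (optional, see above)
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
EOF
}

# Example (as root):
# write_sysctl_dropin && sysctl -p /etc/sysctl.d/90-policy-routing.conf
```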

How to debug

ip route get is very useful to check routes and reverse path filtering:

A test case for the routes between the two LANs added above:

# ip route get from 10.0.0.111 iif enp4s0f1 172.16.0.111
172.16.0.111 from 10.0.0.111 dev enp1s0f0 table 10
    cache iif enp4s0f1
# ip route get from 172.16.0.111 iif enp1s0f0 to 10.0.0.111
10.0.0.111 from 172.16.0.111 dev enp4s0f1 table 172 
    cache iif enp1s0f0

When deleting rules or routes:

# ip route get from 10.0.0.111 iif enp4s0f1 8.8.8.8
8.8.8.8 from 10.0.0.111 via 192.168.0.1 dev enp4s0f0 table 10 
    cache iif enp4s0f1 
# ip rule del from 10.0.0.0/24 lookup 10
# ip route get from 10.0.0.111 iif enp4s0f1 8.8.8.8
8.8.8.8 from 10.0.0.111 via 192.168.0.1 dev enp1s0f1
    cache iif enp4s0f1
# ip route get from 192.168.0.1 iif enp4s0f0 192.168.0.6
local 192.168.0.6 from 192.168.0.1 dev lo table local
    cache <local> iif enp4s0f0
# ip rule delete from 192.168.0.6 lookup 10
# ip route get from 192.168.0.1 iif enp4s0f0 192.168.0.6
RTNETLINK answers: Invalid cross-device link

This shows how the results change depending on the presence (or absence) of rules and additional routes. The last result is the error message telling that the Reverse Path Forwarding check failed (=> drop).
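When there are many cases to test, a small wrapper can run ip route get and flag the reverse path failure shown above. This is a sketch: check_route is a hypothetical name, it only recognizes the "Invalid cross-device link" message seen in these transcripts, and it should be run as root like the transcripts above (# prompt).

```shell
# Run one "ip route get" test case and summarise the outcome.
check_route() {
    if out=$(ip route get "$@" 2>&1); then
        # Success: show the first line (the resolved route).
        printf 'OK   %s\n' "$(printf '%s\n' "$out" | head -n 1)"
    elif printf '%s\n' "$out" | grep -q 'Invalid cross-device link'; then
        printf 'RPF  %s: reverse path check failed\n' "$*"
    else
        printf 'ERR  %s: %s\n' "$*" "$out"
    fi
}

# Examples, reusing test cases from above:
# check_route from 10.0.0.111 iif enp4s0f1 172.16.0.111
# check_route from 192.168.0.1 iif enp4s0f0 192.168.0.6
```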

Then there is also ip neigh (most useful on peer systems) to check ARP entries, tcpdump, etc.
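One way to spot ARP flux from the router side is to compare the MAC address the modem associates with 192.168.0.6 or 192.168.0.3 (from its ARP table, if its interface shows one) with the router's own interfaces; tcpdump -eni enp1s0f1 arp shows the same information live, with link-level headers (-e) and without name resolution (-n). The hypothetical helper iface_for_mac below maps such a MAC address back to the local interface name, assuming the usual ip -br link column layout (name, state, MAC, flags).

```shell
# Print the local interface whose MAC address matches $1.
iface_for_mac() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')   # ip prints lowercase MACs
    # Column 3 of "ip -br link" is the MAC; drop any "@parent" suffix from the name.
    ip -br link | awk -v mac="$want" '$3 == mac { sub(/@.*/, "", $1); print $1 }'
}

# Example: iface_for_mac 00:11:22:33:44:55
# If the modem lists 192.168.0.6 with enp1s0f1's MAC, that's ARP flux.
```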

thank you very much A.B for your very detailed answer and care. At first I didn't understand much of it, but on second reading I understood more. By default my Debian kernel has reverse path filtering = 0; I activated it via sysctl on all interfaces, if that makes sense. Then I executed the routes and ARP commands. Now I have WAN access on both subnets 10 and 172, that's very good. I have one problem left, maybe it wasn't clear enough: machines on nets 10 and 172 should be able to communicate with each other. That was the case before, but now ping does not return — sugarman, Apr 06 '20 at 19:47
that makes plenty of sense, thank you. I understand even more now. In your opinion, how should I make the routes/rules persistent after reboot? I can do this with an rc.local script. Is there a more network-oriented, recommended way? — sugarman, Apr 07 '20 at 09:14
Your case involves 4 interfaces. As soon as one interface is set down, you lose all of its routes in all of the tables. So you need a script that is idempotent, e.g. it must not add a rule a 2nd time (the ip rule command won't prevent it), and at the same time it has to cope with routes already being set, etc. This script must run after the 4 interfaces are set up in the normal way, so right after the network target or equivalent — A.B, Apr 07 '20 at 09:21
Another method: for simpler cases, this could have used specific up or pre-up entries in /etc/network/interfaces.d/ to be applied right after the interface is brought up. Yet, if the script is idempotent, you can put this same script everywhere for every interface (and ensure that if it fails it's ignored, by adding || true at the end of the up entry): normally, when the 4th interface goes up and calls this script, it should end up adding all the correct settings. I hope you see what I mean — A.B, Apr 07 '20 at 11:42
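A minimal sketch of such idempotent building blocks, assuming the script is re-run every time an interface comes up (ensure_rule and ensure_route are hypothetical names): since ip rule add happily creates duplicates, every existing copy of the rule is deleted first; for routes, ip route replace adds the route or updates it in place. Such a script could then be called from an entry like up /usr/local/sbin/policy-routing.sh || true (path hypothetical).

```shell
# Idempotent helpers for a script that may run several times.
ensure_rule() {
    # "ip rule add" does not refuse duplicates: remove every existing
    # copy of this exact rule first, then add it once.
    while ip rule del "$@" 2>/dev/null; do :; done
    ip rule add "$@"
}

ensure_route() {
    # "ip route replace" creates the route, or updates it if it exists.
    ip route replace "$@"
}

# Example body of such a script (run as root):
# ensure_rule from 10.0.0.0/24 lookup 10
# ensure_route table 10 10.0.0.0/24 dev enp4s0f1
# ensure_route table 10 192.168.0.0/24 dev enp4s0f0 src 192.168.0.6
# ensure_route table 10 default via 192.168.0.1
```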
I added the routes/rules commands to /etc/rc.local, it is executed at the end of multiuser level. It works fine. — sugarman, Apr 07 '20 at 17:13
One remark about the sysctl arp_filter or rp_filter properties setting. It should be added to /etc/sysctl.conf to make it persistent after reboot. *** Thank you a lot A.B, you rock :-) *** — sugarman, Apr 07 '20 at 17:16
you're welcome. If it works, just mark my answer as solution heh. As a side note, rp_filter isn't mandatory. Having rp_filter helps against spoofing, but at the same time makes everything more difficult. Anyway, when settings work with rp_filter in strict mode, that means the settings are really correct. There are some corner cases (most often involving iptables and local traffic rather than routed traffic) where it has to be relaxed. — A.B, Apr 07 '20 at 18:42
