IPv4 forwarding breaks bridges and veths

Question

I've successfully gotten the following to work:

ip netns add quarantine
ip link add eth0-q type veth peer name veth-q
ip link add br0 type bridge
ip link set veth-q master br0
ip link set br0 up
ip link set veth-q up
ip link set eth0-q netns quarantine
ip netns exec quarantine ip link set lo up
ip netns exec quarantine ip link set eth0-q up
ip netns exec quarantine ip address add 192.168.66.5/24 dev eth0-q
ip netns exec quarantine dnsmasq --interface=eth0-q --dhcp-range=192.168.66.10,192.168.66.50,255.255.255.0
ip link set eno1 master br0

This allows me to run an instance of dnsmasq without interfering with network-manager, and lets a device connected through my default ethernet interface (eno1) get an IP in 192.168.66.0/24.

I then decided to grant internet access, which I did as follows:

ip address add 192.168.66.1/24 dev br0
iptables -A FORWARD -i wlp58s0 -o br0 -m state  --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i br0 -o wlp58s0 -j ACCEPT
iptables -A FORWARD -j LOG
iptables -t nat -A POSTROUTING -o wlp58s0 -j MASQUERADE
sysctl -w net.ipv4.ip_forward=1
sysctl -p

where wlp58s0 is my WiFi interface connected to my home WiFi. I also had to kill the dnsmasq described previously and replace it with:

ip netns exec quarantine dnsmasq --interface=eth0-q \
  --dhcp-range=192.168.66.10,192.168.66.50,255.255.255.0 \
  --dhcp-option=3,192.168.66.1 --dhcp-option=6,8.8.8.8

This way the device connected via eno1 knows where to find the gateway and sends its DNS queries to the Google DNS server 8.8.8.8.
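As a rough sketch of the forwarding and NAT step above, the helper below only prints the commands for a given pair of interfaces, so they can be reviewed before being run as root. The function name print_nat_rules and its two arguments are hypothetical and not part of the original setup. The second FORWARD rule assumes the intent is to accept traffic that enters on the bridge and leaves through the uplink.

```shell
# print_nat_rules WAN LAN: print (do not run) the forwarding/NAT commands.
# Hypothetical helper, not from the original post.
print_nat_rules() {
    wan="$1"   # uplink interface, e.g. wlp58s0
    lan="$2"   # bridge facing the client, e.g. br0
    cat <<EOF
iptables -A FORWARD -i $wan -o $lan -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i $lan -o $wan -j ACCEPT
iptables -t nat -A POSTROUTING -o $wan -j MASQUERADE
sysctl -w net.ipv4.ip_forward=1
EOF
}

print_nat_rules wlp58s0 br0
```

Nothing is changed on the system unless the printed commands are then executed as root, for example by piping the output to sh.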

All of this works perfectly fine, and after rebooting my machine, all the configuration is gone as expected, and things work consistently.

However: in an earlier attempt, I took advice found on the internet to enable packet forwarding, and instead of using sysctl, I did:

echo 1 > /proc/sys/net/ipv4/ip_forward

This granted internet access to my device on eno1, which was already connected and already had an IP.

But after rebooting my machine, that IP forwarding setting had become persistent. Moreover, writing a 0 where I had written a 1 was not persistent. Worse, the initial setup (no internet access, just handing out IPs) was broken: my device on eno1 could no longer get an IP from the configuration I described at the beginning.

I used Wireshark: requests for an IP could be seen on br0 but were gone from veth-q. Even more peculiar, only IPv6 traffic could be seen on veth-q; the IPv4 traffic was entirely gone. Manually disabling IP forwarding by writing a 0 to /proc/sys/net/ipv4/ip_forward did nothing to help.

Eventually I reinstalled my Linux distribution (Ubuntu), took care never to use that echo command again, and now do things with sysctl, which causes no problems.
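Both echo 1 > /proc/sys/net/ipv4/ip_forward and sysctl -w only change the running kernel, so a value that survives a reboot has to come from a configuration file or from a service that sets it at startup. A minimal sketch for checking the first possibility follows. find_forward_settings is a hypothetical helper, and the default paths are an assumption about where sysctl settings usually live on Ubuntu.

```shell
# find_forward_settings [PATH...]: list config lines that set net.ipv4.ip_forward.
# Hypothetical helper; the default paths are assumed, adjust them as needed.
find_forward_settings() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/sysctl.conf /etc/sysctl.d /usr/lib/sysctl.d /run/sysctl.d
    fi
    # -r: recurse into directories, -s: stay quiet about missing paths,
    # -H: always print the file name, -E: extended regular expression.
    # Commented-out lines (starting with #) do not match.
    grep -rsHE '^[[:space:]]*net[./]ipv4[./]ip_forward[[:space:]]*=' "$@"
}

find_forward_settings || echo "no sysctl file sets net.ipv4.ip_forward"
```

If no file matches, the value is probably being set at boot by a running service rather than by the sysctl configuration.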

Why did this happen? It was very strange behaviour, because everything else on my computer seemed to be working just fine: I could get internet access and everything seemed to be back to normal, but that one interaction between the bridge and the veth had been corrupted.

Any light shed on this would be greatly appreciated!

@larsks Ubuntu, I mentioned it in my original post, I should have emphasized it maybe. — Arno, Jun 03 '22 at 14:56
I look at this question and it looks like an awful lot of effort just to run an instance of dnsmasq. I think you could discard the bulk of what you're doing with suitable use of the --bind-interfaces or --bind-dynamic options. I asked what distro you're running because as far as I can tell, the last few versions of Ubuntu don't even run a dnsmasq instance as part of NetworkManager (I've tried 20.04 and 22.04), at least by default. It would help if your question included the information necessary to reproduce the problem you're asking about. — larsks, Jun 06 '22 at 23:23
Oh I see. Then I must have been working with the wrong assumptions. All I understood was that I couldn't run dnsmasq because of something with network-manager or systemd-resolved using port 53 already, I assumed for DNS purposes. I already ran a hotspot and entered DNS entries in /etc/systems/resolv.conf I think it was, and got my hotspot to solve custom entries. Seeing the name "resolv.conf" made me assume it was dnsmasq hiding under the hood. In any case: I needed to hand out IPs and run my own dnsmasq and something was in the way. How would bind-interfaces have worked ? For my knowledge? — Arno, Jun 07 '22 at 08:12
Regarding reproducing my issue: I think the last bit of information I need to give you is that I used Ubuntu 20.04 LTS. If you install it and run verbatim the commands I described, then use that "echo 1" command to enable IP forwarding, you'll have this strange "broken veth" situation. I wanted to know to increase my understanding, regarding my original problem of creating a local network by bridging interfaces together and running dnsmasq, then using NAT to provide internet access: I've achieved my goal. — Arno, Jun 07 '22 at 08:18
I'd be also very happy to hear about a simpler solution with --bind-interface though ! — Arno, Jun 07 '22 at 08:20
Ah. But after searching by myself, this post seems to suggest network namespaces was a good way to go ?... https://unix.stackexchange.com/questions/210982/bind-unix-program-to-specific-network-interface In any case, I'm still interested in suggestions and understanding what went wrong ! — Arno, Jun 07 '22 at 08:25
Hello. Apparently this persistent 1 in /proc/sys/net/ipv4/ip_forward that breaks veth happens easily: I have another machine where I'm certain I never wrote a 1 manually to /proc/sys/net/ipv4/ip_forward, yet I have that symptom on the machine and my veths are broken. I found someone else with a problem similar to mine: https://unix.stackexchange.com/questions/409254/why-is-my-net-ipv4-ip-forward-1
I'd really like some insight on how to fix the veth when this happens actually now, it's not just a "for my knowledge" topic anymore ^^' — Arno, Jun 10 '22 at 09:57

score 0 · Answer 1 · answered Jun 10 '22 at 15:36

So, contrary to my initial thoughts, writing a 1 to /proc/sys/net/ipv4/ip_forward was not the problem.

The problem seems to be related to docker.

After disabling docker, I observe more normal behaviour from my bridges and virtual ethernets.

I will write in the comments to this answer whatever I can find out (what more precisely from docker causes the issue), but, at least I can safely say that manually writing a 1 to /proc/sys/net/ipv4/ip_forward was not the problem, which I guess technically solves my question.
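A minimal sketch for inspecting the state that matters here, assuming the cause is the one described in the serverfault question linked in the comments below: Docker setting the iptables FORWARD policy to DROP while br_netfilter makes bridged frames pass through iptables. show_bridge_filter_state is a hypothetical helper, and its ROOT argument exists only so the function can be pointed at a test directory.

```shell
# show_bridge_filter_state [ROOT]: print the sysctls relevant to bridged traffic.
# Hypothetical helper; ROOT defaults to /proc/sys.
show_bridge_filter_state() {
    root="${1:-/proc/sys}"
    for key in net/ipv4/ip_forward net/bridge/bridge-nf-call-iptables; do
        if [ -r "$root/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "$root/$key")"
        else
            # The net/bridge entries only exist while br_netfilter is loaded.
            printf '%s = (absent)\n' "$key"
        fi
    done
}

show_bridge_filter_state
# As root, the FORWARD chain policy can be inspected with:
#   iptables -S FORWARD | head -n 1
```

If bridge-nf-call-iptables reads 1 and the FORWARD policy is DROP, frames crossing br0 would be filtered by iptables unless a rule explicitly accepts them. That would be consistent with the symptom of requests visible on br0 but missing on veth-q.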

Arno

more information there: https://serverfault.com/questions/963759/docker-breaks-libvirt-bridge-network – A.B Jun 10 '22 at 15:40
In the end your problem was not reproducible because you didn't state you were running Docker (and I would never run Docker along other tools dealing with network, including LXC, libvirt etc.). – A.B Jun 10 '22 at 15:44
Indeed ! I'm very sorry about this. I had not suspected docker to cause problems, I have a whole bunch of things running on my machine and I sometimes do not suspect them to interact in such a way ! Thanks for looking into it ! – Arno Jun 11 '22 at 17:36
