interface MTU different than seen in "ip link show" output

Question

I have two PCs directly connected like this:

PC1[eth1] <-> [eth0]PC2

The MTU of the eth1 interface on PC1 is 9000 bytes. The MTU of the eth0 interface on PC2 is 2000 bytes:

root@PC2:~# ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 80:97:41:ae:f7:c9 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    170432     696      0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    118126     274      0       0       0       0
root@PC2:~# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 80:97:41:ae:f7:c9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.30/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.11.12.2/24 scope global eth0:temporary
       valid_lft forever preferred_lft forever
root@PC2:~#

The weird thing is that if I execute ping -M do -s 4182 -c 1 10.11.12.2 on PC1, this packet is received by PC2 and fragmented replies are sent:

root@PC1:~# ping -M do -s 4182 -c 1 10.11.12.2
PING 10.11.12.2 (10.11.12.2) 4182(4210) bytes of data.
4190 bytes from 10.11.12.2: icmp_seq=1 ttl=64 time=0.397 ms

--- 10.11.12.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.397/0.397/0.397/0.000 ms
root@PC1:~#

I would expect PC2 to silently drop this frame because it exceeds its MTU. However, ping -M do -s 4183 -c 1 10.11.12.2 does not receive a reply. It looks as if PC2 has an interface MTU of 4210 bytes (4182 data + 8 ICMP header + 20 IPv4 header) rather than 2000 bytes. The kernel driver for eth0 on PC2 is e1000e.
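The sizes in the ping output can be checked with a little arithmetic. The sketch below assumes IPv4 without options and untagged Ethernet II framing; `ping_sizes` is a hypothetical helper name used only for illustration. It reproduces the 4210-byte IP packet and the "4190 bytes from" line that ping reports for `-s 4182`.

```python
# Header sizes in bytes (IPv4 without options, ICMP echo, untagged Ethernet II).
IPV4_HEADER = 20
ICMP_HEADER = 8
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence, usually not counted by tools


def ping_sizes(payload: int) -> dict:
    """Return the sizes involved when running `ping -s <payload>`."""
    icmp = payload + ICMP_HEADER     # what ping reports as "N bytes from"
    ip_packet = icmp + IPV4_HEADER   # what the interface MTU limits
    frame = ip_packet + ETH_HEADER   # Ethernet frame without FCS
    return {
        "icmp": icmp,
        "ip_packet": ip_packet,
        "frame": frame,
        "frame_with_fcs": frame + ETH_FCS,
    }


for payload in (4182, 4183):
    print(payload, ping_sizes(payload))
```

Since the MTU shown by `ip link` counts only the IP packet, the largest frame PC2 accepted on the wire was 4224 bytes (4228 including the FCS), and the first one it dropped was one byte larger.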

How can this behavior be explained?

Be careful: in a broadcast domain, jumbo frames are all or nothing. You should not mix machines using the standard MTU with machines using jumbo frames on the same network. The usable MTU can also be discovered along a path, but only if the firewall does not block the relevant ICMP messages. Also, for some types of connections, the best MTU for reaching a server is normally around 1470 bytes. — Rui F Ribeiro, Feb 03 '17 at 14:57
In my opinion, this post is better suited to our sister forum "Network Engineering" on Stack Exchange than to "Unix & Linux". — Rui F Ribeiro, Feb 03 '17 at 15:03
@RuiFRibeiro As far as I know, "Network Engineering" is about professional network equipment. In addition, Cisco switches and routers, for example, do not behave like this: they drop Ethernet frames larger than the port MTU and increment the giants counter. So the behavior described seems to be specific to Linux. — Martin, Feb 04 '17 at 15:34
@Martin Here is fine, but I think Network Engineering would also have been fine. — Celada, Feb 05 '17 at 18:16

score 4 · Accepted Answer · answered Feb 05 '17 at 18:15

This is the difference between MTU (Maximum Transmission Unit) and MRU (Maximum Receive Unit).

Normally one expects the MTU (and MRU) to be set the same across all of the members of a single broadcast domain and therefore the difference doesn't matter, but under your misconfigured setup, it does matter.
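One way to picture the distinction is to model the transmit and receive limits as two separate numbers. This is a toy sketch: the `Interface` class and its methods are hypothetical, and the MRU of 4210 is inferred from the ping experiment in the question rather than read from the e1000e driver.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Toy model of a NIC with separate transmit and receive limits.

    mtu: largest IP packet the host will *send* (what `ip link` shows).
    mru: largest IP packet the hardware/driver will *accept*.
    """
    name: str
    mtu: int
    mru: int

    def can_send(self, ip_len: int) -> bool:
        return ip_len <= self.mtu

    def can_receive(self, ip_len: int) -> bool:
        return ip_len <= self.mru


# PC2 as observed in the question: MTU 2000, yet 4210-byte packets get in.
# The MRU value is an inference from the ping test, not a driver setting.
pc2_eth0 = Interface("eth0", mtu=2000, mru=4210)

print(pc2_eth0.can_receive(4210))  # True:  ping -s 4182 gets a reply
print(pc2_eth0.can_receive(4211))  # False: ping -s 4183 is dropped
print(pc2_eth0.can_send(4210))     # False: the reply must be fragmented
```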

"I would expect PC2 to silently drop this frame as it exceeds its MTU."

You've told PC2 not to exceed 2000-byte packets when transmitting, but that doesn't forbid it from receiving something larger. It's possible that the Postel principle ("be conservative in what you send, be liberal in what you accept") is at work here; it depends on exactly how the driver was designed.
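On the transmit side, PC2 does obey its 2000-byte MTU, which is why its reply to the 4210-byte request comes back fragmented. The sketch below applies the generic IPv4 fragmentation rules from RFC 791 (fragment data in multiples of 8 bytes, offsets counted in 8-byte units). `fragment` is a hypothetical helper, and whether the Linux stack emits exactly these sizes is an assumption you could confirm with tcpdump on PC1.

```python
IPV4_HEADER = 20


def fragment(ip_payload_len: int, mtu: int, header_len: int = IPV4_HEADER):
    """Split an IPv4 payload into fragments following the RFC 791 rules.

    Returns (offset_in_8_byte_units, data_len, more_fragments, total_len)
    tuples. Every fragment except the last carries a multiple of 8 bytes.
    """
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(max_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        fragments.append((offset // 8, data_len, more, data_len + header_len))
        offset += data_len
    return fragments


# Echo reply: 8-byte ICMP header + 4182 bytes of data, sent over MTU 2000.
for frag in fragment(8 + 4182, mtu=2000):
    print(frag)
```

With these assumptions the reply is split into two 1996-byte fragments and one 258-byte fragment, all within PC2's 2000-byte MTU.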

"However, ping -M do -s 4183 -c 1 10.11.12.2 does not receive a reply."

It sounds like this single extra byte puts it over its MRU. Since you have not configured any MRU explicitly, this size might be a hardware limitation or a result of how the network interface hardware's internal buffers are configured when the MTU is set to 2000.
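The boundary found by hand (4182 works, 4183 does not) can also be located automatically with a binary search over ping payload sizes. This sketch assumes iputils `ping` (for the `-M do`, `-s`, `-c` and `-W` options), that ping exits with status 0 when a reply arrives, and that success is monotonic in size. `ping_probe` and `largest_payload` are hypothetical helpers. The upper bound of 8972 is PC1's 9000-byte MTU minus 28 bytes of IPv4 and ICMP headers.

```python
import subprocess
from typing import Callable


def ping_probe(host: str, payload: int) -> bool:
    """Send one DF-marked ping (iputils syntax); True if a reply came back."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe: Callable[[int], bool], lo: int, hi: int) -> int:
    """Binary-search the largest payload in [lo, hi] for which probe succeeds.

    Assumes probe(lo) is True and that once a size fails, every larger
    size fails too.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Offline check with a fake probe that mimics PC2 (replies up to 4182).
    print(largest_payload(lambda n: n <= 4182, 0, 8972))  # 4182
    # Real use (requires iputils ping and a reachable host):
    # print(largest_payload(lambda n: ping_probe("10.11.12.2", n), 0, 8972))
```

Repeating the search after changing PC2's MTU would show whether this receive limit moves with the configured MTU, which would support the buffer-configuration explanation.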

Configure your MTUs consistently across a single broadcast domain and you won't run into this issue. Some routing protocols, such as IS-IS, intentionally pad their Hello messages up to the MTU to make sure that every other speaker in the broadcast domain can actually receive them. In the case of a misconfiguration, this prevents the adjacency from coming up at all, which lets you discover the problem more readily.
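The padding trick can be illustrated with a toy model: the sender pads its hello to its own MTU, and a neighbor with a smaller receive limit simply never sees it. This is not the real IS-IS PDU format (real implementations pad with padding TLVs rather than raw zero bytes); `build_padded_hello` and `neighbor_accepts` are hypothetical names.

```python
def build_padded_hello(header: bytes, mtu: int) -> bytes:
    """Pad a hello to exactly `mtu` bytes (toy stand-in for padding TLVs)."""
    if len(header) > mtu:
        raise ValueError("hello header does not fit in the MTU")
    return header + b"\x00" * (mtu - len(header))


def neighbor_accepts(hello: bytes, neighbor_mru: int) -> bool:
    """A neighbor only sees the hello if it fits within its receive limit."""
    return len(hello) <= neighbor_mru


hello = build_padded_hello(b"HELLO", mtu=9000)     # sent from a 9000-byte MTU port
print(neighbor_accepts(hello, neighbor_mru=9000))  # True:  adjacency can form
print(neighbor_accepts(hello, neighbor_mru=4210))  # False: mismatch is exposed
```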

Thanks! For some reason I thought that the MTU value is always the same as the MRU value. By the way, it looks like Cisco ASR routers, for example, are able to show the MRU value in the output of sh controllers. — Martin, Feb 05 '17 at 20:18
Yeah. Also router vendors may be a little more disciplined about forcing the two to be always equal. Depends on the software & hardware I guess. Contrast this with protocols like PPP where, if you read the RFC, MTU is absolutely never mentioned, only MRU, and both sides can negotiate different MRUs, leading to an asymmetric MTU/MRU. The negotiation protocol is essentially "Here is the largest packet size I am willing to accept from you. Agree or disagree?". In practice most implementations probably force the MRU to be the same in both directions, but I've never checked! — Celada, Feb 05 '17 at 21:12
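The asymmetric PPP case can be sketched in a few lines: each side advertises the largest packet it is willing to accept, and the other side uses that value as its send limit. This is a simplified illustration of the LCP Maximum-Receive-Unit option from RFC 1661, not an implementation of LCP; `send_limits` is a hypothetical helper, and 1500 is the RFC's default MRU.

```python
DEFAULT_MRU = 1500  # RFC 1661 default when no MRU option is negotiated


def send_limits(mru_a: int = DEFAULT_MRU, mru_b: int = DEFAULT_MRU) -> dict:
    """Each side may send up to the MRU that the *other* side advertised."""
    return {"A->B": mru_b, "B->A": mru_a}


print(send_limits(mru_a=1500, mru_b=1400))  # {'A->B': 1400, 'B->A': 1500}
```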

