PPPoE and MTU
Here is a diagram showing header and payload sizes for various types of packets. This is helpful when you need to calculate or verify MTU and MSS settings.
The ping payload sizes are what you use as the -s argument to ping.
Note that this size is not the same as the MTU or the MSS.
And here is a chart with some common values.
Link | PPPoE MTU | Ethernet MTU | ping -s | TCP MSS |
---|---|---|---|---|
normal ethernet LAN IPv4 | n/a | 1500 | 1472 | 1460 |
normal ethernet LAN IPv6 | n/a | 1500 | 1452 | 1440 |
PPPoE without mini jumbos IPv4 | 1492 | 1500 | 1464 | 1452 |
PPPoE without mini jumbos IPv6 | 1492 | 1500 | 1444 | 1432 |
PPPoE with mini jumbos IPv4 | 1500 | 1508 | 1472 | 1460 |
PPPoE with mini jumbos IPv6 | 1500 | 1508 | 1452 | 1440 |
“mini jumbos” refers to support for slightly larger frame sizes described in RFC 4638. This isn’t universally supported, but my ISP (Zen Internet) supports it on its FTTC service.
You can see in the chart that when frame sizes are bumped to a point where the PPPoE payload is the same as a normal ethernet frame, all the other sizes match. Your PPPoE interface then looks like a normal ethernet interface, and you don’t need to make any MTU or MSS adjustments.
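The arithmetic behind the chart can be sketched as a few shell functions. This is only an illustration: the function names are made up, and it assumes fixed header sizes with no IP options or IPv6 extension headers (20-byte IPv4 header, 40-byte IPv6 header, 8-byte ICMP echo header, 20-byte TCP header).

```sh
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from any tool.
# Assumes no IP options, no IPv6 extension headers, no TCP options.

# ip_header_size FAMILY -> 20 for IPv4 (4), 40 for IPv6 (6)
ip_header_size() {
    case "$1" in
        4) echo 20 ;;
        6) echo 40 ;;
        *) echo "unknown IP family: $1" >&2; return 1 ;;
    esac
}

# ping_payload MTU FAMILY -> largest value to pass to ping -s
ping_payload() {
    hdr=$(ip_header_size "$2") || return 1
    echo $(( $1 - $hdr - 8 ))     # 8 = ICMP / ICMPv6 echo header
}

# tcp_mss MTU FAMILY -> largest TCP segment payload
tcp_mss() {
    hdr=$(ip_header_size "$2") || return 1
    echo $(( $1 - $hdr - 20 ))    # 20 = TCP header
}
```

For example, `ping_payload 1492 4` prints 1464 and `tcp_mss 1492 6` prints 1432, matching the "PPPoE without mini jumbos" rows of the chart.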
Verifying Link MTU with IPv4
Use ping to verify that your link MTU matches what you calculated.
First ping a host on the other side of your link using the largest ping payload size for your MTU. Here is an example over a link without mini jumbos:
# ping -c1 -D -s 1464 www.freebsd.org
PING wfe0.nyi.freebsd.org (96.47.72.84): 1464 data bytes
1472 bytes from 96.47.72.84: icmp_seq=0 ttl=53 time=74.408 ms
--- wfe0.nyi.freebsd.org ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 74.408/74.408/74.408/0.000 ms
Then, to verify, ping with a payload one byte larger. Here is what that looks like from the host with the PPPoE link:
# ping -c1 -D -s 1465 www.freebsd.org
PING wfe0.nyi.freebsd.org (96.47.72.84): 1465 data bytes
ping: sendto: Message too long
^C
--- wfe0.nyi.freebsd.org ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
If you ping from another host, you get a different response, but the meaning is the same:
$ ping -c1 -D -s 1465 www.freebsd.org
PING wfe0.nyi.freebsd.org (96.47.72.84): 1465 data bytes
36 bytes from my-router (203.0.113.0): frag needed and DF set (MTU 1492)
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 d505 5f12 0 0000 3f 01 7694 192.168.1.1 96.47.72.84
^C
--- wfe0.nyi.freebsd.org ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
In both cases, the ping fails because we have told ping not to allow fragmentation (-D).
We instead get back a message indicating that the larger packet would have been fragmented.
So that has verified we have a link MTU of 1492: 1464 ICMP payload + 8 ICMP header + 20 IPv4 header = 1492.
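If you do not want to guess sizes by hand, the same test can be automated with a binary search. What follows is a rough sketch rather than a finished tool: find_max_payload, ping_probe and TARGET are names made up here, the probe reuses the ping -c1 -D -s invocation shown above, and the search assumes the lower bound you give it is a size that gets through. A reply lost for any other reason is treated as "too big", and a failed probe may take a while to time out.

```sh
#!/bin/sh
# Hypothetical names throughout; TARGET should be a host beyond the link.
TARGET=${TARGET:-www.freebsd.org}

# ping_probe SIZE: succeeds if one ping of SIZE bytes, sent with
# fragmentation disabled (-D), gets a reply.
ping_probe() {
    ping -c1 -D -s "$1" "$TARGET" >/dev/null 2>&1
}

# find_max_payload PROBE LO HI
# Binary search for the largest size in LO..HI for which PROBE succeeds.
# Assumes PROBE succeeds for LO and for every size below the answer.
find_max_payload() {
    probe=$1 lo=$2 hi=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always shrinks
        if "$probe" "$mid"; then
            lo=$mid                    # mid fits, look higher
        else
            hi=$(( mid - 1 ))          # mid is too big, look lower
        fi
    done
    echo "$lo"
}
```

Running `find_max_payload ping_probe 1200 1500` would then print the largest payload that got through; add 28 (8 ICMP header + 20 IPv4 header) to get the link MTU.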
Verifying Link MTU with IPv6
This is a little harder on BSD-based hosts because there isn’t a direct equivalent to -D to inhibit fragmentation, and large pings always appear to work.
I found it easiest to run tcpdump on the link in question, and just look to see if there are any fragments generated.
So on one terminal do this on the host with the link:
# tcpdump -i pppoe0 ip6
tcpdump: listening on pppoe0, link-type PPP_ETHER
Then ping with the maximum calculated ping6 payload size. Here is an example on a link which supports larger frames:
# ping6 -c1 -m -s 1452 www.freebsd.org
PING6(1500=40+8+1452 bytes) 2001:db8::1 --> 2610:1c1:1:606c::50:15
1460 bytes from 2610:1c1:1:606c::50:15, icmp_seq=0 hlim=48 time=77.775 ms
--- wfe0.nyi.freebsd.org ping6 statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 77.775/77.775/77.775/0.000 ms
tcpdump output looks something like this:
07:08:19.294105 2001:db8::1 > wfe0.nyi.freebsd.org: icmp6: echo request [flowlabel 0x137e0]
07:08:19.370456 wfe0.nyi.freebsd.org > 2001:db8::1: icmp6: echo reply
There is no fragmentation.
Now try with a payload one byte bigger:
# ping6 -c1 -m -s 1453 www.freebsd.org
PING6(1501=40+8+1453 bytes) 2001:db8::1 --> 2610:1c1:1:606c::50:15
1461 bytes from 2610:1c1:1:606c::50:15, icmp_seq=0 hlim=46 time=77.966 ms
--- wfe0.nyi.freebsd.org ping6 statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 77.966/77.966/77.966/0.000 ms
It appears to succeed, but tcpdump shows the hidden fragmentation:
07:08:50.244840 2001:db8::1 > wfe0.nyi.freebsd.org: frag (0x920ff335:1448@0+) icmp6: echo request [flowlabel 0x71688]
07:08:50.244848 2001:db8::1 > wfe0.nyi.freebsd.org: frag (0x920ff335:13@1448) [flowlabel 0x71688]
07:08:50.321394 wfe0.nyi.freebsd.org > 2001:db8::1: frag (0x97f09506:1448@0+) icmp6: echo reply
07:08:50.321433 wfe0.nyi.freebsd.org > 2001:db8::1: frag (0x97f09506:13@1448)
So we have verified that the MTU over this link is 1500: 1452 ICMPv6 payload + 8 ICMPv6 header + 40 IPv6 header = 1500.
Incidentally, the -m is important.
Without -m, ping6 tells the kernel to pretend the MTU is 1280 (the minimum IPv6 MTU).
So we get “unnecessary” fragmentation using larger payload sizes.
We need to disable that behaviour for the testing we are doing here.
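The fragment sizes in the tcpdump output above can be reproduced with a little arithmetic. Each IPv6 fragment carries the 40-byte IPv6 header plus an 8-byte fragment header, and every fragment except the last carries a multiple of 8 bytes of data. The sketch below uses a made-up function name, assumes no other extension headers, and always cuts at the largest aligned size, which may not match every IPv6 stack.

```sh
#!/bin/sh
# fragment_sizes MTU PING_PAYLOAD  (hypothetical helper)
# Prints the data length of each fragment, one per line, for an ICMPv6
# echo with PING_PAYLOAD bytes of data sent over a link with the given MTU.
fragment_sizes() {
    mtu=$1
    total=$(( $2 + 8 ))                    # ICMPv6 header + ping payload
    if [ $(( total + 40 )) -le "$mtu" ]; then
        echo "$total"                      # fits in one packet, no fragments
        return 0
    fi
    # Room left after IPv6 header (40) and fragment header (8),
    # rounded down to a multiple of 8.
    per=$(( ($mtu - 40 - 8) / 8 * 8 ))
    remaining=$total
    while [ "$remaining" -gt "$per" ]; do
        echo "$per"
        remaining=$(( remaining - per ))
    done
    echo "$remaining"                      # final fragment
}
```

`fragment_sizes 1500 1453` prints 1448 and 13, the same lengths tcpdump reported, while `fragment_sizes 1500 1452` prints a single 1460. With an MTU of 1280 in place of 1500, a 1452-byte ping comes out as 1232 and 228, which is the kind of "unnecessary" fragmentation you would expect to see without -m.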
The TCP MSS
The MSS (Maximum Segment Size) is the maximum payload size for TCP segments.
It is sometimes necessary to know about this when Path MTU discovery breaks.
For example, there is a well-known test site called “AOL” which provides completely broken Path MTU discovery as a service to the internet community.
If you go to aol.com and your browser hangs, you instantly know you should look closer at your MTU and MSS.
If you are using the pf packet filter, you can set the MSS when you scrub packets:
match on pppoe0 scrub (max-mss 1440)
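The 1440 in the rule above is consistent with the IPv6 MSS for a 1500-byte MTU in the chart. As a rough sketch of how the value might be derived for other links, the hypothetical helper below prints a rule in the same form. It assumes a single clamp is applied to both IPv4 and IPv6 traffic, so it uses the smaller IPv6 figure; IPv4 connections then get an MSS 20 bytes lower than they strictly need.

```sh
#!/bin/sh
# pf_mss_rule IFNAME MTU  (hypothetical helper)
# Prints a pf scrub rule clamping the MSS to what fits in an IPv6
# packet of the given MTU, assuming no extension headers or TCP options.
pf_mss_rule() {
    mss=$(( $2 - 40 - 20 ))    # 40 = IPv6 header, 20 = TCP header
    printf 'match on %s scrub (max-mss %d)\n' "$1" "$mss"
}
```

`pf_mss_rule pppoe0 1500` reproduces the rule above, and `pf_mss_rule pppoe0 1492` gives max-mss 1432 for a link without mini jumbos.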