Martin KaFai Lau

Facebook has a huge IPv6 deployment, so it is no surprise that we've pushed the Linux IPv6 stack to the limit. One of our pain points in deploying IPv6 is the size of the routing tree. We've solved this scalability issue by creating routing caches on demand.
In Linux, the size of the IPv6 routing tree grows with the number of peers that a machine is talking to. For example, if a machine has a '/64' gateway route and it is talking to a million peers, a million '/128' routing caches will be created and inserted into the tree. (It's worth noting that whenever a packet is received or sent out, the kernel has to look up the routing tree to decide the next hop. There are exceptions, but those don't affect our discussion here.)
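To make that growth concrete, here is a minimal sketch (not our production workload) of a program that sends one UDP packet to each of many peers. Assuming the machine has a covering route such as the '/64' gateway route above, on a kernel without the on-demand change each new destination leaves a '/128' clone behind in the routing tree. The prefix, port, and peer count below are made up for illustration.

```c
/*
 * Sketch: fan one UDP packet out to many distinct IPv6 peers.
 * The 2001:db8::/64 prefix, port 7000, and 100K peer count are
 * placeholders chosen only to illustrate the tree-growth problem.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in6 peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin6_family = AF_INET6;
    peer.sin6_port = htons(7000);          /* arbitrary port */

    char payload[64] = {0};
    char addr[INET6_ADDRSTRLEN];

    for (unsigned int i = 1; i <= 100000; i++) {
        /* Synthesize a distinct peer under a hypothetical 2001:db8::/64. */
        snprintf(addr, sizeof(addr), "2001:db8::%x:%x",
                 (i >> 16) & 0xffff, i & 0xffff);
        if (inet_pton(AF_INET6, addr, &peer.sin6_addr) != 1)
            continue;
        /* Each sendto() triggers an IPv6 routing-tree lookup for this peer. */
        sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&peer, sizeof(peer));
    }

    close(fd);
    return 0;
}
```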
A big routing tree has the following problems:
To solve these problems, we have implemented on-demand routing cache creation and contributed it to the upstream kernel. The patch series is here.
Before going into the details, here are some numbers:
The following graph shows the number of routing entries in the tree, on a log scale. The bottom orange line is the patched kernel, which has far fewer entries. As a result, the GC cleans up about 500 routing caches per run instead of 200K, an improvement of multiple orders of magnitude.

The benchmark test we used during development is called udpflood; its results are shown in the chart below. The test continuously sends out UDP packets, each of which requires a routing-tree lookup, using a dummy device as the outgoing interface. After removing the routing cache and adding a per-CPU entry optimization, we see no performance loss, and there is a ~6 percent gain in the 40-process test.
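The benchmark source itself isn't included in this post, but a udpflood-style send loop could look roughly like the sketch below. It simply blasts packets at one destination and reports packets per second; the destination address and packet count are placeholders, and the routing of that prefix out a dummy device (as in our setup) is not shown here.

```c
/*
 * Sketch of a udpflood-style loop (not the actual benchmark tool):
 * send a fixed number of UDP packets to one destination and report
 * the packet rate.  Every sendto() exercises an IPv6 routing lookup.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const unsigned long count = 1000000UL;   /* packets to send (placeholder) */
    char payload[64] = {0};

    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(9);                /* discard port */
    inet_pton(AF_INET6, "2001:db8::1", &dst.sin6_addr);   /* placeholder peer */

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < count; i++)
        sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&dst, sizeof(dst));
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%lu packets in %.2fs (%.0f pkts/s)\n", count, secs, count / secs);

    close(fd);
    return 0;
}
```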

Why is a '/128' routing cache needed?
The routing cache per peer is there to prepare for a potential PMTU (Path MTU) exception. In IPv6, that exception is signaled by an ICMPv6 "packet too big" message. When an ICMPv6 too-big message comes in, the kernel has to update the PMTU value for that particular '/128' routing cache.
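That per-destination PMTU is visible from userspace. As a small aside, the sketch below reads the path MTU the kernel currently holds for one peer, using the IPV6_MTU socket option on a connected socket as documented in ipv6(7); the address and port are placeholders.

```c
/*
 * Sketch: query the kernel's known path MTU toward one peer.
 * This is the per-destination value that an ICMPv6 "packet too big"
 * message would lower.  The peer address below is a placeholder.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in6 peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin6_family = AF_INET6;
    peer.sin6_port = htons(443);
    inet_pton(AF_INET6, "2001:db8::1", &peer.sin6_addr);

    /* Connecting a UDP socket binds it to a route for this destination. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &mtu, &len) == 0)
        printf("path MTU toward 2001:db8::1: %d\n", mtu);
    else
        perror("getsockopt(IPV6_MTU)");

    close(fd);
    return 0;
}
```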
On one of our edge machines running Proxygen, the tree holds 300K IPv6 routing caches, but only 1K of them have an MTU different from the default. Hence, almost all of them are created for nothing.
The fix seems obvious: Create a routing cache on demand. For peers with the default PMTU, share the gateway route. There are some interesting challenges, however. To name a few:
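Setting those challenges aside for a moment, the core idea can be illustrated with a toy userspace model (this is not the kernel implementation, and every name in it is invented): all peers share one route carrying the default MTU, and a small per-destination exception entry is created only when an ICMPv6 too-big message reports a smaller PMTU.

```c
/*
 * Toy userspace model of "routing cache on demand" (NOT kernel code):
 * the shared route carries DEFAULT_MTU, and a per-destination exception
 * is created only when a too-big message lowers the PMTU for that peer.
 */
#include <stdio.h>
#include <string.h>

#define DEFAULT_MTU    1500
#define MAX_EXCEPTIONS 16          /* a real table would be a hash or tree */

struct pmtu_exception {
    char dst[64];                  /* textual peer address, for simplicity */
    int  pmtu;
};

static struct pmtu_exception exceptions[MAX_EXCEPTIONS];
static int nr_exceptions;

/* Called when an ICMPv6 too-big message arrives for `dst`. */
static void record_too_big(const char *dst, int reported_mtu)
{
    for (int i = 0; i < nr_exceptions; i++) {
        if (strcmp(exceptions[i].dst, dst) == 0) {
            exceptions[i].pmtu = reported_mtu;   /* update existing entry */
            return;
        }
    }
    if (nr_exceptions < MAX_EXCEPTIONS) {        /* create one on demand */
        snprintf(exceptions[nr_exceptions].dst,
                 sizeof(exceptions[nr_exceptions].dst), "%s", dst);
        exceptions[nr_exceptions].pmtu = reported_mtu;
        nr_exceptions++;
    }
}

/* Lookup: most peers fall through to the shared default. */
static int lookup_pmtu(const char *dst)
{
    for (int i = 0; i < nr_exceptions; i++)
        if (strcmp(exceptions[i].dst, dst) == 0)
            return exceptions[i].pmtu;
    return DEFAULT_MTU;            /* shared gateway route, no per-peer state */
}

int main(void)
{
    printf("before: %d\n", lookup_pmtu("2001:db8::1"));   /* 1500 */
    record_too_big("2001:db8::1", 1400);                  /* smaller path MTU */
    printf("after:  %d\n", lookup_pmtu("2001:db8::1"));   /* 1400 */
    printf("other:  %d\n", lookup_pmtu("2001:db8::2"));   /* still 1500 */
    return 0;
}
```

A real implementation of course has to keep lookups fast and handle concurrent updates and cleanup inside the kernel, which is where the challenges mentioned above come in.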
Other fixes:
Thanks to Hannes Frederic Sowa, David Miller, Steffen Klassert, and Julian Anastasov for the roles they played in shipping these solutions.