Dijkstra’s algorithm won’t be replaced in production routers any time soon
Systems Approach
Last year a couple of people forwarded me the same article on a new method of finding shortest paths in networks.
The underlying research claims to improve on the classic approach pioneered by Dijkstra that is taught in most networking textbooks (including ours). I was initially a bit skeptical, much as I would be if I read that the Riemann Hypothesis had been proved.
Dijkstra is a legend in computer science and his algorithm, which he published in 1959, predates packet switching by a few years. It sits at the heart of OSPF (Open Shortest Path First), one of the two dominant link-state routing protocols (Intermediate System-to-Intermediate System, aka IS-IS, is the other).
Guidance to implementors of OSPF is unusually detailed and basically tells them to use Dijkstra’s algorithm. And that is what most implementations have done for decades, with a few minor improvements through the years to speed up the implementation but no real change to the fundamentals.
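For readers who haven’t looked at it since an algorithms course, here is a minimal sketch of the idea in Python (the four-router topology is invented for illustration, and this is nothing like the optimized code inside a real router): keep a priority queue of tentative distances, repeatedly settle the closest unsettled vertex, and relax its outgoing edges.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph: dict mapping each vertex to a list of (neighbor, cost) pairs.
    Returns a dict of vertex -> distance from source.
    """
    dist = {source: 0}
    pq = [(0, source)]            # priority queue ordered by tentative distance
    while pq:
        d, u = heapq.heappop(pq)  # closest not-yet-settled vertex
        if d > dist.get(u, float("inf")):
            continue              # stale entry; u was already settled via a shorter path
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd      # relax the edge u -> v
                heapq.heappush(pq, (nd, v))
    return dist

# A toy topology: four routers, costs roughly like interface metrics.
topology = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 6)],
    "C": [("A", 4), ("B", 2), ("D", 3)],
    "D": [("B", 6), ("C", 3)],
}
print(dijkstra(topology, "A"))    # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```

That priority queue, which keeps vertices ordered by their distance from the source, is where sorting enters the picture.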
The new algorithm is no minor tweak to Dijkstra’s but a significantly different approach. Its headline claim is that, whereas Dijkstra requires a sorting operation, and thus is only able to perform as well as the best sorting algorithm, this new approach “breaks the sorting barrier”. That is, it avoids the need for sorting and is able to deliver better bounds on performance than Dijkstra.
While I don’t consider myself qualified to evaluate the paper that describes the new algorithm, it has passed peer review at a top-tier conference (the ACM Symposium on Theory of Computing) and received enough scrutiny that I don’t doubt that the theory works. The question I want to discuss here is: does it matter?
Two main issues came immediately to mind when I tried to assess the implications of a theoretical improvement in algorithmic performance. The first is: what are the actual scaling limits we need to address in a real routing system? For example, the running time of Dijkstra’s algorithm is on the order of (n log n + m) for a network of n vertices (routers) and m edges (links). The new approach claims order m (log n)^(2/3), which is clearly going to be less for large enough n. (I had to take a refresher course on log notation before I could even write that sentence with any confidence.) The problem with assessing the scaling properties of anything is that you have to ask how big n must get before it makes a difference. There are constant factors that might differ between the two approaches; at small n, a “less scalable” approach might actually display better performance.
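As a purely illustrative back-of-the-envelope exercise (the constants c1 and c2 below are invented, as is the assumption of four links per router), here is a sketch that simply evaluates the two growth functions over a range of network sizes:

```python
import math

def dijkstra_ops(n, m, c1=1.0):
    # classic bound: c1 * (n log n + m)
    return c1 * (n * math.log2(n) + m)

def new_algo_ops(n, m, c2=1.0):
    # claimed bound: c2 * m * (log n)^(2/3)
    return c2 * m * math.log2(n) ** (2.0 / 3.0)

# Assume an average of four links per router (an arbitrary choice).
for n in (100, 1_000, 10_000, 1_000_000):
    m = 4 * n
    print(f"n={n:>9,}  Dijkstra~{dijkstra_ops(n, m):>12,.0f}  new~{new_algo_ops(n, m):>12,.0f}")
```

With these made-up constants and a sparse topology, the asymptotically better bound doesn’t overtake the classic one until n is far larger than any routing domain; different constants or a different link density would move the break-even point, which is exactly why the asymptotic claim alone doesn’t settle the practical question.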
What scaling limit?
One of my first jobs was part of the team building a scalable packet switch based on arrays of small switching elements. (This is where I got to build the accidental SmartNIC). We had papers to show that we could build switches with thousands of ports at 155 Mbps, in an era when shared Ethernet ran at 10 Mbps and had not yet been overtaken by switched Ethernet.
While we at Bellcore were investing lots of time and money to put together a set of 32-port prototype switches, FORE Systems delivered commercially successful 16-port switches. Almost certainly they were not as scalable as our design, but it turned out that n=16 was a pretty useful capacity for switches with 155 Mbps links in the 1990s. We felt quite sad that our research seemed to have been overtaken by commercial products. My takeaway was that scalability was a good thing to strive for but that sometimes you might achieve a good result with a less scalable solution. One of my favorite textbooks, “Principles of Computer Systems Design”, uses the example of stopping a supertanker to demonstrate the scaling limits of a system. The fact that supertankers have a scaling limit doesn’t prevent them from being the backbone of oil shipping. You just need to avoid making them too big.
What’s a large value for n in SPF calculations? I checked in with a couple of colleagues to update my recollection of how many routers you might find in a big OSPF or IS-IS backbone. In my memory it was in the hundreds; for the largest service provider networks today it seems to be in the low thousands. So that’s not tiny, but it’s small compared to, say, the number of prefixes carried in BGP.
And it doesn’t seem to be limited by the performance of the SPF calculation, as I will explain below.
The many facets of performance
Another memory I have from my time working for Big Router was an analysis of all the things that go into the performance of routing protocols. I was working on MPLS in its early days, and we were pretty excited about a technology called “fast reroute” (FRR) which uses MPLS to divert packets around a failed link without waiting for routing to converge after the failure. But at the same time, the people responsible for routing protocol development were hard at work improving routing convergence times. One of the most important things, it turns out, for both MPLS and standard routing, was simply detecting failure faster. For example, if you have to wait for tens of seconds of missing OSPF Hello packets before you declare a link down, it doesn’t really matter if you can calculate the shortest path in a fraction of a second. This was the reasoning behind the creation of BFD (Bidirectional Forwarding Detection): a fast mechanism, independent of routing, by which link failures could be detected for any type of link.
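To put rough numbers on that (these are common OSPF defaults and typical BFD timers, used here purely for illustration), compare how long each mechanism takes to notice a dead link:

```python
# Failure detection, back of the envelope (illustrative values only).

# Common OSPF defaults: HelloInterval 10 s, RouterDeadInterval 40 s,
# so in the worst case a neighbor is declared down ~40 s after it failed.
hello_based_detection_s = 40.0

# A typical aggressive BFD session: 50 ms tx/rx interval, detect multiplier 3.
bfd_detection_s = 0.050 * 3

print(f"Hello-based detection: {hello_based_detection_s:6.3f} s")
print(f"BFD detection:         {bfd_detection_s:6.3f} s")
```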
Beyond fast failure detection, other factors playing into routing convergence include: the time to send a new link state packet out a link and propagation delay across the network; time to receive such a packet and dispatch it to the right process in the operating system; SPF time; time to update the routing information base; time to calculate the impact on the forwarding table; time to push any forwarding table updates out to the line cards (on big routers that have such things); time to flood the link state packet out to neighbors. All of these steps have been analyzed and optimized over the years, to the point where sub-second routing convergence is now routine. As early as 2003 the improvements to all the steps above had enabled sub-second convergence, as this NANOG presentation shows. Yes, you couldn’t afford to spend 10 seconds doing an SPF calculation if you wanted fast convergence, but that was already a solved issue by then. Optimizing all the other parts was at least as important as improving the SPF calculation time.
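A toy convergence budget makes the point (every number below is invented for illustration; real figures vary widely by platform, network size, and configuration):

```python
# A made-up convergence budget for a single link failure, in milliseconds.
budget_ms = {
    "failure detection (BFD)": 150,
    "originate and flood the LSA/LSP": 60,
    "propagation across the network": 40,
    "receive and process the update": 20,
    "SPF calculation": 10,
    "update the RIB and compute FIB changes": 100,
    "push FIB updates to the line cards": 150,
}

total = sum(budget_ms.values())
spf_share = budget_ms["SPF calculation"] / total
print(f"total convergence: {total} ms (SPF share: {spf_share:.1%})")

# Even a 10x faster SPF barely moves the total.
budget_ms["SPF calculation"] = 1
print(f"with a 10x faster SPF: {sum(budget_ms.values())} ms")
```

Under those assumptions the SPF calculation is about two per cent of the total, which is why the other steps got so much of the optimization effort.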
Finally, I chatted with another colleague about this topic and he reminded me of a reason that Dijkstra’s algorithm remains the implementation method of choice: it can be understood by the people who have to write code. Dijkstra himself puts it well in a 2001 interview:
The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities.
In other words, Keep It Simple, Stupid. I would certainly rather point an engineer at the OSPF spec than send them off to understand the hybrid Bellman-Ford/Dijkstra approach that might shave a few milliseconds off the non-bottleneck part of routing convergence. Maybe one day someone will write an explanation of the novel SPF approach that is as clear as Dijkstra’s paper and the OSPF spec. And the hybrid algorithm might be great for large mapping applications. But I don’t see Dijkstra’s algorithm being replaced in production routers any time soon. ®
Larry Peterson and Bruce Davie are the authors behind Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available for free on GitHub. You can find them on Mastodon, their newsletter right here, and past The Register columns here.