Poor UI Breaks Portions of the Internet
By: Nick Chapman
On February 16th, 2009, there was a Border Gateway Protocol (BGP) anomaly which caused connectivity issues for some portions of the Internet. Arbor Networks and Renesys both provided good write-ups of the event. The basic problem was that SuproNET, a local Czech ISP, announced a BGP route with an extremely long Autonomous System (AS) path. The AS path is a BGP attribute that acts like a trail of bread crumbs showing how a route announcement got from its originator to you. As the route announcement passes through each network, that network adds its own AS number to the AS path before propagating the route, so the path records every network between you and the originator. We generally want packets to take the shortest path, so the length of the AS path is a fairly important metric used for route selection. As the Internet is highly interconnected, most AS paths are short, usually under a dozen hops.
When network engineers optimize routing, as they are wont to do, they often want a certain link to be preferred over others. A common way to do this is known as AS prepending. This simply means adding your own AS number multiple times to paths you want used less frequently. Other networks will see a longer path and be less likely to use that route.
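The effect of prepending on route selection can be sketched in a few lines of Python. This is only a toy model: the function names are made up for illustration, the AS numbers come from the range reserved for documentation, and real BGP best-path selection weighs other attributes (such as local preference) before it ever compares AS path lengths.

```python
from typing import List


def propagate(as_path: List[int], local_as: int, prepend_count: int = 1) -> List[int]:
    """Return the AS path as advertised by local_as.

    The AS adds its own number to the front of the path. A prepend_count
    above 1 models AS prepending: the same number is repeated to make the
    path look longer.
    """
    if prepend_count < 1:
        raise ValueError("an AS must add itself at least once")
    return [local_as] * prepend_count + as_path


def prefer_shortest(candidates: List[List[int]]) -> List[int]:
    """Pick the candidate with the fewest AS path entries (toy tie-breaker)."""
    return min(candidates, key=len)


# Hypothetical topology: AS 64500 originates a route and has two upstreams.
# It announces normally toward AS 64501, but prepends itself three times
# toward AS 64502 so that link is used less often.
via_a = propagate(propagate([], 64500), 64501)
via_b = propagate(propagate([], 64500, prepend_count=3), 64502)

best = prefer_shortest([via_a, via_b])
```

A remote network that hears both announcements sees a two-entry path through AS 64501 and a four-entry path through AS 64502, so in this simplified model it picks the first.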
Unfortunately, sometimes things go wrong. What happened a few days ago was that a route was announced with the originating AS prepended over a hundred times. Some routers were not able to handle this unusual condition and were unable to process routing updates or even crashed. There was even a new bug discovered in Cisco IOS because of this problem. The end result was that some networks were not able to route to some subset (potentially all) of the other networks comprising the Internet.
On February 20th someone from SuproNET posted to the North American Network Operators' Group (NANOG) mailing list, explaining what had happened. While attempting to modify the number of prepends to a route, an engineer entered their AS number thinking that it would be treated as a string and added to the current AS path. Instead, it was treated as an integer, controlling how many times the originating AS number was prepended. This is because the MikroTik command has a syntax very similar to that of the same command on a Cisco router, but differs in how that single argument is treated. The router truncated the number (ignoring the high bits) to trim it down to an only slightly unreasonable size (176 instead of 20912) and then propagated the Internet-breaking route.
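The two readings of that single argument, and the truncation, can be illustrated with a short Python sketch. It is a simplified model, not vendor code: the function names are hypothetical, and the 8-bit field width is an assumption inferred from the numbers above (20912 with everything above the low 8 bits dropped is 176), not taken from MikroTik documentation.

```python
from typing import List


def prepend_as_literal(as_path: List[int], value: int) -> List[int]:
    """What the engineer expected: the value is an AS number, added once."""
    return [value] + as_path


def prepend_as_count(as_path: List[int], origin_as: int, value: int,
                     field_bits: int = 8) -> List[int]:
    """What actually happened in this model: the value is a repeat count.

    The count is masked to field_bits bits (assumed width), which discards
    the high bits, and the originating AS is prepended that many times.
    """
    count = value & ((1 << field_bits) - 1)
    return [origin_as] * count + as_path


entered = 20912          # the number typed into the prepend command
current_path = [20912]   # the route as originated, before prepending

expected = prepend_as_literal(current_path, entered)
actual = prepend_as_count(current_path, origin_as=20912, value=entered)
```

Under these assumptions the expected path has two entries, while the path that was actually announced has 177: the original entry plus 176 prepends.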
BGPmon has a page up showing recent routes with long AS paths. The three largest AS paths are a result of the MikroTik bug. As with many other similar problems, there are multiple failures here. An incorrect configuration was entered, due to a poor user interface. Then there were several routers that did not sanitize inputs correctly, resulting in crashes and other problems when they received unusual inputs. Thankfully, the network operators of the world are a wily and resilient bunch, capable of quickly organizing and resolving problems of this sort. Without their quick reaction, this (and many other problems you've probably never heard of) would cause a lot more downtime for the Internet at large.
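One defensive measure on the receiving side is to check the AS path length of each incoming route and discard anything unreasonable instead of trying to process it. The Python sketch below shows the idea only. The function name, the limit of 50, and the example prefixes (taken from the documentation address ranges) are all assumptions for illustration, not a recommendation or any router's actual behavior.

```python
from typing import Dict, List, Tuple

# Hypothetical policy limit. Normal AS paths are usually under a dozen hops,
# so a generous cap still leaves room for ordinary prepending.
MAX_AS_PATH_LENGTH = 50


def filter_routes(
    routes: Dict[str, List[int]],
    max_len: int = MAX_AS_PATH_LENGTH,
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Split received routes into (accepted, rejected) by AS path length."""
    accepted: Dict[str, List[int]] = {}
    rejected: Dict[str, List[int]] = {}
    for prefix, as_path in routes.items():
        if len(as_path) <= max_len:
            accepted[prefix] = as_path
        else:
            rejected[prefix] = as_path
    return accepted, rejected


received = {
    "192.0.2.0/24": [64501, 64500],                 # ordinary short path
    "198.51.100.0/24": [64502] + [64500] * 176,     # heavily prepended path
}

accepted, rejected = filter_routes(received)
```

Here the ordinary route is accepted and the 177-entry path is set aside, so one bad announcement never reaches the code that would have to handle it.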