updated 08:42 pm EDT, Wed August 13, 2014
BGP tables hitting 512K limit, could mean more outages as capped machines surface
Since yesterday, the Internet has experienced bouts of turbulence as a widespread issue slowly works its way across its infrastructure. The problems aren't tied to any physical fiber failure or to the exhaustion of IPv4 addresses, but rather to the exhaustion of memory inside some hardware, such as routers and switches. Starting on Tuesday, Border Gateway Protocol (BGP) tables, which routers use to keep a map of how to reach networks across the Internet, began to hit their limit in older hardware, making chunks of the Internet inaccessible in the process.
The problem stems from specialized Ternary Content Addressable Memory (TCAM), and specifically from the TCAM inside older hardware, which is limited to only 512K entries. Routes learned through BGP are stored in this memory to help devices determine the best route for directing traffic. In the past, routers and other hardware were given room for only 512K (524,288, or two to the power of 19) entries, on the assumption that this would be enough to accommodate future growth.
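To make the 512K figure and the idea of picking the "best route" concrete, here is a minimal Python sketch of longest-prefix matching, the rule routers use to choose the most specific route covering a destination. The prefixes and next-hop names are hypothetical documentation addresses chosen for illustration, and a real TCAM performs this lookup in a single parallel hardware operation rather than a loop.

```python
import ipaddress

# "512K" in the article's sense: two to the power of 19 route entries.
TCAM_CAPACITY = 2 ** 19  # 524,288

def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix containing destination,
    or None if nothing matches. A TCAM does this in one parallel lookup;
    this loop only illustrates the result."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best[1] if best else None

# Hypothetical prefixes and next-hop labels, for illustration only.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-default",
    ipaddress.ip_network("198.51.100.0/24"): "peer-A",
    ipaddress.ip_network("198.51.100.128/25"): "peer-B",
}

print(TCAM_CAPACITY)                                  # 524288
print(longest_prefix_match(table, "198.51.100.200"))  # peer-B
print(longest_prefix_match(table, "198.51.100.5"))    # peer-A
print(longest_prefix_match(table, "203.0.113.9"))     # upstream-default
```

Every prefix in the table occupies one entry, which is why the total number of routes on the Internet, not the amount of traffic, is what runs into the 512K ceiling.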
However, much like the IPv4 address shortage, Internet growth has outpaced past predictions. As the Internet keeps expanding, the number of routes listed in the BGP table has grown so large that numerous pieces of infrastructure are hitting the 512K limit. Companies such as Cisco warned in May that the BGP problem was coming, and Cisco posted a list of switches and routers that would be affected. Workarounds were available, though in some cases they come at the expense of IPv6 support.
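The IPv6 trade-off arises because the TCAM is a fixed pool shared between address families, so giving IPv4 more room leaves less for IPv6. The sketch below assumes a hypothetical TCAM of 1M (1,048,576) entries in which each IPv6 route occupies two entries; both numbers are illustrative assumptions, not the published specification of any particular device.

```python
# Hypothetical TCAM: 1M entries shared between IPv4 and IPv6, with each
# IPv6 route assumed to occupy two entries. Illustrative assumptions only.
TOTAL_ENTRIES = 1024 * 1024
ENTRIES_PER_IPV6_ROUTE = 2

def ipv6_routes_left(ipv4_entries):
    """Number of IPv6 routes that still fit once ipv4_entries are reserved for IPv4."""
    if not 0 <= ipv4_entries <= TOTAL_ENTRIES:
        raise ValueError("IPv4 allocation must fit inside the TCAM")
    return (TOTAL_ENTRIES - ipv4_entries) // ENTRIES_PER_IPV6_ROUTE

for ipv4_k in (512, 768, 1000):
    ipv4_entries = ipv4_k * 1024
    print(f"IPv4 {ipv4_k}K -> IPv6 {ipv6_routes_left(ipv4_entries)} routes")
# IPv4 512K -> IPv6 262144 routes
# IPv4 768K -> IPv6 131072 routes
# IPv4 1000K -> IPv6 12288 routes
```

Under these assumptions, nearly doubling the IPv4 allocation cuts the IPv6 room from 262,144 routes to 12,288, which is the kind of cost the workaround can carry.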
As machines reach the cap, some are crashing while others simply ignore additional routes. This leaves routes unstable or missing, making some swaths of the Internet unreachable, including large sites like eBay and Facebook and providers like Comcast and Time Warner Cable. Even Level 3 Communications, a company responsible for a large share of data delivery in the United States, suffered outages, and password management service LastPass experienced a data center failure.
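The "ignoring additional routes" failure mode can be modeled with a forwarding table that has a hard entry limit. In this minimal sketch, the class name BoundedFib and the tiny capacity of 3 (standing in for 524,288) are hypothetical; once the table is full, new prefixes are dropped, and any destination covered only by a dropped prefix has no route at all. Crashing, the other failure mode, is not modeled.

```python
import ipaddress

class BoundedFib:
    """Hypothetical forwarding table with a hard entry limit. Once full, it
    silently ignores further routes instead of crashing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}
        self.ignored = 0

    def install(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        if net not in self.routes and len(self.routes) >= self.capacity:
            self.ignored += 1  # table full: the new route is dropped
            return False
        self.routes[net] = next_hop  # new route, or update of an existing one
        return True

    def lookup(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: destination unreachable from this device
        return self.routes[max(matches, key=lambda net: net.prefixlen)]

# A capacity of 3 stands in for 524,288 so the overflow is easy to see.
fib = BoundedFib(capacity=3)
for i in range(5):
    fib.install(f"10.0.{i}.0/24", f"peer-{i}")

print(fib.ignored)             # 2
print(fib.lookup("10.0.1.7"))  # peer-1
print(fib.lookup("10.0.4.7"))  # None
```

Which routes get dropped depends on the order in which they arrive, which helps explain why the outages looked scattered rather than hitting the same sites everywhere.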
For now, the memory issue isn't big enough to cause much concern, as machines are only now starting to hit these limits, according to Dyn Chief Scientist Jim Cowie. That does mean the problems could get worse, and soon, as more and more machines reach their cap. It isn't all doom and gloom: most modern routers and switches should have more capacity than their older counterparts, and most devices can be configured by system administrators to avoid the issue. The trouble will come from older hardware that hasn't been set up properly, or that sits at key junctures. Cowie indicates that there is no evidence of greater Internet instability at this time.
"This event won't be over tomorrow; in fact, it has barely begun," said Cowie. "As the routing table size distribution creeps to the right, the number of routers in the world who 'see' 512K+ routes will steadily increase. Within a few weeks, nearly every piece of vulnerable gear will have been discovered, as 512K+ becomes the global consensus opinion. We don't know how many machines that represents, and we don't know what the net impact will be on local Internet connectivity before it all gets sorted out."