updated 06:56 pm EST, Mon December 31, 2012
Developer accidentally deleted data, causing outage
The Amazon Web Services outage that led to Netflix's downtime on Christmas Eve was caused by an accidental data deletion, according to a summary Amazon posted on the AWS page. The summary states that one of a very small number of developers with access to AWS' Elastic Load Balancing (ELB) service deleted a portion of the ELB state data, causing the ELB control plane to experience high latency and error rates for the API calls used to manage load balancers.
Amazon says the downtime persisted as long as it did because its technical teams initially focused on the API errors and had not noticed that ELB data had been deleted. Only when the team working on the ELB dug into the system's load balancers was the missing data discovered.
Amazon's statement says that the team has put in place a number of protections aimed at preventing this sort of outage in the future. The team has modified its system to prevent accidental modification of the data without specific Change Management approval. It has also revised the data recovery process to take into account what it learned from this outage.